EmbedX: a versatile, efficient and scalable platform to embed both graphs and high-dimensional sparse data

Zou, Y; Ding, Z; Shi, J; Guo, S; Su, C; Zhang, Y

doi:10.14778/3611540.3611546

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/119514

DC Field	Value	Language
dc.contributor	Department of Computing	-
dc.creator	Zou, Y	-
dc.creator	Ding, Z	-
dc.creator	Shi, J	-
dc.creator	Guo, S	-
dc.creator	Su, C	-
dc.creator	Zhang, Y	-
dc.date.accessioned	2026-06-26T02:02:38Z	-
dc.date.available	2026-06-26T02:02:38Z	-
dc.identifier.issn	2150-8097	-
dc.identifier.uri	http://hdl.handle.net/10397/119514	-
dc.description	The 49th International Conference on Very Large Data Bases, Vancouver, Canada, August 28 to September 1, 2023	en_US
dc.language.iso	en	en_US
dc.publisher	Association for Computing Machinery	en_US
dc.rights	This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.	en_US
dc.rights	The following publication Zou, Y., Ding, Z., Shi, J., Guo, S., Su, C., & Zhang, Y. (2023). EmbedX: A versatile, efficient and scalable platform to embed both graphs and high-dimensional sparse data. Proceedings of the VLDB Endowment, 16(12), 3543-3556 is available at https://doi.org/10.14778/3611540.3611546.	en_US
dc.title	EmbedX: a versatile, efficient and scalable platform to embed both graphs and high-dimensional sparse data	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	3543	-
dc.identifier.epage	3556	-
dc.identifier.volume	16	-
dc.identifier.issue	12	-
dc.identifier.doi	10.14778/3611540.3611546	-
dcterms.abstract	In modern online services, it is of growing importance to process web-scale graph data and high-dimensional sparse data together into embeddings for downstream tasks, such as recommendation, advertisement, prediction, and classification. There exist learning methods and systems for either high-dimensional sparse data or graphs, but not both.	-
dcterms.abstract	There is an urgent need in industry for a system that efficiently processes both types of data for higher business value, which, however, is challenging. The data in Tencent contains billions of samples with sparse features in very high dimensions, and the graphs likewise have billions of nodes and edges. Moreover, learning models often perform expensive operations with high computational costs. It is difficult to store, manage, and retrieve massive sparse data and graph data together, since they exhibit different characteristics.	-
dcterms.abstract	We present EmbedX, an industrial distributed learning framework from Tencent, which is versatile and efficient to support embedding on both graphs and high-dimensional sparse data. EmbedX consists of distributed server layers for graph and sparse data management, and optimized parameter and graph operators, to efficiently support 4 categories of methods, including deep learning models on high-dimensional sparse data, network embedding methods, graph neural networks, and in-house developed joint learning models on both types of data. Extensive experiments on massive Tencent data and public data demonstrate the superiority of EmbedX. For instance, on a Tencent dataset with 1.3 billion nodes, 35 billion edges, and 2.8 billion samples with sparse features in 1.6 billion dimensions, EmbedX trains an order of magnitude faster and our joint models achieve superior effectiveness. EmbedX is deployed in Tencent. A/B tests on real use cases further validate the power of EmbedX. EmbedX is implemented in C++ and open-sourced at https://github.com/Tencent/embedx.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Proceedings of the VLDB Endowment, Aug. 2023, v. 16, no. 12, p. 3543-3556	-
dcterms.isPartOf	Proceedings of the VLDB Endowment	-
dcterms.issued	2023-08	-
dc.identifier.scopus	2-s2.0-85174549968	-
dc.description.validate	202606 bcjz	-
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	OA_Scopus/WOS	en_US
dc.description.fundingSource	RGC	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This work is supported by Hong Kong RGC ECS No. 25201221, and National Natural Science Foundation of China No. 62202404. This work is also supported by a collaboration grant from Tencent Technology (Shenzhen) Co., Ltd (P0039546). This work is supported by a startup fund (P0033898) from Hong Kong Polytechnic University and project P0036831.	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	CC	en_US
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
3611540.3611546.pdf		1.16 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
