Determining gains acquired from word embedding quantitatively using discrete distribution clustering

Ye, J; Li, Y; Wu, Z; Wang, JZ; Li, W; Li, J

doi:10.18653/v1/P17-1169

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/77461

Title:	Determining gains acquired from word embedding quantitatively using discrete distribution clustering
Authors:	Ye, J; Li, Y; Wu, Z; Wang, JZ; Li, W; Li, J
Issue Date:	2017
Source:	ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), 30 Jul - 4 Aug 2017, v. 1, p. 1847-1856
Abstract:	Word embeddings have become widely used in document analysis. While a large number of models for mapping words to vector spaces have been developed, it remains undetermined how much net gain can be achieved over traditional approaches based on bag-of-words. In this paper, we propose a new document clustering approach by combining any word embedding with a state-of-the-art algorithm for clustering empirical distributions. By using the Wasserstein distance between distributions, the word-to-word semantic relationship is taken into account in a principled way. The new clustering method is easy to use and consistently outperforms other methods on a variety of data sets. More importantly, the method provides an effective framework for determining when and how much word embeddings contribute to document analysis. Experimental results with multiple embedding models are reported.
Publisher:	Association for Computational Linguistics (ACL)
ISBN:	9.78195E+12
DOI:	10.18653/v1/P17-1169
Rights:	© 2017 Association for Computational Linguistics. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). The following publication: Ye, J., Li, Y., Wu, Z., Wang, J. Z., Li, W., & Li, J. (2017, July). Determining gains acquired from word embedding quantitatively using discrete distribution clustering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1847-1856), is available at https://doi.org/10.18653/v1/P17-1169
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
Ye_Determining_Gains_Acquired.pdf		1.49 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record

Usage and Citations

Page views:	234 (last week: 26), as of Feb 9, 2026
Downloads:	100, as of Feb 9, 2026
Scopus citations:	14 (last week: 0; last month: 0), as of May 8, 2026
Web of Science citations:	6 (last week: 0; last month: 0), as of Apr 23, 2026
