Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/77461
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Computing | - |
dc.creator | Ye, J | - |
dc.creator | Li, Y | - |
dc.creator | Wu, Z | - |
dc.creator | Wang, JZ | - |
dc.creator | Li, W | - |
dc.creator | Li, J | - |
dc.date.accessioned | 2018-08-28T01:32:30Z | - |
dc.date.available | 2018-08-28T01:32:30Z | - |
dc.identifier.isbn | 9.78195E+12 | - |
dc.identifier.uri | http://hdl.handle.net/10397/77461 | - |
dc.language.iso | en | en_US |
dc.publisher | Association for Computational Linguistics (ACL) | en_US |
dc.rights | © 2017 Association for Computational Linguistics | en_US |
dc.rights | This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). | en_US |
dc.rights | The following publication Ye, J., Li, Y., Wu, Z., Wang, J. Z., Li, W., & Li, J. (2017, July). Determining gains acquired from word embedding quantitatively using discrete distribution clustering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1847-1856) is available at https://doi.org/10.18653/v1/P17-1169 | en_US |
dc.title | Determining gains acquired from word embedding quantitatively using discrete distribution clustering | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.spage | 1847 | - |
dc.identifier.epage | 1856 | - |
dc.identifier.volume | 1 | - |
dc.identifier.doi | 10.18653/v1/P17-1169 | - |
dcterms.abstract | Word embeddings have become widely used in document analysis. While a large number of models for mapping words to vector spaces have been developed, it remains undetermined how much net gain can be achieved over traditional approaches based on bag-of-words. In this paper, we propose a new document clustering approach by combining any word embedding with a state-of-the-art algorithm for clustering empirical distributions. By using the Wasserstein distance between distributions, the word-to-word semantic relationship is taken into account in a principled way. The new clustering method is easy to use and consistently outperforms other methods on a variety of data sets. More importantly, the method provides an effective framework for determining when and how much word embeddings contribute to document analysis. Experimental results with multiple embedding models are reported. | - |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), 30 Jul - 4 Aug 2017, v. 1, p. 1847-1856 | - |
dcterms.issued | 2017 | - |
dc.identifier.scopus | 2-s2.0-85040943836 | - |
dc.relation.ispartofbook | ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) | - |
dc.identifier.rosgroupid | 2017000299 | - |
dc.description.ros | 2017-2018 > Academic research: refereed > Refereed conference paper | - |
dc.description.validate | 201808 bcrc | - |
dc.description.oa | Version of Record | en_US |
dc.identifier.FolderNumber | OA_IR/PIRA | en_US |
dc.description.pubStatus | Published | en_US |
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Ye_Determining_Gains_Acquired.pdf | | 1.49 MB | Adobe PDF | View/Open |
Page views: 107 (last week: 1; last month: 1), as of Apr 21, 2024
Downloads: 36, as of Apr 21, 2024
Scopus citations: 11 (last week: 0; last month: 0), as of Apr 19, 2024
Web of Science citations: 6 (last week: 0; last month: 0), as of Apr 18, 2024
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.