Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/88880
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Applied Mathematics | en_US |
dc.creator | Fu, Q | en_US |
dc.creator | Zhuang, Y | en_US |
dc.creator | Gu, J | en_US |
dc.creator | Zhu, Y | en_US |
dc.creator | Guo, X | en_US |
dc.date.accessioned | 2021-01-06T07:32:10Z | - |
dc.date.available | 2021-01-06T07:32:10Z | - |
dc.identifier.issn | 2214-5796 | en_US |
dc.identifier.uri | http://hdl.handle.net/10397/88880 | - |
dc.language.iso | en | en_US |
dc.publisher | Elsevier | en_US |
dc.rights | ©2020 Elsevier Inc. All rights reserved. | en_US |
dc.rights | © 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. | en_US |
dc.rights | The following publication Fu, Q., Zhuang, Y., Gu, J., Zhu, Y., & Guo, X. (2021). Agreeing to Disagree: Choosing Among Eight Topic-Modeling Methods. Big Data Research, 23, 100173 is available at https://dx.doi.org/10.1016/j.bdr.2020.100173. | en_US |
dc.subject | Topic modeling | en_US |
dc.subject | Natural language processing | en_US |
dc.subject | Computational social science | en_US |
dc.subject | Optimal number of topics | en_US |
dc.title | Agreeing to disagree : choosing among eight topic-modeling methods | en_US |
dc.type | Journal/Magazine Article | en_US |
dc.identifier.volume | 23 | en_US |
dc.identifier.doi | 10.1016/j.bdr.2020.100173 | en_US |
dcterms.abstract | Topic modeling is a key research area in natural language processing and has inspired innovative studies in a wide array of social-science disciplines. Yet, the use of topic modeling in computational social science has been hampered by two critical issues. First, social scientists tend to focus on a few standard ways of topic modeling. Our understanding of semantic patterns has not been informed by rapid methodological advances in topic modeling. Moreover, a systematic comparison of the performance of different methods in this field is warranted. Second, the choice of the optimal number of topics remains a challenging task. A comparison of topic-modeling techniques has rarely been situated in a social-science context and the choice appears to be arbitrary for most social scientists. Based on about 120,000 Canadian newspaper articles since 1977, we review and compare eight traditional, generative, and neural methods for topic modeling (Latent Semantic Analysis, Principal Component Analysis, Factor Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Neural Autoregressive Topic Model, Neural Variational Document Model, and Hierarchical Dirichlet Process). Three measures (coherence statistics, held-out likelihood, and graph-based dimensionality selection) are then used to assess the performance of these methods. Findings are presented and discussed to guide the choice of topic-modeling methods, especially in social science research. | en_US |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | Big data research, 15 Feb. 2021, v. 23, 100173 | en_US |
dcterms.isPartOf | Big data research | en_US |
dcterms.issued | 2021-02-15 | - |
dc.identifier.eissn | 2214-580X | en_US |
dc.description.validate | 202101 bcrc | en_US |
dc.description.oa | Accepted Manuscript | en_US |
dc.identifier.FolderNumber | a0521-n01 | - |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | Green (AAM) | en_US |
Appears in Collections: | Journal/Magazine Article |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Fu_Agreeing_Disagree_Topic-Modeling.pdf | Pre-Published version | 1.19 MB | Adobe PDF |
Metric | Count | As of |
---|---|---|
Page views | 196 (last week: 3; last month: 3) | Oct 13, 2024 |
Downloads | 225 | Oct 13, 2024 |
Scopus citations | 11 | Jun 21, 2024 |
Web of Science citations | 11 | Oct 10, 2024 |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.