Agreeing to disagree : choosing among eight topic-modeling methods

Fu, Q; Zhuang, Y; Gu, J; Zhu, Y; Guo, X

doi:10.1016/j.bdr.2020.100173

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/88880

DC Field	Value	Language
dc.contributor	Department of Applied Mathematics	en_US
dc.creator	Fu, Q	en_US
dc.creator	Zhuang, Y	en_US
dc.creator	Gu, J	en_US
dc.creator	Zhu, Y	en_US
dc.creator	Guo, X	en_US
dc.date.accessioned	2021-01-06T07:32:10Z	-
dc.date.available	2021-01-06T07:32:10Z	-
dc.identifier.issn	2214-5796	en_US
dc.identifier.uri	http://hdl.handle.net/10397/88880	-
dc.language.iso	en	en_US
dc.publisher	Elsevier	en_US
dc.rights	©2020 Elsevier Inc. All rights reserved.	en_US
dc.rights	© 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/.	en_US
dc.rights	The following publication Fu, Q., Zhuang, Y., Gu, J., Zhu, Y., & Guo, X. (2021). Agreeing to Disagree: Choosing Among Eight Topic-Modeling Methods. Big Data Research, 23, 100173 is available at https://dx.doi.org/10.1016/j.bdr.2020.100173.	en_US
dc.subject	Topic modeling	en_US
dc.subject	Natural language processing	en_US
dc.subject	Computational social science	en_US
dc.subject	Optimal number of topics	en_US
dc.title	Agreeing to disagree : choosing among eight topic-modeling methods	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	23	en_US
dc.identifier.doi	10.1016/j.bdr.2020.100173	en_US
dcterms.abstract	Topic modeling is a key research area in natural language processing and has inspired innovative studies in a wide array of social-science disciplines. Yet, the use of topic modeling in computational social science has been hampered by two critical issues. First, social scientists tend to focus on a few standard ways of topic modeling. Our understanding of semantic patterns has not been informed by rapid methodological advances in topic modeling. Moreover, a systematic comparison of the performance of different methods in this field is warranted. Second, the choice of the optimal number of topics remains a challenging task. A comparison of topic-modeling techniques has rarely been situated in a social-science context and the choice appears to be arbitrary for most social scientists. Based on about 120,000 Canadian newspaper articles since 1977, we review and compare eight traditional, generative, and neural methods for topic modeling (Latent Semantic Analysis, Principal Component Analysis, Factor Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Neural Autoregressive Topic Model, Neural Variational Document Model, and Hierarchical Dirichlet Process). Three measures (coherence statistics, held-out likelihood, and graph-based dimensionality selection) are then used to assess the performance of these methods. Findings are presented and discussed to guide the choice of topic-modeling methods, especially in social science research.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Big data research, 15 Feb. 2021, v. 23, 100173	en_US
dcterms.isPartOf	Big data research	en_US
dcterms.issued	2021-02-15	-
dc.identifier.eissn	2214-580X	en_US
dc.description.validate	202101 bcrc	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	a0521-n01	-
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Fu_Agreeing_Disagree_Topic-Modeling.pdf	Pre-Published version	1.19 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript


Usage and Citations

Metric	Value	As of
Page views	222	Apr 14, 2025
Page views (last week)	3	Apr 14, 2025
Downloads	288	Apr 14, 2025
Scopus citations	11	Jun 21, 2024
Web of Science citations	11	Oct 10, 2024
