Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/88880
DC Field | Value | Language
dc.contributor | Department of Applied Mathematics | en_US
dc.creator | Fu, Q | en_US
dc.creator | Zhuang, Y | en_US
dc.creator | Gu, J | en_US
dc.creator | Zhu, Y | en_US
dc.creator | Guo, X | en_US
dc.date.accessioned | 2021-01-06T07:32:10Z | -
dc.date.available | 2021-01-06T07:32:10Z | -
dc.identifier.issn | 2214-5796 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/88880 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.rights | ©2020 Elsevier Inc. All rights reserved. | en_US
dc.rights | © 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. | en_US
dc.rights | The following publication Fu, Q., Zhuang, Y., Gu, J., Zhu, Y., & Guo, X. (2021). Agreeing to Disagree: Choosing Among Eight Topic-Modeling Methods. Big Data Research, 23, 100173 is available at https://dx.doi.org/10.1016/j.bdr.2020.100173. | en_US
dc.subject | Topic modeling | en_US
dc.subject | Natural language processing | en_US
dc.subject | Computational social science | en_US
dc.subject | Optimal number of topics | en_US
dc.title | Agreeing to disagree : choosing among eight topic-modeling methods | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 23 | en_US
dc.identifier.doi | 10.1016/j.bdr.2020.100173 | en_US
dcterms.abstract | Topic modeling is a key research area in natural language processing and has inspired innovative studies in a wide array of social-science disciplines. Yet, the use of topic modeling in computational social science has been hampered by two critical issues. First, social scientists tend to focus on a few standard ways of topic modeling. Our understanding of semantic patterns has not been informed by rapid methodological advances in topic modeling. Moreover, a systematic comparison of the performance of different methods in this field is warranted. Second, the choice of the optimal number of topics remains a challenging task. A comparison of topic-modeling techniques has rarely been situated in a social-science context and the choice appears to be arbitrary for most social scientists. Based on about 120,000 Canadian newspaper articles since 1977, we review and compare eight traditional, generative, and neural methods for topic modeling (Latent Semantic Analysis, Principal Component Analysis, Factor Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, Neural Autoregressive Topic Model, Neural Variational Document Model, and Hierarchical Dirichlet Process). Three measures (coherence statistics, held-out likelihood, and graph-based dimensionality selection) are then used to assess the performance of these methods. Findings are presented and discussed to guide the choice of topic-modeling methods, especially in social science research. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Big data research, 15 Feb. 2021, v. 23, 100173 | en_US
dcterms.isPartOf | Big data research | en_US
dcterms.issued | 2021-02-15 | -
dc.identifier.eissn | 2214-580X | en_US
dc.description.validate | 202101 bcrc | en_US
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | a0521-n01 | -
dc.description.pubStatus | Published | en_US
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Fu_Agreeing_Disagree_Topic-Modeling.pdf | Pre-Published version | 1.19 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Scopus™ citations: 11 (as of Apr 12, 2024)
Web of Science™ citations: 8 (as of Apr 18, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.