Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/95756
Title: | Sleeping lion or sick man? Machine learning approaches to deciphering heterogeneous images of Chinese in North America |
Authors: | Fu, Q; Zhuang, Y; Zhu, Y; Guo, X |
Issue Date: | 2022 |
Source: | Annals of the American Association of Geographers, 2022, v. 112, no. 7, p. 2045-2063 |
Abstract: | Based on more than 280,000 newspaper articles published in North America, this study proposes an integrative machine learning framework to explore heterogeneous social sentiments over time. After retrieving and preprocessing articles containing the term “Chinese” from six mainstream newspapers, we identified major discussion topics and assigned articles to their corresponding topics via posterior probabilities estimated by using a novel Bayesian nonparametric model, the hierarchical Dirichlet process. We also employed a groundbreaking deep learning technique, bidirectional encoder representations from transformers, to assign a negative or positive sentiment score to each newspaper article, which was trained on binary-labeled movie reviews from the Internet Movie Database (IMDb). By combining state-of-the-art tools for topic modeling and sentiment analysis, we found an overall lack of consensus on whether sentiments in North America since 1978 were pro- or anti-Chinese. Moreover, the images of Chinese are highly topic specific: (1) sentiments across different topics show distinct trajectories over the period of study; (2) discussion topics explain much more of the variation in sentiments than do the publisher, year of publication, or country of publisher; (3) less positive sentiments appear to be more relevant to material concerns than to ethnic considerations, whereas more positive sentiments are associated with an appreciation of culture; and (4) sentiments on the same or similar topic might exhibit different temporal patterns in the United States and Canada. These new findings not only suggest a multifaceted and dynamic view of social sentiments in a transnational context but also call for a paradigm shift in understanding intertwined sociodiscursive interactions over time.
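Based on an analysis of more than 280,000 newspaper reports from North America, this study proposes an integrated machine learning framework to trace the varied and shifting course of social sentiment. After retrieving and cleaning all original China-related reports from six mainstream North American newspapers, we used an advanced Bayesian nonparametric model, the hierarchical Dirichlet process, to identify the main discussion topics in these reports and to assign each report to its corresponding topic according to its posterior probability. We then trained bidirectional encoder representations from transformers, a pioneering deep learning tool, on binary-labeled review data from the Internet Movie Database and assigned a sentiment score to each newspaper report. After integrating these frontier methods for topic modeling and sentiment analysis, we found no clear positive or negative tendency in North American newspaper coverage of Chinese since 1978. Furthermore, images of Chinese are closely tied to the discussion topic in which they appear. First, sentiments under different topics followed their own distinct trajectories over the study period; second, discussion topics explain far more of the variation in sentiment than the publisher, year of publication, country, or any other factor; third, less positive evaluations appear to relate to material rather than ethnic considerations, whereas more positive evaluations relate to culture; finally, even under the same or similar topics, newspaper reports in the United States and Canada show different trajectories of sentiment change. The study not only shows multidimensional change in social sentiment in a transnational context but also indicates that scholars need to move beyond existing research paradigms to understand more deeply the sociodiscursive interactions that intertwine over time. |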
Keywords: | Big data; Chinese; Deep learning; Machine learning; North America; Sentiment |
Publisher: | Routledge, Taylor & Francis Group |
Journal: | Annals of the American Association of Geographers |
ISSN: | 2469-4452 |
EISSN: | 2469-4460 |
DOI: | 10.1080/24694452.2022.2042180 |
Rights: | © 2022 by American Association of Geographers. This is an Accepted Manuscript of an article published by Taylor & Francis in Annals of the American Association of Geographers on 29 Apr 2022 (Published online), available online: http://www.tandfonline.com/10.1080/24694452.2022.2042180 |
Appears in Collections: | Journal/Magazine Article |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Fu_Sleeping_Lion_Sick.pdf | Pre-Published version | 2.05 MB | Adobe PDF | View/Open |
Page views: 119 (last week: 4; last month: 4), as of Nov 10, 2024
Downloads: 65, as of Nov 10, 2024
Scopus citations: 1, as of Nov 14, 2024
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.