Sleeping lion or sick man? Machine learning approaches to deciphering heterogeneous images of Chinese in North America

Fu, Q; Zhuang, Y; Zhu, Y; Guo, X

doi:10.1080/24694452.2022.2042180

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/95756

Title:	Sleeping lion or sick man? Machine learning approaches to deciphering heterogeneous images of Chinese in North America
Authors:	Fu, Q; Zhuang, Y; Zhu, Y; Guo, X
Issue Date:	2022
Source:	Annals of the American Association of Geographers, 2022, v. 112, no. 7, p. 2045-2063
Abstract:	Based on more than 280,000 newspaper articles published in North America, this study proposes an integrative machine learning framework to explore heterogeneous social sentiments over time. After retrieving and preprocessing articles containing the term “Chinese” from six mainstream newspapers, we identified major discussion topics and assigned articles to their corresponding topics via posterior probabilities estimated with a novel Bayesian nonparametric model, the hierarchical Dirichlet process. We also employed a groundbreaking deep learning technique, bidirectional encoder representations from transformers (BERT), trained on binary-labeled movie reviews from the Internet Movie Database (IMDb), to assign a negative or positive sentiment score to each newspaper article. By combining state-of-the-art tools for topic modeling and sentiment analysis, we found an overall lack of consensus on whether sentiments in North America since 1978 were pro- or anti-Chinese. Moreover, the images of Chinese are highly topic specific: (1) sentiments across different topics show distinct trajectories over the period of study; (2) discussion topics explain much more of the variation in sentiments than do the publisher, year of publication, or country of publisher; (3) less positive sentiments appear to be more relevant to material concerns than to ethnic considerations, whereas more positive sentiments are associated with an appreciation of culture; and (4) sentiments on the same or similar topics might exhibit different temporal patterns in the United States and Canada. These new findings not only suggest a multifaceted and dynamic view of social sentiments in a transnational context but also call for a paradigm shift in understanding intertwined sociodiscursive interactions over time.
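Abstract (translated from the Chinese version):	Based on an analysis of more than 280,000 North American newspaper reports, this study proposes an integrated machine learning framework to trace heterogeneous changes in social sentiment. After retrieving and cleaning all original reports concerning Chinese from six mainstream North American newspapers, we used an advanced Bayesian nonparametric model, the hierarchical Dirichlet process, to identify the main discussion topics in these reports and to assign each report to its corresponding topic according to its posterior probability. We then trained bidirectional encoder representations from transformers, a pioneering deep learning tool, on binary-labeled review data from the Internet Movie Database and assigned a sentiment score to each newspaper report. After integrating these state-of-the-art methods for topic modeling and sentiment analysis, we found no clear positive or negative tendency in North American newspaper coverage concerning Chinese since 1978. Furthermore, images of Chinese are closely tied to the discussion topic in which they appear. First, sentiments under different topics show their own distinct trajectories over the period studied. Second, discussion topics explain far more of the variation in sentiment than all other factors, such as publisher, year of publication, and country. Third, relatively less positive evaluations appear to be related to material rather than ethnic considerations, whereas more positive evaluations are related to culture. Finally, even under the same or similar discussion topics, newspaper reports in the United States and Canada show different trajectories of sentiment change. This study not only shows multidimensional changes in social sentiment in a transnational context but also indicates that scholars need to go beyond existing research paradigms to understand in depth the social discursive interactions that intertwine over time.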
Keywords:	Big data; Chinese; Deep learning; Machine learning; North America; Sentiment
Publisher:	Routledge, Taylor & Francis Group
Journal:	Annals of the American Association of Geographers
ISSN:	2469-4452
EISSN:	2469-4460
DOI:	10.1080/24694452.2022.2042180
Rights:	© 2022 by American Association of Geographers. This is an Accepted Manuscript of an article published by Taylor & Francis in Annals of the American Association of Geographers on 29 Apr 2022 (published online), available online: http://www.tandfonline.com/10.1080/24694452.2022.2042180
Appears in Collections:	Journal/Magazine Article
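
Two steps named in the abstract can be illustrated with a minimal sketch: assigning each article to the topic with the highest posterior probability, and comparing how much of the variation in sentiment a grouping factor (topic versus publisher) accounts for. This is not the authors' implementation. The topic posteriors and sentiment scores are taken as given rather than estimated with a hierarchical Dirichlet process or BERT, the variance share is computed as eta-squared (an assumption; the record does not state which statistic the paper uses), the function names are hypothetical, and the four toy records are made up for illustration only.

```python
from collections import defaultdict
from statistics import fmean


def assign_topic(posteriors):
    """Return the index of the topic with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k])


def eta_squared(groups, scores):
    """Share of total variance in `scores` explained by `groups`
    (between-group sum of squares divided by total sum of squares)."""
    grand_mean = fmean(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    by_group = defaultdict(list)
    for g, s in zip(groups, scores):
        by_group[g].append(s)
    between_ss = sum(
        len(v) * (fmean(v) - grand_mean) ** 2 for v in by_group.values()
    )
    return between_ss / total_ss if total_ss else 0.0


# Made-up toy records (NOT data from the study): per-article topic
# posteriors, a sentiment score in [0, 1], and a publisher label.
articles = [
    {"posteriors": [0.8, 0.1, 0.1], "sentiment": 0.20, "publisher": "A"},
    {"posteriors": [0.7, 0.2, 0.1], "sentiment": 0.30, "publisher": "B"},
    {"posteriors": [0.1, 0.8, 0.1], "sentiment": 0.80, "publisher": "A"},
    {"posteriors": [0.2, 0.7, 0.1], "sentiment": 0.90, "publisher": "B"},
]

topics = [assign_topic(a["posteriors"]) for a in articles]
scores = [a["sentiment"] for a in articles]
publishers = [a["publisher"] for a in articles]

topic_share = eta_squared(topics, scores)
publisher_share = eta_squared(publishers, scores)
print(f"variance explained by topic: {topic_share:.3f}")
print(f"variance explained by publisher: {publisher_share:.3f}")
```

In this toy example the sentiment scores cluster by topic rather than by publisher, so the topic grouping accounts for most of the variance. That outcome is built into the made-up data and says nothing about the study's actual results.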

Files in This Item:

File	Description	Size	Format
Fu_Sleeping_Lion_Sick.pdf	Pre-Published version	2.05 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript
