Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109056
Title: Tracing lexical semantic change with distributional semantics : detection, evaluation, and interpretation
Authors: Chen, Jing
Degree: Ph.D.
Issue Date: 2024
Abstract: The language we live by is in constant change, and the evolving process is manifested in the ways we use it differently over time. With the growing availability of digitalized historical textual data and increasingly powerful language models, recent studies have demonstrated the potential to leverage these models to (semi-)automatically identify words that have undergone semantic shifts, particularly in Indo-European languages. This dissertation aims to explore the efficacy of embedding-based methods in interpreting semantic change within Chinese data.
I embark on this journey by validating the current computational fashion with Chinese data on a popular lexical semantic change detection task, namely Graded Change Detection, in experiments constrained to periods before and after the sociopolitical backdrop of the Reform and Opening Up in modern Chinese. A significant contribution of this work is the creation of the first shared benchmark for Chinese semantic change, Chi-WUG, which includes over 61,000 human judgments on 1,600 sentence pairs targeting 40 different words. A systematic evaluation of various models (including count-based, static, and contextualized models) highlights the performance of the contextualized ones, especially the XL-LEXEME model, which correlates significantly with human judgments (best scores exceeding 0.800). Notably, SGNS-based models demonstrate strong robustness, maintaining consistent performance under varied training conditions. Beyond the scope of the initial experiments, I glimpse the interpretative power of embedding-based methods for uncovering broader linguistic trends. Building on the findings from preferred models, the study expands the static two-period comparison to a dynamic longitudinal analysis, allowing for consistent examination, exemplified by a semantically shifted word. Moreover, by expanding the analysis from a select group of predefined words to the entire lexicon, it becomes possible to observe more subtle and less-discussed semantic shifts.
The affirmative validation of embedding-based methods has greatly piqued my interest in exploring more complex cases of semantic change that occurred during the period, particularly those interacting with other linguistic processes. This analysis specifically examines semantic shifts that have occurred in word-formation patterns through the two synonymous constructions ‘X-zu’ and ‘X-tuan’, both denoting a group of people, to analyze how their constructional meanings have evolved alongside their increasing morphological productivity. By obtaining temporal representations for each attestation, this chapter examines the semantic distributions defined by attested types in semantic space across different periods. It reveals how ‘X-zu’ has broadened its semantic scope to encompass not only ethnic groups but also individuals with shared interests. In contrast, ‘X-tuan’ displays a less expansive semantic shift. This difference is further illuminated by statistically examining the numbers and density of clusters over time, coupled with an analysis of non-linear development trends.
This dissertation not only addresses key questions regarding the role of word embeddings in interpreting semantic change within the Chinese language but also makes substantial contributions to the field. It introduces a high-quality benchmark that facilitates experimentation with Chinese data and conducts the first comprehensive and systematic evaluation of various models. These contributions establish a foundational baseline for future research and highlight the potential of computational approaches to address traditional topics in Chinese linguistics. Moreover, this project enriches theoretical discussions on the interplay between morphological productivity and semantic change. The findings from this research underscore the need for more sophisticated statistical models and the integration of social dimensions in future explorations.
Subjects: Linguistic change
Historical linguistics
Computational linguistics
Hong Kong Polytechnic University -- Dissertations
Pages: xxii, 158 pages : color illustrations
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.