Making sense : from word distribution to meaning

Santus, Enrico

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/83985

Title:	Making sense : from word distribution to meaning
Authors:	Santus, Enrico
Degree:	Ph.D.
Issue Date:	2016
Abstract:	To perform complex tasks, Natural Language Processing (NLP) applications need to rely on knowledge resources, whose main building blocks have been identified as entities and relations (Herger, 2014). Given their affinity to human semantic memory, these resources have often been referred to as models of semantic memory (Jones, Willits, & Dennis, 2015). In the last fifty years, a number of these models have been proposed in the cognitive, linguistic and computational literature (Jones, Willits, & Dennis, 2015). The first-generation models (i.e. classic models) were mostly theoretical and were not designed to be computationally implemented. Starting from the 1980s, a second generation (i.e. learning models) tried to address the learnability issue by adopting representations of meaning that could be learnt automatically by observing word co-occurrence in natural text. Among the second-generation models, Distributional Semantic Models (DSMs) have gained a lot of attention since the 1990s in the cognitive, linguistic and computational communities because they allow the efficient treatment of word meaning and word similarity (Harris, 1954) and, furthermore, behave consistently with psycholinguistic findings (Landauer & Dumais, 1997; Lenci, 2008). Even though these models are strong at identifying similarity (and therefore relatedness), they were found to suffer from a major limitation: they do not offer any principled way to discriminate between the semantic relations that hold between words. In fact, since they define word similarity in distributional terms (i.e. the Distributional Hypothesis; Harris, 1954), they group together, under the umbrella of similar words, terms that are related by very different semantic relations, such as synonymy, antonymy, hypernymy and co-hyponymy (Santus, Lenci, Lu, & Huang, 2015a).
In this thesis we address this limitation by proposing several unsupervised methods for the discrimination of semantic relations in DSMs. These methods (i.e. APSyn, APAnt and SLQS) are linguistically and cognitively motivated (Murphy G. L., 2002; Cruse, 1986) and aim at identifying distributional properties that characterize the studied semantic relations (i.e. similarity, opposition and hypernymy, respectively), so that the DSMs are provided with useful discriminative information. In particular, our measures analyze the properties of the most salient contexts of the target words, under the assumption that these contexts are more informative than the full distribution, which is instead assumed to include noise (Santus, Lenci, Lu, & Huang, 2015a). To identify the most salient contexts, for every target we sort its contexts by either Positive Pointwise Mutual Information (PPMI; Church & Hanks, 1989) or Positive Local Mutual Information (PLMI; Evert, 2005), and we select the top N, which are then used to extract a given distributional property (e.g. intersection, informativeness). In all our methods, N is a hyperparameter that can be tuned in a range between 50 and 1000. Our measures are carefully described and evaluated, and they are shown to be competitive with the state of the art, sometimes even outperforming the best models in particular settings (including the recently introduced predictive models, generally referred to as word embeddings; see Mikolov, Yih, & Geoffrey, 2013). Their scores, moreover, have been used as features for ROOT9 (Santus, Lenci, Chiu, Lu, & Huang, 2016e), a supervised system that uses a Random Forest algorithm to classify taxonomical relations (i.e. hypernymy and co-hyponymy versus unrelated words), achieving state-of-the-art performance (Weeds, Clarke, Reffin, Weir, & Keller, 2014). The thesis is organized as follows.
The Introduction describes the problem and the reasons behind the adoption of the distributional framework. The first two chapters describe the main models of semantic memory and discuss how computers can learn and manipulate meaning, starting from word distribution in language corpora. Three chapters are then dedicated to the main semantic relations we have dealt with (i.e. similarity, opposition and hypernymy) and the corresponding unsupervised measures for their discrimination (i.e. APSyn, APAnt and SLQS). The final chapter describes the supervised method ROOT9 for the identification of taxonomical relations. In the Conclusions, we summarize our contribution and suggest that future work should target i) the systematic study of the hyperparameters (e.g. the impact of N); ii) the merging of the methods to develop a multi-class classification algorithm; and iii) the adaptation of the methods (and/or their principles) to reduced matrices (see Turney & Pantel, 2010) and word embeddings (see Mikolov, Yih, & Geoffrey, 2013).
Subjects:	Computational linguistics. Semantics -- Data processing. Hong Kong Polytechnic University -- Dissertations
Pages:	169 pages : illustrations
Appears in Collections:	Thesis
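
The context-selection step described in the abstract (rank each target's contexts by PPMI, keep the top N, then extract a property such as the intersection) can be illustrated with a minimal sketch. This is not the thesis's implementation of APSyn, APAnt or SLQS: the function names, the toy co-occurrence counts and the plain set intersection are all assumptions made for illustration, and N is set to 2 here only because the toy data is tiny (the abstract gives a range of 50 to 1000).

```python
import math
from collections import Counter


def ppmi_table(cooc):
    """PPMI for every (target, context) pair, from raw co-occurrence counts.

    `cooc` maps target -> {context: count}. Hypothetical helper, not from the thesis.
    """
    total = sum(sum(ctxs.values()) for ctxs in cooc.values())
    target_tot = {t: sum(ctxs.values()) for t, ctxs in cooc.items()}
    context_tot = Counter()
    for ctxs in cooc.values():
        context_tot.update(ctxs)

    table = {}
    for t, ctxs in cooc.items():
        table[t] = {}
        for c, n in ctxs.items():
            # PMI = log2( P(t, c) / (P(t) * P(c)) ); negative values are clipped to 0.
            pmi = math.log2(n * total / (target_tot[t] * context_tot[c]))
            table[t][c] = max(0.0, pmi)
    return table


def top_contexts(scores, n):
    """The n highest-scoring contexts; ties are broken alphabetically."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:n]


def shared_top_contexts(table, w1, w2, n):
    """Contexts that appear among the top n for both targets (plain intersection)."""
    return set(top_contexts(table[w1], n)) & set(top_contexts(table[w2], n))


# Toy counts, invented for this example only.
cooc = {
    "dog": {"bark": 8, "pet": 6, "the": 20},
    "cat": {"meow": 7, "pet": 5, "the": 20},
    "car": {"drive": 9, "the": 20},
}
table = ppmi_table(cooc)
print(shared_top_contexts(table, "dog", "cat", n=2))  # {'pet'}
print(shared_top_contexts(table, "dog", "car", n=2))  # set()
```

In this toy data the frequent but uninformative context "the" receives a PPMI of zero for "dog" and "cat", so it drops out of their top-N lists; this is the sense in which the most salient contexts are assumed to be more informative than the full distribution.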

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/8805
