Cross-lingual sentiment lexicon learning

Gao, Dehong

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86917

Title:	Cross-lingual sentiment lexicon learning
Authors:	Gao, Dehong
Degree:	Ph.D.
Issue Date:	2014
Abstract:	A sentiment lexicon contains a set of words with known sentiment (e.g., "good", "nice" and "bad"). Sentiment lexicons are widely recognized as playing a fundamental role in sentiment analysis. Compared with the sentiment lexicons available for English, those available for other languages such as Chinese are far from sufficient. This dissertation focuses on Cross-lingual Sentiment Lexicon Learning (CSLL), whose goal is to make full use of the existing sentiment resources in one or more languages to automatically learn sentiment lexicons for other languages. The dissertation presents a systematic study of CSLL. In bilingual graph based sentiment lexicon learning, a bilingual graph is built from words in English and in the target language for which the sentiment lexicon is to be generated. A label propagation based approach is proposed to transfer sentiment information from English to the target language. To the best of our knowledge, this is the first time that word alignment information derived from a parallel corpus has been leveraged to build the inter-language relations in CSLL, and it is shown to significantly increase the coverage of the learned sentiment lexicon. In this work, the sentiment polarity of a word is determined by the sentiment information of the words connected to it in the bilingual graph. In co-training based bilingual sentiment lexicon learning, we consider not only the sentiment information of the connected words but also information about the words themselves (e.g., word definitions). From these two types of information, novel and effective features are derived to infer the sentiment polarity of a word. With these features, CSLL is treated as word-level sentiment classification, and two classifiers are developed within the co-training framework to predict the sentiment polarities of the words in the two languages respectively.
In particular, the learning processes of the two classifiers are connected by word associations derived from bilingual resources (e.g., bilingual dictionaries). Both of these pieces of work assume that words with similar semantics have similar sentiments, so the proposed approaches connect or associate semantically similar words during learning. However, semantically similar words do not always have similar sentiments, especially when the words have multiple senses. In multilingual sentiment lexicon learning, we therefore aim to automatically refine the semantics-oriented connections into sentiment-oriented connections. Incorporating multilingual (sentiment) resources, a novel label propagation based approach is developed to propagate sentiment information across multiple languages and to automatically update the weights of the connections. The main contribution of this work is that the proposed approach not only performs well in multilingual sentiment lexicon learning but also provides a new strategy for graph update. Extensive experiments have been conducted for each piece of work, and the experimental results demonstrate the effectiveness of the proposed approaches. To summarize, as one of the few large-scale studies on CSLL, this dissertation provides complete learning techniques and an in-depth analysis of the key factors in cross-lingual sentiment lexicon learning.
Subjects:	Computational linguistics; Semantics; Hong Kong Polytechnic University -- Dissertations
Pages:	xx, 157 pages : illustrations ; 30 cm
Appears in Collections:	Thesis
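
The bilingual graph idea in the abstract can be illustrated with a small sketch. This is not the dissertation's algorithm: it is a generic label propagation with clamped seed scores, and the function name propagate, the toy words, and the edge weights are all assumptions made for illustration. English seed words carry a known polarity (+1 or -1), and each unlabeled word repeatedly takes the weighted average score of its neighbours, so polarity flows across the inter-language edges into the target language.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=20):
    """Generic label propagation with clamped seeds (illustrative only).

    edges: list of (word_a, word_b, weight) undirected links; weights are
           assumed to be positive.
    seeds: dict mapping known-sentiment words to +1.0 or -1.0.
    Returns a dict mapping every word in the graph to a polarity score.
    """
    neighbours = defaultdict(list)
    for a, b, weight in edges:
        neighbours[a].append((b, weight))
        neighbours[b].append((a, weight))

    # Unlabeled words start neutral; seed words start at their known polarity.
    scores = {word: seeds.get(word, 0.0) for word in neighbours}

    for _ in range(iterations):
        updated = {}
        for word, links in neighbours.items():
            if word in seeds:
                # Seed scores are clamped and never overwritten.
                updated[word] = seeds[word]
                continue
            total = sum(weight for _, weight in links)
            updated[word] = sum(scores[n] * weight for n, weight in links) / total
        scores = updated
    return scores


# Hypothetical toy graph: two English seeds, three Chinese target words.
seeds = {"good": 1.0, "bad": -1.0}
edges = [
    ("good", "好", 1.0),   # inter-language edge, e.g. from word alignment
    ("bad", "坏", 1.0),    # inter-language edge
    ("好", "优秀", 0.5),   # intra-language edge, e.g. a synonym link
]
lexicon = propagate(edges, seeds)
```

In this toy graph, "优秀" has no direct link to any English seed and receives its positive score only through "好", which is the sense in which extra inter-language edges can widen the coverage of the learned lexicon.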

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/7759
