Title: Chinese collocation extraction and its application in natural language processing
Authors: Li, Wan-yin Claire
Degree: Ph.D.
Issue Date: 2007
Abstract: Traditional approaches to collocation extraction mainly use statistical models based on co-occurrence association measures, which lead to poor performance in terms of both recall and precision. This study explores collocation extraction methods that use collocation features in terms of statistical significance as well as syntactic and semantic information. The first part of this study investigates how to adapt a well-known statistics-based system, Xtract, developed for English, to Chinese collocation extraction. In addition to parameter tuning for Chinese, an enhanced algorithm based on mutual information is developed to extract collocations with relatively low frequencies and so improve recall. The second part of this study investigates methods that take syntactic information into consideration to eliminate pseudo-collocations and to identify low-frequency collocations that fit certain syntactic patterns. The syntactic information is based on part-of-speech tagging patterns obtained from a chunked Chinese corpus. However, the collocation extraction algorithm does not require the testing data to be chunked. The third part of this study investigates methods that take semantic information into consideration, using synonym information to further improve the recall of collocation extraction. The last part of this research explores how to make use of collocation information in word sense disambiguation (WSD). Results show that collocation information can improve WSD performance by 3% to 10%, depending on the data set.
Subjects: Hong Kong Polytechnic University -- Dissertations.
Natural language processing (Computer science)
Chinese language -- Data processing.
Collocation (Linguistics)
Computational linguistics.
Pages: xiii, 172 p. : ill. ; 30 cm.
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.