Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/3038
Title: Chinese collocation extraction and its application in natural language processing
Authors: Li, Wan-yin Claire
Keywords: Hong Kong Polytechnic University -- Dissertations
Natural language processing (Computer science)
Chinese language -- Data processing
Collocation (Linguistics)
Computational linguistics
Issue Date: 2007
Publisher: The Hong Kong Polytechnic University
Abstract: Traditional approaches to collocation extraction mainly use statistical models based on co-occurrence association measures, which leads to poor performance in both recall and precision. This study explores methods that use the features of collocations in terms of statistical significance as well as syntactic and semantic information. The first part of the study investigates how to adapt Xtract, a well-known statistics-based system for English, to Chinese collocation extraction. In addition to parameter tuning for Chinese, an enhanced algorithm based on mutual information is developed to extract collocations with relatively low frequencies and thereby improve recall. The second part investigates methods that take syntactic information into consideration to eliminate pseudo-collocations and to identify low-frequency collocations that fit certain syntactic patterns. The syntactic information is based on part-of-speech tagging patterns obtained from a chunked Chinese corpus; however, the collocation extraction algorithm does not require the testing data to be chunked. The third part investigates methods that take semantic information into consideration, using synonym information to further improve the recall of collocation extraction. The last part explores how to make use of collocation information in word sense disambiguation (WSD). Results show that collocation information can improve WSD performance by 3% to 10% across different data sets.
Description: xiii, 172 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2007 Li
URI: http://hdl.handle.net/10397/3038
Rights: All rights reserved.
Appears in Collections: Thesis

Files in This Item:
File | Description | Size | Format
b21459435_link.htm | For PolyU Users | 162 B | HTML
b21459435_ir.pdf | For All Users (Non-printable) | 2.31 MB | Adobe PDF