Title: A hybrid approach for Chinese coreference resolution
Authors: Wang, Chi-shing
Keywords: Hong Kong Polytechnic University -- Dissertations
Natural language processing (Computer science)
Chinese language -- Data processing
Issue Date: 2007
Publisher: The Hong Kong Polytechnic University
Abstract: Coreference resolution is the process of determining which entity each noun phrase refers to. A great deal of research has been done on this task in English, using approaches ranging from linguistics-based to machine learning-based ones. In English, state-of-the-art algorithms achieve a respectable performance of about 80%. In Chinese, however, where much less work has been done, the performance is only 70%. In my thesis, I will address this performance gap and investigate automatic methods for Chinese coreference resolution that make efficient use of resources. I will propose a hybrid approach to this task that can accurately and automatically identify and resolve coreference for noun phrases in unannotated text. Coreference resolution is mainly composed of two tasks, detection and resolution. The goal of detection is to find all possibly coreferring noun phrases; it uses a linguistics-based approach consisting of a set of heuristic rules that combine information from part-of-speech tagging and full parsing. Resolution groups noun phrases that refer to the same entity by using a machine learning approach that combines modified k-means clustering and transformation-based learning. The main algorithm is deliberately chosen to make the most of available resources; even the features are generated from Internet sources that are free and easily obtainable. With careful selection of suitable features, I will demonstrate in my thesis the trade-off between the efficiency of using fewer features and the performance to be gained from using more. I will show my results on two Chinese data sets, TDT3 and ACE05. The ACE value coreference resolution results achieved through my approach are 52.5% and 56.6%, respectively. An oracle experiment using gold-standard noun phrases achieves even better results of 77.0% and 76.4%.
I will analyze the results and show that, for Chinese noun phrase coreference resolution to achieve results competitive with those for English, accurate segmentation, noun phrase identification, and feature identification are currently the parts that most need attention.
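Illustration (not from the thesis): the resolution step described in the abstract groups noun phrases that refer to the same entity. The sketch below shows that general idea with a simplified greedy, threshold-based clustering. It is not the modified k-means clustering or the transformation-based learning used in the thesis, and the names `Mention`, `distance`, `cluster_mentions`, and `threshold`, as well as the distance values, are hypothetical choices made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A candidate noun phrase. All fields are illustrative assumptions."""
    text: str       # surface string of the noun phrase
    head: str       # head word of the noun phrase
    position: int   # index of the mention in document order


def distance(a: Mention, b: Mention) -> float:
    """Hypothetical distance: 0.0 for identical strings,
    0.5 for a shared head word, 1.0 otherwise."""
    if a.text == b.text:
        return 0.0
    if a.head == b.head:
        return 0.5
    return 1.0


def cluster_mentions(mentions, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not the thesis method).

    Mentions are visited in document order. Each one joins the first
    existing cluster that holds a mention within `threshold`; if no
    cluster qualifies, it starts a new cluster (a new entity).
    """
    clusters = []
    for mention in sorted(mentions, key=lambda m: m.position):
        for cluster in clusters:
            if any(distance(mention, other) <= threshold for other in cluster):
                cluster.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters


if __name__ == "__main__":
    mentions = [
        Mention("香港理工大学", "大学", 0),  # "The Hong Kong Polytechnic University"
        Mention("王教授", "教授", 1),        # "Professor Wang"
        Mention("该大学", "大学", 2),        # "the university"
        Mention("教授", "教授", 3),          # "the professor"
    ]
    for entity_id, cluster in enumerate(cluster_mentions(mentions)):
        print(entity_id, [m.text for m in cluster])
```

In this toy setting the head-word match plays the role of a single feature. A fuller system of the kind the abstract describes would combine many features and a learned model rather than one fixed rule.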
Description: 111 leaves ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577M COMP 2007 WangC
Rights: All rights reserved.
Appears in Collections: Thesis

Files in This Item:
File                  Description                      Size      Format
b21459381_link.htm    For PolyU Users                  162 B     HTML
b21459381_ir.pdf      For All Users (Non-printable)    4.97 MB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.