|Title:||A hybrid approach for Chinese coreference resolution|
|Authors:||Wang, Chi-shing|
|Keywords:||Hong Kong Polytechnic University -- Dissertations; Natural language processing (Computer science); Chinese language -- Data processing|
|Issue Date:||2007|
|Publisher:||The Hong Kong Polytechnic University|
|Abstract:||Coreference resolution is the process of determining the entity that noun phrases refer to. A great deal of research has been done on this task in English, using approaches ranging from linguistics-based to machine learning-based. In English, state-of-the-art algorithms achieve a respectable performance of about 80%. In Chinese, however, where much less work has been done, the performance is only 70%. In my thesis, I will address this performance gap and investigate automatic methods for Chinese coreference resolution that make efficient use of resources. I will propose a hybrid approach to this task that can accurately and automatically identify and resolve coreference for noun phrases in unannotated text. Coreference resolution is mainly composed of two tasks, detection and resolution. The goal of detection is to find all possibly coreferring noun phrases using a linguistics-based approach that applies a set of heuristic rules combining information from part-of-speech tagging and full parsing. Resolution groups noun phrases that refer to the same entity by using a machine learning approach that mixes modified k-means clustering and transformation-based learning. The main algorithm is deliberately chosen to make the most of available resources; even the features are generated from Internet sources that are free and easily obtainable. With careful selection of suitable features, I will demonstrate in my thesis the trade-off between the efficiency of using fewer features and the performance gained from using more. I will show my results on two Chinese data sets, TDT3 and ACE05. The ACE value coreference resolution results achieved through my approach are 52.5% and 56.6% respectively. An oracle experiment using gold standard noun phrases achieves even more impressive results of 77.0% and 76.4%. I will analyze the results and show that, in order for Chinese noun phrase coreference resolution to achieve results competitive with those for English, accurate segmentation, noun phrase identification, and feature identification are currently the parts that most need attention.|
|Description:||111 leaves ; 30 cm. PolyU Library Call No.: [THS] LG51 .H577M COMP 2007 WangC|
|URI:||http://hdl.handle.net/10397/3440|
|Rights:||All rights reserved.|
|Appears in Collections:||Thesis|
Files in This Item:
|b21459381_link.htm||For PolyU Users||162 B||HTML||View/Open|
|b21459381_ir.pdf||For All Users (Non-printable)||4.97 MB||Adobe PDF||View/Open|
Citations as of Oct 15, 2018
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.