Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/83343
| Title: | A hybrid approach for Chinese coreference resolution |
| Authors: | Wang, Chi-shing |
| Degree: | M.Phil. |
| Issue Date: | 2007 |
| Abstract: | Coreference resolution is the process of determining the entity that noun phrases refer to. A great deal of research has been done on this task in English, using approaches ranging from linguistics-based to machine learning-based ones. In English, these approaches achieve a respectable performance of about 80% when using state-of-the-art algorithms. In Chinese, however, where much less work has been done, the performance is only 70%. In my thesis, I will address this performance gap and investigate automatic methods for Chinese coreference resolution that make efficient use of resources. I will propose a hybrid approach to this task that can accurately and automatically identify and resolve coreference for noun phrases in unannotated text. Coreference resolution is mainly composed of two tasks, detection and resolution. The goal of detection is to find all possibly coreferring noun phrases using a linguistics-based approach that contains a set of heuristic rules combining information from part-of-speech tagging and full parsing. Resolution groups noun phrases that refer to the same entity by using a machine learning approach that mixes modified k-means clustering and transformation-based learning. The main algorithm is deliberately chosen to make maximal use of available resources; even the features are generated from Internet sources that are free and easily obtainable. With careful selection of suitable features, I will demonstrate in my thesis the trade-off between the efficiency of using fewer features and the performance to be obtained from using more. I will show my results on two Chinese data sets, TDT3 and ACE05. The ACE-value coreference resolution results achieved through my approach are 52.5% and 56.6% respectively. An oracle experiment using gold-standard noun phrases achieves even higher results of 77.0% and 76.4%. I will analyze the results and show that, for Chinese noun phrase coreference resolution to achieve results competitive with those for English, accurate segmentation, noun phrase identification, and feature identification are currently the parts that most need attention. |
| Subjects: | Hong Kong Polytechnic University -- Dissertations; Natural language processing (Computer science); Chinese language -- Data processing. |
| Pages: | 111 leaves ; 30 cm. |
| Appears in Collections: | Thesis | 
Access
View full-text via https://theses.lib.polyu.edu.hk/handle/200/3131
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.



