Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/911
Title: A probabilistic approach to natural language disambiguation : semantic role labeling and dialogue act recognition
Authors: Lan, Kwok-cheung Cyrus
Keywords: Natural language processing (Computer science)
Ambiguity -- Data processing
Hong Kong Polytechnic University -- Dissertations
Issue Date: 2007
Publisher: The Hong Kong Polytechnic University
Abstract: Resolving ambiguities has been a central problem in natural language processing. Most disambiguation tasks to date have focused on relatively low-level processing such as morphological, lexical, and syntactic analysis. Their considerable success has stimulated research in higher-level, but harder, disambiguation tasks. This thesis addresses two disambiguation tasks, one at the semantic level and the other at the pragmatic level. The tasks are, respectively, semantic role labeling and dialogue act recognition. We address both tasks using a probabilistic framework in the form of the conditional distribution p(ambiguity | expression, context). We estimate the distribution by conditional Maximum Entropy, which allows heterogeneous sources of information to be integrated in a unified model for disambiguation. Under the principle of Maximum Entropy, the selected distribution is the one of highest entropy, so no unjustified assumption is made about the training data, while feature modeling remains easy. Maximum Entropy has proved empirically useful in various applications, with moderate training time. In the semantic role labeling task, we propose a three-phase labeling approach. The approach combines advantages of previously proposed methods while addressing their weaknesses. It decomposes the problem of recognizing a complex structure into several local decisions, each recognizing a single piece of the structure. The decisions are made by supervised learning, with predictors trained from data. Evaluations on public benchmarks show that our recognition performance is competitive with the current best individual system. In the dialogue act recognition task, we target non-task-oriented recognition. We study various types of features, including lexical, syntactic, and discourse features, to evaluate the recognition performance. A feature selection method is used for systematically optimizing the feature set. Experimental results show that our system outperforms all the other approaches that use the same public data set. Despite the high micro-average performance achieved in both tasks, the macro-average performance is unsatisfactory. This is due to the class-imbalance problem in the data sets, where the distribution of examples among the classes is highly skewed. We employ two methods to address this problem in each task: over-sampling and error-based learning. Experimental results show that both methods are effective in improving the macro-average performance in most cases.
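The gap between micro- and macro-average performance under class imbalance, and the idea behind over-sampling, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the label names, the toy data, the choice of F1 as the metric, and the helper names `micro_macro_f1` and `oversample` are hypothetical and are not taken from the thesis, whose exact evaluation and sampling procedures are not given in this record.

```python
import random
from collections import Counter


def per_class_counts(gold, pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Micro-F1 pools counts over classes; macro-F1 averages per-class F1."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = per_class_counts(gold, pred)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


def oversample(examples, labels, seed=0):
    """Random over-sampling (one simple variant, assumed here): duplicate
    minority-class examples until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    return balanced


# Hypothetical skewed data set: 90 / 8 / 2 examples across three classes.
gold = ["statement"] * 90 + ["question"] * 8 + ["apology"] * 2
# A classifier biased toward the majority class.
pred = ["statement"] * 96 + ["question"] * 2 + ["statement"] * 2

micro, macro = micro_macro_f1(gold, pred)
balanced = oversample(list(range(len(gold))), gold)
```

In this toy case the pooled (micro) score stays high because the majority class dominates the counts, while the macro score is pulled down by the rare classes. Over-sampling rebalances the training set so that a learner sees each class equally often.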
Description: xii, 117 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577M COMP 2007 Lan
URI: http://hdl.handle.net/10397/911
Rights: All rights reserved.
Appears in Collections: Thesis

Files in This Item:
File | Description | Size | Format
b21166948_link.htm | For PolyU Users | 167 B | HTML
b21166948_ir.pdf | For All Users (Non-printable) | 1.72 MB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.