Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/10256
Title: A maximum-entropy Chinese parser augmented by transformation-based learning
Authors: Fung, P
Ngai, G 
Yang, Y
Chen, B
Keywords: Parsing for Chinese
Maximum entropy
Transformation-based learning
POS tagging
Chunking and parsing for Chinese
Issue Date: 2004
Publisher: ACM
Source: ACM Transactions on Asian Language Information Processing, 2004, v. 3, no. 2, p. 159-168
Journal: ACM Transactions on Asian Language Information Processing
Abstract: Parsing, the task of identifying syntactic components (e.g., noun and verb phrases) in a sentence, is one of the fundamental tasks in natural language processing. Many natural language applications, such as spoken-language understanding, machine translation, and information extraction, would benefit from, or even require, high-accuracy parsing as a preprocessing step. Although most state-of-the-art statistical parsers were initially constructed for English, most of them are not language-specific, in that they do not rely on properties specific to English. Constructing a parser for a given language therefore becomes a matter of retraining the statistical parameters on a treebank in that language. The development of the Chinese Treebank [Xia et al. 2000] spurred the construction of parsers for Chinese. However, Chinese poses some unique problems for the development of a statistical parser, the most apparent being word segmentation. Since words in written Chinese are not delimited in the same way as in Western languages, word boundaries must be identified before an existing statistical method can be applied to Chinese. Most pre-existing Chinese parsers neglect this step and assume that the input has already been segmented. This article describes a character-based statistical parser, which gives the best performance to date on the Chinese Treebank data. We augment an existing maximum-entropy parser with transformation-based learning, creating a parser that can operate at the character level. We present experiments showing that our parser achieves results close to those achievable under perfect word segmentation conditions.
URI: http://hdl.handle.net/10397/10256
ISSN: 1530-0226
DOI: 10.1145/1034780.1034786
Appears in Collections: Journal/Magazine Article







Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.