Title: Chinese text chunking using lexicalized HMMs
Authors: Fu, GH; Xu, R; Luke, KK; Lu, Q
Keywords: Hidden Markov models; Text analysis
Issue Date: 2005
Publisher: IEEE
Source: Proceedings of 2005 International Conference on Machine Learning and Cybernetics, 2005, 18-21 August 2005, Guangzhou, China, v. 1, p. 7-12
Abstract: This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger that assigns each known word a proper hybrid tag, which involves four types of information: word boundary, POS, chunk boundary and chunk type. In comparison with most previous approaches, our approach is able to integrate different features, such as part-of-speech information, chunk-internal cues and contextual information, for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that lexicalization can substantially improve the performance of an HMM-based chunking system.
ISBN: 0-7803-9091-1
DOI: 10.1109/ICMLC.2005.1526911
Appears in Collections: Conference Paper
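
The abstract describes a hybrid tag that combines word boundary, POS, chunk boundary and chunk type, so that chunking reduces to sequence tagging. The Python sketch below illustrates only that reduction: it reads chunks back from an already tagged token sequence. The label format (`S-n-B-NP`), the B/I/E/S/O position scheme, the POS and chunk labels, the sample sentence, and the names `HybridTag`, `parse_tag` and `chunks_from_tags` are all assumptions made for illustration. They are not taken from the paper, and the lexicalized HMM tagger itself is not shown.

```python
from typing import List, NamedTuple, Optional, Tuple


class HybridTag(NamedTuple):
    """Hypothetical four-part hybrid tag (format assumed, not from the paper)."""
    word_pos: str    # position of the token inside a word: B, I, E or S
    pos: str         # part-of-speech label, e.g. "n" or "v"
    chunk_pos: str   # position of the token inside a chunk: B, I, E, S or O
    chunk_type: str  # chunk label such as "NP" or "VP"; "O" outside any chunk


def parse_tag(label: str) -> HybridTag:
    """Split a label such as 'S-n-B-NP' into its four components."""
    parts = label.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four '-'-separated fields, got {label!r}")
    return HybridTag(*parts)


def chunks_from_tags(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Recover (chunk_type, text) pairs from a tagged token sequence.

    Only the chunk-boundary and chunk-type fields are used here; the
    word-boundary and POS fields are parsed but otherwise ignored.
    """
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, label in zip(tokens, labels):
        tag = parse_tag(label)
        if tag.chunk_pos == "O":
            # Token lies outside every chunk: drop any unfinished chunk.
            current, current_type = [], None
            continue
        if tag.chunk_pos in ("B", "S") or current_type is None:
            # Start a new chunk (also covers an I/E tag with no opening B).
            current, current_type = [token], tag.chunk_type
        else:
            current.append(token)
        if tag.chunk_pos in ("E", "S"):
            # Chunk is complete; Chinese text is joined without spaces.
            chunks.append((current_type, "".join(current)))
            current, current_type = [], None
    return chunks


tokens = ["中国", "人民", "热爱", "和平"]
labels = ["S-n-B-NP", "S-n-E-NP", "S-v-S-VP", "S-n-S-NP"]
print(chunks_from_tags(tokens, labels))
```

In a full system along the lines the abstract describes, the `labels` list would be produced by the tagger rather than written by hand.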

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.