Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/26447
Title: An automatic Chinese collocation extraction algorithm based on lexical statistics
Authors: Xu, RF
Lu, Q 
Li, Y
Keywords: Computational linguistics
Natural languages
Issue Date: 2003
Publisher: IEEE
Source: 2003 International Conference on Natural Language Processing and Knowledge Engineering, 2003 : proceedings : 26-29 October 2003, Beijing, China, p. 321-326
Abstract: We present an automatic Chinese collocation extraction system that uses lexical statistics and syntactical knowledge. The system extracts collocations from a manually segmented and tagged Chinese news corpus in three stages. First, bidirectional bigram statistical measures, including bidirectional strength and spread, and the χ² test value, are employed to extract candidate two-word pairs. These candidate word pairs are then used to extract high-frequency multiword collocations from their context. In the third stage, precision is further improved by using syntactical knowledge of collocation patterns between content words to eliminate pseudo collocations. In a preliminary experiment on 30 selected headwords, this three-stage system achieves a precision of 73%, a substantial improvement over the 61% achieved by an algorithm we developed earlier, which was based on an improved version of Smadja's Xtract system (53% precision).
URI: http://hdl.handle.net/10397/26447
ISBN: 0-7803-7902-0
DOI: 10.1109/NLPKE.2003.1275923
Appears in Collections: Conference Paper
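
The first stage named in the abstract scores candidate word pairs with a χ² test value, among other measures. As a rough illustration only, the sketch below computes the standard Pearson χ² statistic for a 2x2 co-occurrence table of a headword and a co-word. The function names, the count-based inputs, and the 3.841 threshold (the critical value for one degree of freedom at p = 0.05) are assumptions made for this example; the abstract does not give the paper's thresholds or define its bidirectional strength and spread measures, so those are not covered here.

```python
def chi_square_2x2(pair_count, head_count, co_count, total):
    """Pearson chi-square for a 2x2 co-occurrence table.

    pair_count: windows containing both the headword and the co-word
    head_count: windows containing the headword
    co_count:   windows containing the co-word
    total:      total number of windows observed
    (All four inputs are hypothetical counts for illustration.)
    """
    o11 = pair_count                                    # both words
    o12 = head_count - pair_count                       # headword only
    o21 = co_count - pair_count                         # co-word only
    o22 = total - head_count - co_count + pair_count    # neither word
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return total * (o11 * o22 - o12 * o21) ** 2 / denom


def is_candidate(pair_count, head_count, co_count, total, threshold=3.841):
    """Keep a pair whose chi-square exceeds an assumed threshold."""
    return chi_square_2x2(pair_count, head_count, co_count, total) > threshold
```

A pair that co-occurs exactly as often as chance predicts scores 0, while a pair that always appears together scores the full sample size, so a filter like `is_candidate` would keep only the latter.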


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.