Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/25326
Title: Improving Xtract for Chinese collocation extraction
Authors: Lu, Q 
Li, Y
Xu, RF
Keywords: Computational linguistics
Natural languages
Statistical analysis
Issue Date: 2003
Publisher: IEEE
Source: 2003 International Conference on Natural Language Processing and Knowledge Engineering, 2003 : proceedings : 26-29 October 2003, Beijing, China, p. 333-338
Abstract: We present a system which extracts word-based bigram and n-gram collocation information from a 60MB corpus and then locates bigram pairs using strength and spread as defined in the Xtract system. In order for Xtract to work effectively with Chinese, we have readjusted the parameters. To obtain a higher recall rate, we have modified the algorithm to identify collocations with a low frequency of occurrence, a method which works particularly well for bigrams in which one word is high-frequency and the other low-frequency. In preliminary experiments, our system extracts bigram collocations with a precision of 61%, an 8% improvement over the direct use of Smadja's Xtract on Chinese. Further, we have improved the recall rate by 4.5% while extracting multiword collocations with 92% precision.
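Note: The abstract relies on the strength and spread measures from Smadja's Xtract. As a rough illustration only, the sketch below computes both for the collocates of one head word, assuming the usual Xtract definitions: strength is the z-score of a collocate's co-occurrence frequency among all collocates of the head word, and spread is the variance of its counts across the ten positions in a window of five words either side. The function names, the toy input, and the assumption that the text is already segmented into word tokens are illustrative and are not taken from the paper; the readjusted thresholds used for Chinese are not given in this record and are not reproduced here.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

WINDOW = 5  # offsets -5..-1 and +1..+5 around the head word (assumed, as in Xtract)
OFFSETS = [o for o in range(-WINDOW, WINDOW + 1) if o != 0]


def position_counts(sentences, head):
    """For each collocate of `head`, count how often it occurs at each offset.

    `sentences` is an iterable of token lists (hypothetical pre-segmented input).
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != head:
                continue
            for off in OFFSETS:
                j = i + off
                if 0 <= j < len(tokens):
                    counts[tokens[j]][off] += 1
    return counts


def strength_and_spread(counts):
    """Return {collocate: (strength, spread)} for one head word."""
    freqs = {w: sum(c.values()) for w, c in counts.items()}
    if not freqs:
        return {}
    f_bar = mean(freqs.values())      # average collocate frequency
    sigma = pstdev(freqs.values())    # population standard deviation
    result = {}
    for w, c in counts.items():
        # Strength: z-score of this collocate's total frequency.
        strength = (freqs[w] - f_bar) / sigma if sigma > 0 else 0.0
        # Spread: variance of the per-position counts over the 10 offsets.
        p_bar = freqs[w] / len(OFFSETS)
        spread = sum((c[o] - p_bar) ** 2 for o in OFFSETS) / len(OFFSETS)
        result[w] = (strength, spread)
    return result


if __name__ == "__main__":
    toy = [["a", "b"], ["a", "b"], ["a", "c"]]
    for word, (k, u) in strength_and_spread(position_counts(toy, "a")).items():
        print(word, round(k, 3), round(u, 3))
```

In Xtract, a bigram is kept as a candidate when both values exceed chosen thresholds; a high spread indicates that the collocate clusters at particular positions rather than being scattered evenly across the window.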
URI: http://hdl.handle.net/10397/25326
ISBN: 0-7803-7902-0
DOI: 10.1109/NLPKE.2003.1275925
Appears in Collections: Conference Paper
