Title: A hybrid extraction model for Chinese noun/verb synonym bi-gram
Authors: Li, W
Lu, Q 
Keywords: Collocation extraction
Statistical model
Syntactic rules
Semantic relationship
Similarity calculation
Issue Date: 16-Dec-2011
Publisher: Institute for Digital Enhancement of Cognitive Development, Waseda University
Source: Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), 16-18 Dec, Nanyang Technological University, Singapore, p. 430-439
Abstract: Statistics-based collocation extraction approaches suffer from (1) low precision, because bi-grams with high co-occurrence may be syntactically unrelated and thus not true collocations; and (2) low recall, because some true collocations with low occurrence counts cannot be identified by statistical models. Integrating syntactic rules and semantic knowledge into a statistical model for collocation extraction is one way to achieve high precision while keeping reasonable recall. This paper designs a cascade system that employs a hybrid model, integrating both syntactic and semantic knowledge into a statistical model, for extracting Chinese synonymous noun/verb collocations. Grammatically bounded noun/verb collocations are first extracted by a syntactic-rule-based module; these are then passed to a semantic-based module for further retrieval of low-frequency bi-gram collocations.
ISBN: 978-4-905166-02-3
Rights: © 2011 The PACLIC 25 Organizing Committee and PACLIC Steering Committee
Copyright of contributed papers reserved by respective authors
Copyright 2011 by Wanyin Li, Qin Lu
Appears in Collections: Conference Paper

Files in This Item:
File: Li_Hybrid_Extraction_Bi-gram.pdf
Size: 135.34 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.