Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/5202
Title: A hybrid extraction model for Chinese noun/verb synonym bi-gram
Authors: Li, W
Lu, Q 
Keywords: Collocation extraction
Statistical model
Syntactic rules
Semantic relationship
Similarity calculation
HowNet
Issue Date: 16-Dec-2011
Publisher: Institute for Digital Enhancement of Cognitive Development, Waseda University
Source: Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), 16-18 Dec, Nanyang Technological University, Singapore, p. 430-439
Abstract: Statistics-based collocation extraction approaches suffer from (1) low precision, because bi-grams with high co-occurrence may be syntactically unrelated and are thus not true collocations; and (2) low recall, because some true collocations with low occurrence counts cannot be identified by statistical models. Integrating syntactic rules and semantic knowledge into a statistical model for collocation extraction is one way to achieve high precision while keeping reasonable recall. This paper designs a cascade system that employs a hybrid model, integrating both syntactic and semantic knowledge into a statistical model, for extracting Chinese synonymous noun/verb collocations. Grammatically bounded noun/verb collocations are first extracted by a syntactic-rule-based module; its output is then fed into a semantics-based module that retrieves further low-frequency bi-gram collocations.
URI: http://hdl.handle.net/10397/5202
ISBN: 978-4-905166-02-3
Rights: © 2011 The PACLIC 25 Organizing Committee and PACLIC Steering Committee
Copyright of contributed papers reserved by respective authors
Copyright 2011 by Wanyin Li, Qin Lu
Appears in Collections: Conference Paper

Files in This Item:
File: Li_Hybrid_Extraction_Bi-gram.pdf
Size: 135.34 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.