Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/7813
Title: Hybrid term indexing for weighted boolean and vector space models
Authors: Chow, KCW
Luk, RWP 
Wong, KF
Kwok, KL
Keywords: Chinese Information Retrieval
Indexing
IR Models
Evaluation
Issue Date: 2001
Publisher: World Scientific Publishing Co
Source: International journal of computer processing of languages, 2001, v. 14, no. 2, p. 133-151
Journal: International journal of computer processing of languages 
Abstract: Retrieval effectiveness depends on how terms are extracted and indexed. In Chinese text (as in Japanese and Korean), there are no spaces to delimit words. Indexing using hybrid terms (i.e. words and bigrams) was able to achieve the best precision amongst homogeneous terms at a lower storage cost than indexing with bigrams. However, this was tested with conjunctive queries. Here, we extend the weighted Boolean models using fuzzy and p-norm measures, as well as the vector space model using the cosine measure, to process hybrid terms. Our evaluation shows that all IR models using hybrid terms achieve better average precision than those using words. Across different recall values, the weighted Boolean model using fuzzy measures with hybrid terms achieves precision consistently about 8% higher than the same model using words. The vector space model using the cosine measure with hybrid terms achieved the best improvement in average recall and precision.
URI: http://hdl.handle.net/10397/7813
ISSN: 1793-8406
DOI: 10.1142/S0219427901000345
Appears in Collections: Journal/Magazine Article

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.