Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/33896
Title: Hybrid Chinese term indexing and the 2-Poisson model
Authors: Luk, RWP 
Wong, KF
Keywords: 2-Poisson model
Chinese information retrieval
Evaluation
Indexing
Issue Date: 2003
Publisher: IEICE-Inst Electronics Information Communications Eng
Source: IEICE transactions on information and systems, 2003, v. e86-d, no. 9, p. 1745-1752
Journal: IEICE Transactions on Information and Systems 
Abstract: Retrieval effectiveness depends on both the retrieval model and how terms are extracted and indexed. In Chinese, Japanese and Korean text, there are no spaces to delimit words. Indexing using hybrid terms (i.e. words and bigrams) was found to be effective and efficient with the 2-Poisson model in the NTCIR-III open evaluation workshop. Here, we explore another Okapi weight, BM25, based on the 2-Poisson model and compare their performance with bigram and word indexing strategies. Results show that word indexing is the most efficient in terms of indexing time and storage, but hybrid term indexing requires the least retrieval time per query. Without pseudo-relevance feedback (PRF), our BM25 appeared to yield better retrieval effectiveness for short queries. With PRF, our implementation of the BM11 weights, which are a simplified version of BM25, with hybrid term indexing remains the most effective combination for retrieval in this study.
URI: http://hdl.handle.net/10397/33896
ISSN: 0916-8532
Appears in Collections:Journal/Magazine Article
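
Illustration: the abstract refers to hybrid terms (words and bigrams) and to the Okapi BM25 and BM11 weights. The following is a minimal Python sketch of both ideas, not the authors' implementation. The function names, the toy lexicon, the greedy longest-match segmentation, the rule that bigrams are taken only from characters not covered by a lexicon word, and the default values k1=1.2 and b=0.75 are all assumptions made for illustration. The weight shown is the commonly cited Okapi BM25 form, in which setting b=1 gives BM11; the paper's own variants may differ.

```python
import math


def char_bigrams(text):
    """Overlapping character bigrams, e.g. 'ABCD' -> ['AB', 'BC', 'CD']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def hybrid_terms(text, lexicon, max_len=4):
    """Hypothetical hybrid extraction: lexicon words by greedy longest match,
    plus bigrams over runs of characters that no lexicon word covers.
    A run of a single unmatched character yields no term in this sketch."""
    terms, unmatched, i = [], "", 0
    while i < len(text):
        match = None
        # Try the longest candidate first; only words of 2+ characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in lexicon:
                match = text[i:i + n]
                break
        if match:
            terms.extend(char_bigrams(unmatched))  # flush the pending run
            unmatched = ""
            terms.append(match)
            i += len(match)
        else:
            unmatched += text[i]
            i += 1
    terms.extend(char_bigrams(unmatched))
    return terms


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Commonly cited Okapi BM25 weight of one term in one document."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def bm11_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2):
    """BM11 as the b = 1 special case of BM25 (full length normalisation)."""
    return bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=k1, b=1.0)


if __name__ == "__main__":
    toy_lexicon = {"信息", "检索"}  # hypothetical two-word dictionary
    print(hybrid_terms("信息检索系统", toy_lexicon))
    print(bm25_weight(tf=3, df=10, n_docs=1000, doc_len=200, avg_doc_len=100))
    print(bm11_weight(tf=3, df=10, n_docs=1000, doc_len=200, avg_doc_len=100))
```

In this sketch the two weights coincide for a document of average length and diverge as the document length moves away from the average, since BM11 applies the length normalisation in full.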