Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/29073
Title: Efficient in-memory extensible inverted file
Authors: Luk, RWP 
Lam, W
Keywords: Indexing
Information retrieval
Optimization
Issue Date: 2007
Publisher: Pergamon-Elsevier Science Ltd
Source: Information systems, 2007, v. 32, no. 5, p. 733-754 How to cite?
Journal: Information Systems 
Abstract: The growing amount of on-line data demands efficient parallel and distributed indexing mechanisms to manage large resource requirements and unpredictable system failures. Parallel and distributed indices built using commodity hardware like personal computers (PCs) can substantially save cost because PCs are produced in bulk, achieving the scale of economy. However, PCs have limited amount of random access memory (RAM) and the effective utilization of RAM for in-memory inversion is crucial. This paper presents an analytical investigation and an empirical evaluation of storage-efficient inmemory extensible inverted files, which are represented by fixed- or variable-sized linked list nodes. The size of these linked list nodes is determined by minimizing the storage wastes or maximizing storage utilization under different conditions, which lead to different storage allocation schemes. Minimizing storage wastes also reduces the number of address indirections (i.e., chaining). We evaluated our storage allocation schemes using a number of reference collections. We found that the arrival rate scheme is the best in terms of both storage utilization and the mean number of chainings per term. The final storage utilization can be over 90% in our evaluation if there is a sufficient number of documents indexed. The mean number of chainings is not large (less than 2.6 for all the reference collections). We have also showed that our best storage allocation scheme can be used for our extensible compressed inverted file. The final storage utilization of the extensible compressed inverted file can be over 90% in our evaluation provided that there is a sufficient number of documents indexed. The proposed storage allocation schemes can also be used by compressed extensible inverted files with word positions.
URI: http://hdl.handle.net/10397/29073
DOI: 10.1016/j.is.2006.06.001
Appears in Collections:Journal/Magazine Article

Access
View full-text via PolyU eLinks SFX Query
Show full item record

SCOPUSTM   
Citations

10
Last Week
0
Last month
0
Citations as of Aug 15, 2017

WEB OF SCIENCETM
Citations

8
Last Week
0
Last month
0
Citations as of Aug 20, 2017

Page view(s)

32
Last Week
1
Last month
Checked on Aug 20, 2017

Google ScholarTM

Check

Altmetric



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.