Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/31314
Title: Mining pure high-order word associations via information geometry for information retrieval
Authors: Hou, Y
Zhao, X
Song, D
Li, W 
Keywords: Information geometry
Pure high-order dependence
Text classification
Text retrieval
Word association
Issue Date: 2013
Source: ACM Transactions on Information Systems, 2013, v. 31, no. 3
Journal: ACM Transactions on Information Systems 
Abstract: The classical bag-of-words models for information retrieval (IR) fail to capture contextual associations between words. In this article, we propose to investigate pure high-order dependence among a number of words forming an inseparable semantic entity, that is, the high-order dependence that cannot be reduced to the random coincidence of lower-order dependencies. We believe that identifying these pure high-order dependence patterns would lead to a better representation of documents and novel retrieval models. Specifically, two kinds of pure dependence are formally defined: unconditional pure dependence (UPD) and conditional pure dependence (CPD). The exact decision on UPD and CPD, however, is NP-hard in general. We hence derive and prove the sufficient criteria that entail UPD and CPD, within the well-principled information geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods for extracting word patterns with pure high-order dependence. Our methods are applied to and extensively evaluated on three typical IR tasks: text classification, text retrieval without query expansion, and text retrieval with query expansion.
URI: http://hdl.handle.net/10397/31314
ISSN: 1046-8188
DOI: 10.1145/2493175.2493177
Appears in Collections: Journal/Magazine Article

Scopus citations: 12 (last week: 0; last month: 1), as of Dec 8, 2018
Web of Science citations: 2 (last week: 0; last month: 0), as of Dec 11, 2018
Page views: 82 (last week: 0), as of Dec 10, 2018

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.