Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/10040
Title: An active learning framework for semi-supervised document clustering with language modeling
Authors: Huang, R
Lam, W
Keywords: Active learning
Document clustering
Language modeling
Semi-supervised
Issue Date: 2009
Publisher: Elsevier Science Bv
Source: Data and Knowledge Engineering, 2009, v. 68, no. 1, p. 49-67
Journal: Data and Knowledge Engineering 
Abstract: This paper investigates a framework that actively selects informative document pairs for which to obtain user feedback in semi-supervised document clustering. We design a gain-directed document pair selection method that measures how much can be learned by revealing the judgments of the selected document pairs. Estimated term co-occurrence probabilities serve as a clue for finding informative document pairs, and they are also incorporated into the semi-supervised document clustering process to capture term-to-term dependence relationships. In the semi-supervised document clustering, each cluster is represented by a language model. We have conducted extensive experiments on several real-world corpora, and the results demonstrate that the proposed framework is effective.
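The abstract does not give the paper's formulas. As a rough illustration only, the Python sketch below assumes one common way to combine two of the ideas it names: each cluster is a unigram language model that is expanded with document-level term co-occurrence probabilities p(w | u), in the style of a translation model, and each document is assigned to the cluster under which it has the highest smoothed log-likelihood. The function names (cooccurrence_probs, background_model, cluster_model, log_likelihood, assign), the mixing weights lam and mu, the toy corpus, and the seed clusters are all hypothetical. This is not the authors' estimator, and the gain-directed pair selection step is not shown.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_probs(docs):
    """Hypothetical estimator of p(w | u) from document-level co-occurrence.

    For each term u, count the documents in which u and w both appear
    (w = u included), then normalize over w so each row sums to 1.
    """
    pair_counts = defaultdict(Counter)
    for doc in docs:
        terms = set(doc)
        for u in terms:
            for w in terms:
                pair_counts[u][w] += 1
    probs = {}
    for u, row in pair_counts.items():
        total = sum(row.values())
        probs[u] = {w: c / total for w, c in row.items()}
    return probs


def background_model(docs):
    """Maximum-likelihood unigram model of the whole collection."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def cluster_model(cluster_docs, co_probs, lam=0.7):
    """Assumed form: lam * p_ml(w|C) + (1 - lam) * sum_u p(w|u) * p_ml(u|C)."""
    counts = Counter(t for doc in cluster_docs for t in doc)
    total = sum(counts.values())
    p_ml = {t: c / total for t, c in counts.items()}
    model = defaultdict(float)
    for u, p_u in p_ml.items():
        model[u] += lam * p_u
        # Spread the remaining mass of u to terms that co-occur with it.
        for w, p_w_given_u in co_probs[u].items():
            model[w] += (1 - lam) * p_w_given_u * p_u
    return dict(model)


def log_likelihood(doc, model, background, mu=0.1):
    """Score a document; interpolating with the collection model avoids log(0)."""
    return sum(
        math.log((1 - mu) * model.get(t, 0.0) + mu * background[t]) for t in doc
    )


def assign(docs, clusters, co_probs, background):
    """Assign each document to the cluster whose model scores it highest."""
    models = [cluster_model([docs[i] for i in members], co_probs) for members in clusters]
    return [
        max(range(len(models)), key=lambda k: log_likelihood(doc, models[k], background))
        for doc in docs
    ]


docs = [
    ["stock", "market", "price"],
    ["market", "price", "trade"],
    ["gene", "protein", "cell"],
    ["protein", "cell", "dna"],
]
# Hypothetical seed clusters, standing in for groups built from user pair judgments.
clusters = [[0], [2]]
co_probs = cooccurrence_probs(docs)
background = background_model(docs)
print(assign(docs, clusters, co_probs, background))  # [0, 0, 1, 1]
```

In this toy corpus, "trade" never occurs in the seed document of cluster 0, yet it receives nonzero probability under that cluster's model because it co-occurs with "market" and "price" in another document. That is one simple way term-to-term dependence can shift cluster models; the paper's actual estimation may differ.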
URI: http://hdl.handle.net/10397/10040
DOI: 10.1016/j.datak.2008.08.008
Appears in Collections:Journal/Magazine Article

Scopus citations: 36 (last week: 1; last month: 0; as of Dec 9, 2017)
Web of Science citations: 29 (last week: 1; last month: 0; as of Dec 18, 2017)
Page views: 47 (last week: 1; as of Dec 18, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.