Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/20343
Title: Semi-supervised text categorization by considering sufficiency and diversity
Authors: Li, S
Lee, SYM 
Gao, W
Huang, CR 
Keywords: Bootstrapping
Semi-supervised learning
Sentiment classification
Issue Date: 2013
Publisher: Springer Verlag
Source: Communications in computer and information science, 2013, v. 400, p. 105-115
Journal: Communications in Computer and Information Science 
Abstract: In text categorization (TC), labeled data is often limited while unlabeled data is ample. This motivates semi-supervised learning for TC to improve the performance by exploring the knowledge in both labeled and unlabeled data. In this paper, we propose a novel bootstrapping approach to semi-supervised TC. First of all, we give two basic preferences, i.e., sufficiency and diversity for a possibly successful bootstrapping. After carefully considering the diversity preference, we modify the traditional bootstrapping algorithm by training the involved classifiers with random feature subspaces instead of the whole feature space. Moreover, we further improve the random feature subspace-based bootstrapping with some constraints on the subspace generation to better satisfy the diversity preference. Experimental evaluation shows the effectiveness of our modified bootstrapping approach in both topic and sentiment-based TC tasks.
Description: 2nd CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2013, Chongqing, 15-19 November 2013
URI: http://hdl.handle.net/10397/20343
ISBN: 9783642416439
ISSN: 1865-0929
DOI: 10.1007/978-3-642-41644-6_11
Appears in Collections:Conference Paper
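
The abstract describes bootstrapping in which the classifiers are trained on random feature subspaces rather than on the whole feature space. The following is a minimal sketch of that general idea, not the authors' implementation. Several parts are assumptions made only for illustration: the nearest-centroid classifier, the distance-margin confidence score, the disjoint partition used as a diversity constraint, and all function and parameter names (`split_subspaces`, `train_centroids`, `predict`, `bootstrap`, `per_round`). The record does not specify any of these.

```python
import random

def split_subspaces(n_features, n_subspaces, rng):
    """Shuffle feature indices and deal them into disjoint subspaces.
    Disjointness is an assumed diversity constraint, not the paper's."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::n_subspaces] for i in range(n_subspaces)]

def train_centroids(X, y, feats):
    """Per-class mean vector restricted to the features in `feats`."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict(centroids, row, feats):
    """Return (label, confidence); confidence is the gap between the
    squared distances to the two nearest class centroids."""
    dists = sorted(
        (sum((row[f] - c[j]) ** 2 for j, f in enumerate(feats)), label)
        for label, c in centroids.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def bootstrap(X_lab, y_lab, X_unlab, n_subspaces=2, per_round=1,
              rounds=10, seed=0):
    """Grow the labeled set by repeatedly training one classifier per
    random subspace and moving its most confident predictions out of
    the unlabeled pool."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    n_features = len(X_lab[0])
    for _ in range(rounds):
        if not pool:
            break
        # Fresh random subspaces are drawn in every round.
        for feats in split_subspaces(n_features, n_subspaces, rng):
            if not pool:
                break
            centroids = train_centroids(X_lab, y_lab, feats)
            scored = sorted(
                ((predict(centroids, row, feats), i)
                 for i, row in enumerate(pool)),
                key=lambda t: t[0][1],
                reverse=True,
            )
            chosen = scored[:per_round]
            for (label, _), i in chosen:
                X_lab.append(pool[i])
                y_lab.append(label)
            taken = {i for _, i in chosen}
            pool = [row for i, row in enumerate(pool) if i not in taken]
    return X_lab, y_lab
```

In this sketch each classifier sees only its own share of the features, so the samples it is most confident about can differ from those another subspace would pick. That is one plausible reading of the diversity preference named in the abstract. A real text categorization setup would replace the toy centroid classifier with a stronger learner over bag-of-words features.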


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.