Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/10439
Title: Imbalanced text classification : a term weighting approach
Authors: Liu, Y; Loh, HT; Sun, A
Keywords: Imbalanced data; Term weighting scheme; Text classification
Issue Date: 2009
Publisher: Pergamon Press
Source: Expert systems with applications, 2009, v. 36, no. 1, p. 690-701
Journal: Expert systems with applications 
Abstract: The natural distribution of textual data used in text classification is often imbalanced. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We tackle this problem using a simple probability-based term weighting scheme to better distinguish documents in minor categories. This new scheme directly utilizes two critical information ratios, i.e., relevance indicators. These relevance indicators are supported by probability estimates that embody category membership. Our experimental study, using both Support Vector Machines and Naïve Bayes classifiers and an extensive comparison with other classic weighting schemes over two benchmark data sets, including Reuters-21578, shows significant improvement for minor categories, while the performance for major categories is not jeopardized. Our approach offers a simple and effective solution to boost the performance of text classification over skewed data sets.
URI: http://hdl.handle.net/10397/10439
ISSN: 0957-4174
EISSN: 1873-6793
DOI: 10.1016/j.eswa.2007.10.042
Appears in Collections: Journal/Magazine Article

Scopus citations: 96 (last week: 0; last month: 3), as of Aug 11, 2017
Web of Science citations: 68 (last week: 0; last month: 5), as of Aug 5, 2017
Page views: 31 (last week: 2), checked on Aug 13, 2017

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.