Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/16626
Title: On strategies for imbalanced text classification using SVM : a comparative study
Authors: Sun, A; Lim, EP; Liu, Y
Keywords: Imbalanced text classification; Instance weighting; Resampling; Support Vector Machines; SVM
Issue Date: 2009
Publisher: Elsevier
Source: Decision Support Systems, 2009, v. 48, no. 1, p. 191-201
Journal: Decision Support Systems
Abstract: Many real-world text classification tasks involve imbalanced training examples. The strategies proposed to address imbalanced classification (e.g., resampling, instance weighting), however, have not been systematically evaluated in the text domain. In this paper, we conduct a comparative study of the effectiveness of these strategies in the context of imbalanced text classification using a Support Vector Machine (SVM) classifier. We focus on SVM because of the good classification accuracy reported for it in many text classification tasks. We propose a taxonomy that organizes the proposed strategies according to the training and test phases of text classification tasks. Based on the taxonomy, we survey the methods proposed to address imbalanced classification. Among them, 10 commonly used methods were evaluated in our experiments on three benchmark datasets: Reuters-21578, 20-Newsgroups, and WebKB. Using the area under the Precision-Recall Curve as the performance measure, our experimental results showed that the best decision surface was often learned by the standard SVM, not coupled with any of the proposed strategies. We believe this negative finding will benefit both researchers and application developers in the area by directing their attention toward thresholding strategies.
URI: http://hdl.handle.net/10397/16626
ISSN: 0167-9236
EISSN: 1873-5797
DOI: 10.1016/j.dss.2009.07.011
Appears in Collections: Journal/Magazine Article
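
Illustrative sketch (not taken from the paper): the abstract names instance weighting as one strategy and the area under the Precision-Recall Curve as the performance measure. The Python below shows one plausible reading of each, under stated assumptions. The function names are hypothetical, the inverse-class-frequency weighting is a common heuristic rather than the scheme the authors evaluated, and average precision is used here as a simple step-wise stand-in for the area under the curve.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-instance weights inversely proportional to class frequency.

    Assumption: weight = n / (k * count(class)), so the weights sum to n
    and rare-class examples count more during training.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall curve.

    labels: 1 for the positive (rare) class, 0 otherwise.
    scores: classifier decision values, higher means more positive.
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    # Toy data: 2 positives among 8 examples (made up for illustration).
    y = [1, 0, 0, 0, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1, 0.4, 0.05]
    print(inverse_frequency_weights(y))
    print(average_precision(y, s))
```

Because the measure depends only on how the examples are ranked, it is unaffected by where the decision threshold is placed, which is consistent with the abstract's suggestion to treat thresholding as a separate step.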

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.