Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/34721
Title: Five new feature selection metrics in text categorization
Authors: Song, F
Zhang, D 
Xu, Y
Wang, J
Keywords: Feature selection
Text categorization
Support vector machines
Multiple comparative test
Pattern recognition
Issue Date: 2007
Publisher: World Scientific Publishing
Source: International journal on pattern recognition and artificial intelligence, 2007, v. 21, no. 6, p. 1085-1101
Journal: International journal on pattern recognition and artificial intelligence
Abstract: Feature selection has been extensively applied in statistical pattern recognition as a mechanism for cleaning up the set of features that are used to represent data and as a way of improving the performance of classifiers. Four schemes commonly used for feature selection are Exponential Searches, Stochastic Searches, Sequential Searches, and Best Individual Features. The most popular scheme used in text categorization is Best Individual Features, as the extremely high dimensionality of text feature spaces renders the other three feature selection schemes time prohibitive.
This paper proposes five new metrics for selecting Best Individual Features for use in text categorization. Their effectiveness has been empirically tested on two well-known data collections, Reuters-21578 and 20 Newsgroups. Experimental results show that the performance of two of the five new metrics, Bayesian Rule and F-one Value, is not significantly below that of a good traditional text categorization selection metric, Document Frequency. The performance of another two of these five new metrics, Low Loss Dimensionality Reduction and Relative Frequency Difference, is equal to or better than that of conventional good feature selection metrics such as Mutual Information and Chi-square Statistic.
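Note: the Best Individual Features scheme named in the abstract scores each term on its own and keeps the highest-scoring terms. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not define the five new metrics, so the sketch uses two of the conventional metrics it names, Document Frequency and the Chi-square Statistic (in its standard 2x2 contingency-table form). The function names, the toy corpus, and the alphabetical tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return df


def chi_square(docs, labels, target):
    """Chi-square score of each term against one target category."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    with_term = Counter()      # documents containing the term
    with_term_pos = Counter()  # ... that also belong to the target category
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            with_term[t] += 1
            if y == target:
                with_term_pos[t] += 1
    scores = {}
    for t, total in with_term.items():
        a = with_term_pos[t]  # in category, contains term
        b = total - a         # outside category, contains term
        c = n_pos - a         # in category, lacks term
        d = n_neg - b         # outside category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def best_individual_features(scores, k):
    """Keep the k top-scoring terms; ties are broken alphabetically (an assumption)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: four tokenized documents in two categories.
docs = [
    ["oil", "price", "rise"],
    ["oil", "export", "fall"],
    ["team", "win", "match"],
    ["team", "price", "match"],
]
labels = ["energy", "energy", "sport", "sport"]

df_scores = document_frequency(docs)
chi_scores = chi_square(docs, labels, target="energy")
top_by_df = best_individual_features(df_scores, 2)
top_by_chi = best_individual_features(chi_scores, 3)
```

In this toy corpus "price" appears equally often in both categories, so its chi-square score is zero even though its document frequency is as high as that of "oil". This illustrates why different individual metrics can rank the same terms differently.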
URI: http://hdl.handle.net/10397/34721
ISSN: 0218-0014 (print)
1793-6381 (online)
DOI: 10.1142/S0218001407005831
Appears in Collections: Journal/Magazine Article
