A hybrid evolutionary preprocessing method for imbalanced datasets

Wong, GY; Leung, FHF; Ling, SH

doi:10.1016/j.ins.2018.04.068

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/106952

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	-
dc.creator	Wong, GY	-
dc.creator	Leung, FHF	-
dc.creator	Ling, SH	-
dc.date.accessioned	2024-06-07T00:59:06Z	-
dc.date.available	2024-06-07T00:59:06Z	-
dc.identifier.issn	0020-0255	-
dc.identifier.uri	http://hdl.handle.net/10397/106952	-
dc.language.iso	en	en_US
dc.publisher	Elsevier Inc.	en_US
dc.rights	© 2018 Published by Elsevier Inc.	en_US
dc.rights	©2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/	en_US
dc.rights	The following publication Wong, G. Y., Leung, F. H., & Ling, S. H. (2018). A hybrid evolutionary preprocessing method for imbalanced datasets. Information Sciences, 454, 161-177 is available at https://doi.org/10.1016/j.ins.2018.04.068.	en_US
dc.title	A hybrid evolutionary preprocessing method for imbalanced datasets	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.spage	161	-
dc.identifier.epage	177	-
dc.identifier.volume	454-455	-
dc.identifier.doi	10.1016/j.ins.2018.04.068	-
dcterms.abstract	Imbalanced datasets are commonly encountered in real-world classification problems. Many machine learning algorithms are originally designed for well-balanced datasets, therefore re-sampling has become an important step to pre-process imbalanced data. This aims to balance the datasets by increasing the samples of the smaller class or decreasing the samples of the larger class, which are known as over-sampling and under-sampling, respectively. In this paper, a sampling strategy that is based on both over-sampling and under-sampling is proposed, in which the new samples of the smaller class are created based on fuzzy logic. Improvement of the datasets is done by the evolutionary computational method of Cross-generational elitist selection, Heterogeneous recombination and Cataclysmic mutation (CHC) that under-samples both the minority and majority samples. Consequently, a hybrid preprocessing method is proposed to re-sample imbalanced datasets. The evaluation is done by applying the Support Vector Machine (SVM), C4.5 decision tree and nearest neighbor rule to train a classification model from the re-sampled training sets. From the experimental results, it can be seen that our proposed method improves both the F-measure and AUC. The over-sampling rate and complexity of the classification model are also compared. Our proposed method is found to be superior to all other methods under comparison and it is more robust in different classifiers.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Information sciences, July 2018, v. 454-455, p. 161-177	-
dcterms.isPartOf	Information sciences	-
dcterms.issued	2018-07	-
dc.identifier.scopus	2-s2.0-85046463531	-
dc.identifier.eissn	1872-6291	-
dc.description.validate	202405 bcch	-
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	EIE-0515	en_US
dc.description.fundingSource	RGC	en_US
dc.description.pubStatus	Published	en_US
dc.identifier.OPUS	6837757	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article
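
The abstract above defines over-sampling (adding samples to the smaller class) and under-sampling (removing samples from the larger class). As a minimal sketch of those two operations only, the following Python uses plain random duplication and random removal. It is not the fuzzy-logic sample generation or the CHC-based selection proposed in the paper, and the function name `random_resample`, its parameters, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_resample(samples, labels, target_per_class, seed=0):
    """Return (samples, labels) with exactly target_per_class items per class.

    Hypothetical helper for illustration: classes larger than the target are
    randomly under-sampled (without replacement); classes smaller than the
    target are randomly over-sampled by duplicating existing items.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    new_samples, new_labels = [], []
    for y, items in by_class.items():
        if len(items) >= target_per_class:
            # Under-sampling: keep a random subset of the larger class.
            chosen = rng.sample(items, target_per_class)
        else:
            # Over-sampling: keep all items and add random duplicates.
            missing = target_per_class - len(items)
            chosen = items + [rng.choice(items) for _ in range(missing)]
        new_samples.extend(chosen)
        new_labels.extend([y] * target_per_class)
    return new_samples, new_labels


# Toy imbalanced data (assumed): 10 minority samples, 90 majority samples.
samples = list(range(100))
labels = [1] * 10 + [0] * 90

xs, ys = random_resample(samples, labels, target_per_class=30)
counts = Counter(ys)
print(counts)
```

Random duplication adds no new information to the smaller class, which is one reason methods such as the one described in this record generate new samples instead of copying existing ones.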

Files in This Item:

File	Description	Size	Format
Leung_Hybrid_Evolutionary_Preprocessing.pdf	Pre-Published version	1.42 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Page views

109

Last Week
4

Last month

Citations as of Apr 12, 2026

Downloads

122

Citations as of Apr 12, 2026

SCOPUS™ Citations

57

Citations as of May 8, 2026

WEB OF SCIENCE™ Citations

45

Citations as of Apr 23, 2026
