Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/34168
Title: Hybrid cluster ensemble framework based on the random combination of data transformation operators
Authors: Yu, Z
Wong, HS
You, J 
Yu, G
Han, G
Keywords: Cluster ensemble
Data mining
Data transformation
Gene expression profile
Issue Date: 2012
Publisher: Elsevier
Source: Pattern Recognition, 2012, v. 45, no. 5, p. 1826-1837
Journal: Pattern Recognition
Abstract: Given a dataset P represented by an n×m matrix (where n is the number of data points and m is the number of attributes), we study how applying transformations to P affects the performance of different ensemble algorithms. Specifically, P can be transformed into a new dataset P′ by a set of transformation operators Φ in the instance dimension, such as sub-sampling, super-sampling, and noise injection, and by a corresponding set of transformation operators Ψ in the attribute dimension. Based on these conventional operators Φ and Ψ, a general form Ω of the transformation operator is proposed to represent different kinds of transformation operators. Then, two new data transformation operators, the probabilistic based data sampling operator and the probabilistic based attribute sampling operator, are designed to generate new datasets in the ensemble. Next, three new random transformation operators are proposed, which randomly combine transformation operators in the data dimension, in the attribute dimension, and in both dimensions, respectively. Finally, a new cluster ensemble approach is proposed that integrates the random combination of data transformation operators across different dimensions, a hybrid clustering technique, a confidence measure, and the normalized cut algorithm into the ensemble framework. The experiments show that (i) the random combination of transformation operators across different dimensions outperforms most conventional data transformation operators on different kinds of datasets, and (ii) the proposed cluster ensemble framework performs well on different datasets, such as gene expression datasets and datasets from the UCI Machine Learning Repository.
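The abstract does not define the operators precisely, so the following is only a minimal sketch, under stated assumptions, of what a random combination of operators across both dimensions could look like: a random subset of instance-dimension operators (Φ: sub-sampling, super-sampling, noise injection) is chained in random order, followed by one attribute-dimension operator (Ψ: column sub-sampling), to yield one transformed dataset P′. The function names `sub_sample`, `super_sample`, `inject_noise`, `attribute_sample` and `random_combination`, and the parameters `rate` and `sigma`, are hypothetical choices made for illustration. This is not the authors' probabilistic based sampling operators or their general form Ω, and the hybrid clustering, confidence measure and normalized cut stages are not reproduced.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
# (transformed rows, original row index of each transformed row)
Indexed = Tuple[Matrix, List[int]]


def sub_sample(data: Matrix, idx: List[int], rng: random.Random,
               rate: float = 0.8) -> Indexed:
    """Instance operator (hypothetical): keep a random fraction of rows, without replacement."""
    k = max(1, int(rate * len(data)))
    chosen = sorted(rng.sample(range(len(data)), k))
    return [data[i] for i in chosen], [idx[i] for i in chosen]


def super_sample(data: Matrix, idx: List[int], rng: random.Random,
                 rate: float = 1.2) -> Indexed:
    """Instance operator (hypothetical): draw more rows than the input, with replacement."""
    k = max(1, int(rate * len(data)))
    chosen = [rng.randrange(len(data)) for _ in range(k)]
    return [data[i] for i in chosen], [idx[i] for i in chosen]


def inject_noise(data: Matrix, idx: List[int], rng: random.Random,
                 sigma: float = 0.05) -> Indexed:
    """Instance operator (hypothetical): add zero-mean Gaussian noise to every value."""
    noisy = [[x + rng.gauss(0.0, sigma) for x in row] for row in data]
    return noisy, list(idx)


def attribute_sample(data: Matrix, rng: random.Random,
                     rate: float = 0.7) -> Tuple[Matrix, List[int]]:
    """Attribute operator (hypothetical): keep a random subset of columns, returned sorted."""
    m = len(data[0])
    k = max(1, int(rate * m))
    cols = sorted(rng.sample(range(m), k))
    return [[row[j] for j in cols] for row in data], cols


# Assumed pool of instance-dimension operators (Phi in the abstract).
PHI = [sub_sample, super_sample, inject_noise]


def random_combination(data: Matrix, rng: random.Random
                       ) -> Tuple[Matrix, List[int], List[int]]:
    """Chain a random subset of instance operators in random order,
    then apply one attribute operator, yielding one transformed dataset P'."""
    idx = list(range(len(data)))
    ops = rng.sample(PHI, rng.randint(1, len(PHI)))
    for op in ops:
        data, idx = op(data, idx, rng)
    data, cols = attribute_sample(data, rng)
    return data, idx, cols


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 10 x 3 dataset P (n = 10 points, m = 3 attributes).
    P = [[float(i), float(i * i), float(i % 3)] for i in range(10)]
    for _ in range(3):
        d, idx, cols = random_combination(P, rng)
        print(len(d), "rows from original indices", sorted(set(idx)),
              "using columns", cols)
```

Keeping the original row indices lets each ensemble member's cluster labels be mapped back onto the n original points, which any consensus step needs; because super-sampling can repeat an index, such a step would also have to handle duplicated points.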
URI: http://hdl.handle.net/10397/34168
ISSN: 0031-3203
EISSN: 1873-5142
DOI: 10.1016/j.patcog.2011.11.016
Appears in Collections: Journal/Magazine Article

Citations (Scopus): 19, as of Nov 7, 2017
Citations (Web of Science): 15, as of Nov 15, 2017

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.