Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/9791
Title: Probabilistic cluster structure ensemble
Authors: Yu, Z
Li, L
Wong, HS
You, J 
Han, G
Gao, Y
Yu, G
Keywords: Cluster ensemble
Gaussian mixture model
Normalized cut
Structure ensemble
Issue Date: 2014
Publisher: Elsevier
Source: Information sciences, 2014, v. 267, p. 16-34
Journal: Information sciences 
Abstract: Cluster structure ensemble focuses on integrating multiple cluster structures extracted from different datasets into a unified cluster structure, instead of aligning the individual labels from the clustering solutions derived from multiple homogeneous datasets, as is done in the cluster ensemble framework. In this article, we design a novel probabilistic cluster structure ensemble framework, referred to as the Gaussian mixture model based cluster structure ensemble framework (GMMSE), to identify the most representative cluster structure from the dataset. Specifically, GMMSE first applies the bagging approach to produce a set of variant datasets. Then, a set of Gaussian mixture models is used to capture the underlying cluster structures of the datasets. GMMSE applies K-means to initialize the parameters of each Gaussian mixture model, and adopts the Expectation-Maximization (EM) approach to estimate the parameter values of the model. Next, the components of the Gaussian mixture models are viewed as new data samples, which are used to construct the representative matrix capturing the relationships among components. The similarity between two components, each corresponding to its own Gaussian distribution, is measured by the Bhattacharyya distance function. Afterwards, GMMSE constructs a graph based on the new data samples and the representative matrix, and searches for the most representative cluster structure. Finally, we also design four criteria to assign the data samples to their corresponding clusters based on the unified cluster structure. The experimental results show that (i) GMMSE works well on synthetic datasets and real datasets in the UCI machine learning repository, and (ii) GMMSE outperforms most of the previous cluster ensemble approaches.
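As a rough illustration of the component-comparison step described in the abstract, the sketch below computes the Bhattacharyya distance between two multivariate Gaussians using its standard closed form, and collects the pairwise distances over a list of components. It is not taken from the paper: the function names `bhattacharyya_distance` and `pairwise_component_distances` are hypothetical, NumPy is assumed, and the way GMMSE turns these distances into its representative matrix is not reproduced here.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2).

    Standard closed form (hypothetical helper, not the authors' code):
      D = 1/8 (mu1-mu2)^T S^{-1} (mu1-mu2)
          + 1/2 ln( det(S) / sqrt(det(cov1) det(cov2)) ),  S = (cov1+cov2)/2
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)

    cov = 0.5 * (cov1 + cov2)  # averaged covariance S
    diff = mu1 - mu2

    # Mean term: solve S x = diff instead of inverting S explicitly.
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Covariance term, using log-determinants for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return float(term_mean + term_cov)


def pairwise_component_distances(means, covs):
    """Symmetric matrix of Bhattacharyya distances between all components."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya_distance(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    return dist
```

In a pipeline like the one the abstract outlines, `means` and `covs` would be the component parameters pooled from the mixture models fitted on the bagged datasets. Identical components are at distance zero, and the distance grows as the means separate or the covariances differ.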
URI: http://hdl.handle.net/10397/9791
ISSN: 0020-0255
EISSN: 1872-6291
DOI: 10.1016/j.ins.2014.01.030
Appears in Collections:Journal/Magazine Article

Scopus citations: 10 (last week: 1; last month: 0), as of Dec 10, 2017
Web of Science citations: 9 (last week: 0; last month: 1), as of Nov 7, 2017
Page views: 84 (last week: 1), checked on Dec 10, 2017

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.