Title: A genetic algorithm based approach for clustering categorical data
Authors: Lee, Ho-kei Sean
Keywords: Cluster analysis -- Data processing; Hong Kong Polytechnic University -- Dissertations
Issue Date: 2006
Publisher: The Hong Kong Polytechnic University
Abstract: Given a database of records, clustering is concerned with grouping similar records into clusters based on their attribute values. Many algorithms have been proposed to address the clustering problem, but most of them were developed mainly to handle continuous-valued data. Relatively little attention has been paid to the clustering of categorical data. This kind of data is commonly collected in many applications in business, medicine, the social sciences and other fields. It is therefore important to develop an effective clustering algorithm that can handle it, and in this thesis we propose such an algorithm. It is based on a simple genetic algorithm (GA), which performs a probabilistic search for solutions that are expected to be optimal or near-optimal according to some performance criteria. The GA-based clustering algorithm uses an encoding scheme that can represent clustering results in chromosomes effectively. To work with this scheme, we also propose a set of genetic operators. On the one hand, they facilitate the exchange of clustering information between chromosomes; on the other, they introduce variation. For the proposed GA to work well, we have also introduced a fitness function to evaluate clustering quality. It is based on an information-theoretic measure of how much the presence of a particular attribute value supports or refutes the classification of a record into a specific cluster. The higher a chromosome's fitness value, the better the solution it encodes. Many traditional clustering algorithms require the number of clusters to be specified in advance. The proposed GA-based clustering algorithm has the advantage that it can automatically determine the number of clusters hidden in a data set.
The proposed algorithm has been tested on both simulated and real data; the results show that it is promising and has potential for many real-world applications.
Description: vii, 103 leaves : ill. ; 31 cm.
PolyU Library Call No.: [THS] LG51 .H577M COMP 2006 Lee
URI: http://hdl.handle.net/10397/909
Rights: All rights reserved.
Appears in Collections: Thesis
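The abstract describes the method only at a high level, so the following Python sketch is purely illustrative. It rests on our own assumptions, none of which is taken from the thesis:

- Each chromosome stores one cluster label per record. The number of distinct labels, capped by a hypothetical parameter `k_max`, is therefore the number of clusters the GA settles on.
- Uniform crossover and random-reset mutation stand in for the thesis's genetic operators.
- Fitness is a Laplace-smoothed weight of evidence, log[P(v | C) / P(v | not C)], averaged over records. This is one information-theoretic measure of how strongly a value v supports or refutes a record's membership in cluster C.

The function names `fitness` and `evolve` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical sketch: label-based encoding, uniform crossover, random-reset
# mutation and a weight-of-evidence fitness are assumptions, not the
# encoding, operators or fitness function defined in the thesis.

def fitness(data, labels):
    """Average (per record) weight of evidence that each record's attribute
    values give for its assigned cluster versus all other records."""
    n, m = len(data), len(data[0])
    domain = [len({row[a] for row in data}) for a in range(m)]  # values per attribute
    total = [Counter(row[a] for row in data) for a in range(m)]
    size = Counter(labels)
    # in_cluster[k][a][v]: occurrences of value v of attribute a in cluster k
    in_cluster = defaultdict(lambda: [Counter() for _ in range(m)])
    for row, k in zip(data, labels):
        for a in range(m):
            in_cluster[k][a][row[a]] += 1
    score = 0.0
    for row, k in zip(data, labels):
        for a, v in enumerate(row):
            inside = in_cluster[k][a][v]
            # Laplace smoothing keeps both probabilities strictly positive
            p_in = (inside + 1) / (size[k] + domain[a])
            p_out = (total[a][v] - inside + 1) / (n - size[k] + domain[a])
            score += math.log(p_in / p_out)  # > 0 supports, < 0 refutes
    return score / n

def evolve(data, k_max=4, pop_size=30, generations=100, p_mut=0.05, seed=0):
    """Simple elitist GA; returns (best labels, best fitness)."""
    rng = random.Random(seed)
    n = len(data)
    pop = [[rng.randrange(k_max) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda c: fitness(data, c))
    for _ in range(generations):
        scored = [(fitness(data, c), c) for c in pop]

        def pick():
            # binary tournament selection
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        new_pop = [best[:]]  # elitism: best fitness never decreases
        while len(new_pop) < pop_size:
            p1, p2 = pick(), pick()
            # uniform crossover: each record inherits a parent's label
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # random-reset mutation: move a record to a random cluster label
            child = [rng.randrange(k_max) if rng.random() < p_mut else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=lambda c: fitness(data, c))
    return best, fitness(data, best)
```

For example, `best, score = evolve(data)` on a list of categorical tuples gives a labelling whose cluster count is `len(set(best))`; empty labels simply drop out. A known weakness of uniform crossover on raw labels is that two parents may number the same cluster differently. Clustering-aware operators would typically be designed to avoid this.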
Files in This Item:
b20697260_link.htm (For PolyU Users; 166 B; HTML)
b20697260_ir.pdf (For All Users (Non-printable); 1.4 MB; Adobe PDF)
Citations as of Jun 18, 2018
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.