Effective techniques for gene expression data mining

Ma, Chi-hung Patrick

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/84514

Title:	Effective techniques for gene expression data mining
Authors:	Ma, Chi-hung Patrick
Degree:	Ph.D.
Issue Date:	2006
Abstract:	As a new research area, gene expression data mining poses new challenges to data mining researchers. Gene expression data are typically very noisy and of very high dimensionality. Traditional data mining techniques may not be the best tools for tackling bioinformatics problems involving such data, as they were not originally developed to deal with them. For this reason, new effective techniques are required, and in this thesis we propose several. In particular, these techniques can be used to address two problems: reconstructing gene regulatory networks and clustering gene expression data. The former is concerned with discovering gene interactions in order to infer the structures of gene regulatory networks. The latter is concerned with discovering clusters of co-expressed genes, so that genes with similar expression patterns under different experimental conditions can be identified. To reconstruct gene regulatory networks, we propose an association-discovery technique, based on residual analysis and an information-theoretic measure, that detects whether or not there are interesting association relationships between genes. Given time-dependent gene expression data, this technique can reveal interesting sequential associations between genes for the effective inference of gene regulatory network structures. The proposed association-discovery technique can also be used to find interesting association relationships between gene expression levels and cluster labels. Based on discovering such relationships, we have developed a two-phase clustering algorithm for gene expression data, consisting of an initial clustering phase followed by a re-clustering phase. Using this two-phase approach, the algorithm is able to assign genes whose cluster memberships cannot easily be determined by existing methods to the appropriate clusters.
Since the effectiveness of the two-phase clustering algorithm depends, to some extent, on that of the existing clustering method used in the first phase, we have also developed a novel evolutionary clustering algorithm, called EvoCluster, that can be used in the first phase to overcome some of the limitations of existing methods. By making use of an evolutionary approach and the association-discovery technique, EvoCluster not only performs well in the presence of very noisy data but can also discover overlapping clusters. For performance evaluation, the data mining techniques proposed in this thesis have been tested with simulated and real data, and the experimental results show that they are very promising.
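The abstract names residual analysis and an information-theoretic measure without giving formulas. The sketch below is one plausible reading, not the thesis's exact method. It assumes expression values have already been discretized into k levels, counts how often gene X at level i at time t is followed by gene Y at level j at time t + lag, computes Haberman's adjusted residuals for that contingency table, and uses Good's weight of evidence as an example information-theoretic measure. The function names `lagged_table`, `adjusted_residuals`, and `weight_of_evidence`, and the toy counts, are all hypothetical.

```python
import math

def lagged_table(x, y, k, lag=1):
    """Hypothetical helper: count pairs (x[t], y[t + lag]) for two series
    already discretized into levels 0..k-1 (discretization is assumed)."""
    table = [[0] * k for _ in range(k)]
    for t in range(len(x) - lag):
        table[x[t]][y[t + lag]] += 1
    return table

def adjusted_residuals(table):
    """Haberman's adjusted residuals for a two-way contingency table.
    Assumes every row and column total is nonzero and that neither
    dimension covers the whole sample (otherwise a divisor is zero)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n                  # expected count under independence
            z = (obs - exp) / math.sqrt(exp)                   # standardized residual
            v = (1 - row_tot[i] / n) * (1 - col_tot[j] / n)    # estimated variance of z
            out.append(z / math.sqrt(v))                       # approx. N(0, 1) under independence
        result.append(out)
    return result

def weight_of_evidence(table, i, j):
    """Good's weight of evidence that X = level i gives in favour of Y = level j:
    log[P(X=i | Y=j) / P(X=i | Y!=j)]. Assumes both probabilities are nonzero."""
    n = sum(sum(row) for row in table)
    col_j = sum(row[j] for row in table)
    row_i = sum(table[i])
    p_given = table[i][j] / col_j
    p_given_not = (row_i - table[i][j]) / (n - col_j)
    return math.log(p_given / p_given_not)

if __name__ == "__main__":
    table = [[30, 10], [10, 30]]                       # hypothetical lagged counts
    print(round(adjusted_residuals(table)[0][0], 3))   # 4.472
    print(round(weight_of_evidence(table, 0, 0), 3))   # 1.099
```

Under independence each adjusted residual is approximately standard normal, so a cell whose residual exceeds 1.96 can be flagged as a positive association at the 5% level; the weight of evidence then quantifies how strongly X = i supports Y = j. The same table construction would apply to expression levels versus cluster labels by replacing the lagged series with label assignments.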
Subjects:	Hong Kong Polytechnic University -- Dissertations. Gene expression -- Data processing. Data mining.
Pages:	vii, 152 p. : ill. ; 30 cm.
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/223
