Title: Clustering and re-clustering for pattern discovery in gene expression data
Authors: Ma, PCH
Chan, KCC 
Chiu, DKY
Keywords: Bioinformatics
Cluster analysis
Gene expression data analysis
Pattern recognition
Transcription factor binding sites
Issue Date: 2005
Source: Journal of bioinformatics and computational biology, 2005, v. 3, no. 2, p. 281-301
Journal: Journal of Bioinformatics and Computational Biology 
Abstract: The combined interpretation of gene expression data and gene sequences is important for the investigation of the intricate relationships of gene expression at the transcription level. The expression data produced by microarray hybridization experiments can lead to the identification of clusters of co-expressed genes that are likely co-regulated by the same regulatory mechanisms. By analyzing the promoter regions of co-expressed genes, the common regulatory patterns characterized by transcription factor binding sites can be revealed. Many clustering algorithms have been used to uncover inherent clusters in gene expression data. In this paper, based on experiments using simulated and real data, we show that the performance of these algorithms could be further improved. For the clustering of expression data typically characterized by a lot of noise, we propose to use a two-phase clustering algorithm consisting of an initial clustering phase and a second re-clustering phase. The proposed algorithm has several desirable features: (i) it utilizes both local and global information by computing both a "local" pairwise distance between two gene expression profiles in Phase 1 and a "global" probabilistic measure of interestingness of cluster patterns in Phase 2, (ii) it distinguishes between relevant and irrelevant expression values when performing re-clustering, and (iii) it makes explicit the patterns discovered in each cluster for possible interpretations. Experimental results show that the proposed algorithm can be an effective algorithm for discovering clusters in the presence of very noisy data. The patterns that are discovered in each cluster are found to be meaningful and statistically significant, and cannot otherwise be easily discovered. Based on these discovered patterns, genes co-expressed under the same experimental conditions and range of expression levels have been identified and evaluated. 
When identifying regulatory patterns at the promoter regions of the co-expressed genes, we also discovered well-known transcription factor binding sites in them. These binding sites can provide explanations for the co-expressed patterns.
ISSN: 0219-7200
DOI: 10.1142/S0219720005001053
Appears in Collections: Journal/Magazine Article
