Title: Classification of heterogeneous gene expression data
Authors: Fung, Yiu-ming
Degree: M.Phil.
Issue Date: 2005
Abstract: DNA microarray technology is a breakthrough in the identification of cancer types: it examines differences in gene expression levels between normal and cancerous tissues across various cancer types. This technology can contribute significantly to cancer research, since morphologically similar but molecularly different tumors can now be classified by differences in their gene expression levels. However, reliable and robust classification performance must be ensured. This can be done by validating classification algorithms on heterogeneous gene expression data, which contain two types of variation: variation among the available microarray technologies, and variation in the expression levels of significant genes across cancer types. Classification algorithms that perform reliably and robustly on heterogeneous gene expression data are less sensitive to these variations. In this dissertation, we first develop the Impact Factor (IF) to measure the inter-experimental variation between two data sets caused by differences in microarray technologies. The IF is then integrated into common classifiers, such as k-nearest neighbor classifiers, for the classification of heterogeneous gene expression data. We also develop the Majority-voting with Impact Factors (MIF) algorithm, which combines the IF, the majority-voting classification algorithm, and the uniform histogram partitioning technique to classify multi-type, heterogeneous cancer gene expression data. To demonstrate the reliability and robustness of the IF measure and the MIF algorithm, we experiment on 10 data sets, reported in 7 publications and produced with different microarray technologies under various experimental settings and conditions. The experimental results show good classification performance in terms of accuracy, sensitivity and specificity.
For the MIF algorithm, we have also compared our results with those of other researchers; the comparisons show improved performance. In addition, bagging, a meta-classification algorithm based on voting, is included for further performance evaluation. Surprisingly, applying bagging does not improve performance significantly, and the MIF algorithm still performs better.
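For readers unfamiliar with the three measures named in the abstract, the following is a minimal sketch of how accuracy, sensitivity and specificity are conventionally computed for a two-class problem. It is not taken from the thesis: the function name `binary_metrics`, the label encoding (1 = cancer, 0 = normal) and the sample labels are assumptions made for illustration only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for two-class labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the positive (e.g. cancer) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of positive samples that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of negative samples that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels, purely to show the call; not data from the thesis.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For multi-type classification, as addressed by the MIF algorithm, one common convention is to compute these measures per class in a one-versus-rest fashion; whether the thesis does so is not stated in this record.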
Subjects: Hong Kong Polytechnic University -- Dissertations
Gene expression
DNA microarrays
Pages: ix, 123 leaves : ill. ; 30 cm
Appears in Collections: Thesis
