Classification of heterogeneous gene expression data

Fung, Yiu-ming

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/85453

DC Field	Value	Language
dc.contributor	Department of Computing	-
dc.creator	Fung, Yiu-ming	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/4022	-
dc.language.iso	English	-
dc.title	Classification of heterogeneous gene expression data	-
dc.type	Thesis	-
dcterms.abstract	DNA microarray technology is a breakthrough in identifying cancer types: it examines the differences in gene expression levels between normal and cancerous tissues across various cancer types. This technology can contribute significantly to cancer research, since morphologically similar but molecularly different tumors can now be classified by differences in their gene expression levels. However, reliable and robust classification performance must be guaranteed. This can be achieved by validating classification algorithms on heterogeneous gene expression data, since such data contain two types of variation: variation in the microarray technologies used, and variation in the expression levels of significant genes across cancer types. Classification algorithms that perform reliably and robustly on heterogeneous gene expression data are less sensitive to these variations. In this dissertation, we first develop the Impact Factor (IF) to measure the inter-experimental variation between two data sets caused by differences in microarray technologies. The IF is then integrated into common classifiers, such as k-nearest neighbor classifiers, for the classification of heterogeneous gene expression data. Furthermore, we develop the Majority-voting with Impact Factors (MIF) algorithm, which combines the IF, the majority-voting classification algorithm, and the uniform histogram partitioning technique to classify multi-type, heterogeneous cancer gene expression data. To demonstrate the reliability and robustness of the IF measure and the MIF algorithm, we experiment on 10 data sets, drawn from 7 publications and produced with different microarray technologies under various experimental settings and conditions.
The experimental results show good classification performance in terms of accuracy, sensitivity and specificity. For the MIF algorithm, we also compare our results with other researchers' work, and the comparisons show improved performance. In addition, a meta-classification algorithm that uses a voting technique (bagging) is compared for further performance evaluation. Surprisingly, applying bagging does not improve performance significantly, and the MIF algorithm still performs better.	-
dcterms.accessRights	open access	-
dcterms.educationLevel	M.Phil.	-
dcterms.extent	ix, 123 leaves : ill. ; 30 cm	-
dcterms.issued	2005	-
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	-
dcterms.LCSH	Gene expression	-
dcterms.LCSH	DNA microarrays	-
dcterms.LCSH	Genomics	-
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/4022

