Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/3446
Title: Mining structural patterns in biological networks
Authors: Lam, Wai-man
Keywords: Hong Kong Polytechnic University -- Dissertations
Bioinformatics
Computational biology
Data mining
Graphic methods
Issue Date: 2010
Publisher: The Hong Kong Polytechnic University
Abstract: Biological networks capture information about the way biomolecules, such as genes, proteins, and metabolites, interact with each other. Discovering interesting patterns in them will enable one to better understand biological processes such as cellular organization, transcription regulation and phenotypic evolution, etc. Biological networks can be modeled as graphs with vertices representing biomolecules and edges representing the interactions between them. Graph mining algorithms have been used to find frequently-occurring subgraphs in biological graphs. As these subgraphs may only be "overrepresented patterns", these algorithms are sometimes not considered very useful. What is needed is an algorithm that can be used to find not only frequently-occurring patterns, but patterns that can actually characterize biological networks and allow them to be discriminated from each other. For many biological networks, other than the name of each of their constituent biomolecules, a number of other attributes are usually also known about them. For example, other than the name of each protein in a PPI network, we also know, for many of the proteins, the functions they perform, the cellular processes they are involved in, etc. Proteins always perform more than one molecular function and are involved in multiple cellular processes [67]. The information provided by all these additional attributes are currently not taken into consideration by graph mining algorithms even though they can be very useful. To take into considerations the multiple attributes of the constituent biomolecules, we model the biological network as a multiple-attribute graph using gene ontology to allow more information, other than direct interactions between biomolecules, to be used in the graph mining process. The multiple-attribute graph representation allows vertices to not only represent biomolecules but also the attributes that associate with them. Subgraphs in a multiple-attribute graph may relate to each other and if a node is used to represent a subgraph, hierarchical multiple attribute graph can also be formed and mined for patterns. In this thesis, we propose a graph mining algorithm that can be used to discover interesting patterns in such graphs. The algorithm is called MISPAG (Mining Interesting Structural Patterns in Attributed Graphs). MISPAG is able to discover interesting subgraphs using an interestingness measure that can be used to determine if a certain subgraph occurs more, or less, frequently in a graph than expected. The interestingness measure can take into consideration the multiple attributes of the constituent biomolecules of a biological network and can be used to filter out subgraphs that do not contribute to the unique characterization and discrimination of a network or a class of networks even if they occur frequently according to some user threshold. MISPAG can be modified as different algorithms that suitable to solve such problems as motif discovery, network identification, protein function prediction, molecular classification, and protein complexes discovery. These algorithms have been implemented and tested with real biological data in different application areas. Experimental results show that our proposed algorithms can effectively uncover patterns that are biologically meaningful for the deciphering of the biological and structural relationships in the networks, and for the prediction of un-annotated functions and features of proteins, genes, and chemical compounds.
Description: vii, 196 leaves : ill. ; 31 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2010 Lam
URI: http://hdl.handle.net/10397/3446
Rights: All rights reserved.
Appears in Collections:Thesis

Files in This Item:
File Description SizeFormat 
b2374490x_link.htmFor PolyU Users 162 BHTMLView/Open
b2374490x_ir.pdfFor All Users (Non-printable) 5.81 MBAdobe PDFView/Open
Show full item record

Page view(s)

533
Last Week
3
Last month
Checked on Apr 23, 2017

Download(s)

1,370
Checked on Apr 23, 2017

Google ScholarTM

Check



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.