Please use this identifier to cite or link to this item:
Title: Mining structural patterns in biological networks
Authors: Lam, Wai-man
Degree: Ph.D.
Issue Date: 2010
Abstract: Biological networks capture information about the way biomolecules, such as genes, proteins, and metabolites, interact with each other. Discovering interesting patterns in them will enable one to better understand biological processes such as cellular organization, transcription regulation and phenotypic evolution, etc. Biological networks can be modeled as graphs with vertices representing biomolecules and edges representing the interactions between them. Graph mining algorithms have been used to find frequently-occurring subgraphs in biological graphs. As these subgraphs may only be "overrepresented patterns", these algorithms are sometimes not considered very useful. What is needed is an algorithm that can be used to find not only frequently-occurring patterns, but patterns that can actually characterize biological networks and allow them to be discriminated from each other. For many biological networks, other than the name of each of their constituent biomolecules, a number of other attributes are usually also known about them. For example, other than the name of each protein in a PPI network, we also know, for many of the proteins, the functions they perform, the cellular processes they are involved in, etc. Proteins always perform more than one molecular function and are involved in multiple cellular processes [67]. The information provided by all these additional attributes are currently not taken into consideration by graph mining algorithms even though they can be very useful. To take into considerations the multiple attributes of the constituent biomolecules, we model the biological network as a multiple-attribute graph using gene ontology to allow more information, other than direct interactions between biomolecules, to be used in the graph mining process. The multiple-attribute graph representation allows vertices to not only represent biomolecules but also the attributes that associate with them. Subgraphs in a multiple-attribute graph may relate to each other and if a node is used to represent a subgraph, hierarchical multiple attribute graph can also be formed and mined for patterns. In this thesis, we propose a graph mining algorithm that can be used to discover interesting patterns in such graphs. The algorithm is called MISPAG (Mining Interesting Structural Patterns in Attributed Graphs). MISPAG is able to discover interesting subgraphs using an interestingness measure that can be used to determine if a certain subgraph occurs more, or less, frequently in a graph than expected. The interestingness measure can take into consideration the multiple attributes of the constituent biomolecules of a biological network and can be used to filter out subgraphs that do not contribute to the unique characterization and discrimination of a network or a class of networks even if they occur frequently according to some user threshold. MISPAG can be modified as different algorithms that suitable to solve such problems as motif discovery, network identification, protein function prediction, molecular classification, and protein complexes discovery. These algorithms have been implemented and tested with real biological data in different application areas. Experimental results show that our proposed algorithms can effectively uncover patterns that are biologically meaningful for the deciphering of the biological and structural relationships in the networks, and for the prediction of un-annotated functions and features of proteins, genes, and chemical compounds.
Subjects: Hong Kong Polytechnic University -- Dissertations
Computational biology
Data mining
Graphic methods
Pages: vii, 196 leaves : ill. ; 31 cm.
Appears in Collections:Thesis

Show full item record

Page views

Last Week
Last month
Citations as of May 28, 2023

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.