Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/89749
Title: Link prediction in microRNA-mediated biomolecular networks
Authors: Huang, Yuan
Degree: Ph.D.
Issue Date: 2020
Abstract: Many real-world problems can be formulated as discovering whether relationships exist between objects in a set of inter-related objects. For example, in molecular biology, microRNAs and human diseases are known to be related, as they may interact with each other. While some of these interactions are known, others are not. One problem, therefore, is to determine whether an interaction exists between a microRNA and a human disease based on the known relationships between other microRNAs and human diseases. If we represent microRNAs and human diseases as nodes in a network and use the links between them to represent their interactions, we obtain a biomolecular network. Given such a network, we can define the link prediction problem as the prediction of missing links in the network based on existing links. In this thesis, we tackle the link prediction problem for three kinds of microRNA-mediated biomolecular networks. Specifically, we predict interaction relationships between microRNA and three other types of objects: (i) complex human diseases, (ii) drug resistance and (iii) lncRNA. Based on known interaction data obtained from public databases, we construct microRNA-mediated biomolecular networks containing nodes and unweighted links. The nodes are of two types: one type represents microRNA and the other represents diseases, drug resistance or lncRNA. The links between these two types of nodes represent interactions between the two types of objects. Given these biomolecular networks, our problem is to use the known links to predict the missing ones.
In the datasets we collected, known interaction data are often limited in number. To improve prediction performance, in addition to the known links, we introduce node information that is biologically relevant to the objects the nodes represent. These data can relate to the biological or physicochemical properties of the objects, such as expression profiles, drug structural data and RNA sequences, and their data types can be very different. For example, when predicting links in the microRNA-disease association network, the data we use to characterize the microRNA nodes can be another network: the lncRNA-microRNA interaction network. When predicting links in the microRNA-drug resistance association network, the data we use to characterize the drug and microRNA nodes are high-dimensional numerical features. When predicting links in the microRNA-lncRNA interaction network, the data we use to characterize the microRNA and lncRNA nodes are multiple network similarity matrices. The main challenges of our research, therefore, lie in finding ways to introduce these different kinds of node information into the prediction process. To overcome these difficulties, we propose four algorithms, each of which tackles a different challenge. Specifically, to predict associations between microRNA and diseases, the MVMTMDA algorithm accounts for the incompleteness of lncRNA-microRNA interaction data. It formulates the prediction task as a multi-task problem, in which the links of the lncRNA-microRNA interaction network and the microRNA-disease association network are predicted simultaneously, and adopts multi-view learning to learn the embedding of microRNA nodes from the two networks. When predicting associations between microRNA and drug resistance, the nodes have attributes with up to thousands of dimensions, which is extremely high. The GCMDR algorithm uses a spectral graph convolution technique to solve this problem.
The deep neural network structure that GCMDR adopts can be applied to high-dimensional numerical node features, allowing end-to-end prediction without any data preprocessing. Unlike other prediction tools for microRNA targets, which are based on sequence matching, the EPLMI algorithm, for the first time, reformulates the lncRNA-microRNA interaction prediction task as a link prediction problem and adopts a two-way diffusion method to perform the prediction. To improve the prediction performance of EPLMI, we further propose the LMNLMI algorithm, which uses a similarity network fusion technique to collectively consider multiple types of lncRNA/microRNA similarity. The proposed algorithms have been applied to real-world datasets that we collected from public databases. The experimental results show that our proposed models are accurate, efficient and robust to parameter settings, and that they outperform state-of-the-art approaches.
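The link prediction setting described in the abstract can be sketched on a toy bipartite network. The example below is a minimal illustration only, not the thesis's EPLMI, LMNLMI, MVMTMDA or GCMDR implementation: it applies a generic two-step resource-diffusion heuristic to a binary microRNA-by-target matrix. The function names `diffusion_scores` and `rank_missing_links`, the `top_k` parameter, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def diffusion_scores(adjacency):
    """Score every (row node, column node) pair by two-step resource diffusion.

    adjacency: binary matrix, rows = microRNAs, columns = the other node type
    (e.g. diseases). This is a generic heuristic, assumed here for illustration.
    """
    A = np.asarray(adjacency, dtype=float)
    deg_rows = A.sum(axis=1)  # number of known links per row node
    deg_cols = A.sum(axis=0)  # number of known links per column node
    # Replace zero degrees by 1 so isolated nodes do not cause division by zero.
    deg_rows = np.where(deg_rows > 0, deg_rows, 1.0)
    deg_cols = np.where(deg_cols > 0, deg_cols, 1.0)
    # W[i, j]: share of the resource on column node j that reaches column
    # node i after diffusing to the row nodes and back.
    W = (A / deg_rows[:, None]).T @ A / deg_cols[None, :]
    # Each row node starts with one unit of resource on each known neighbour.
    return A @ W.T


def rank_missing_links(adjacency, top_k=5):
    """Return the top_k currently absent links as (score, row, col) tuples."""
    A = np.asarray(adjacency)
    scores = diffusion_scores(A)
    candidates = [
        (float(scores[i, j]), int(i), int(j))
        for i, j in zip(*np.where(A == 0))
    ]
    # Highest score first; ties broken by row index, then column index.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return candidates[:top_k]


# Toy network: 2 microRNAs (rows) and 3 diseases (columns); 1 = known link.
toy = np.array([[1, 1, 0],
                [0, 1, 1]])
scores = diffusion_scores(toy)
ranked = rank_missing_links(toy)
```

In this toy network, the two absent links receive a nonzero score because each microRNA shares a neighbouring disease with the other one. A real study would hold out known links and measure how highly they are ranked.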
Subjects: MicroRNA
Bioinformatics
Computational biology
Data mining
Hong Kong Polytechnic University -- Dissertations
Pages: xiv, 124 pages : color illustrations
Appears in Collections: Thesis
