Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/85008
Title: Analysis on protein-DNA interaction and gene expression
Authors: Zhou, Jiyun
Degree: Ph.D.
Issue Date: 2019
Abstract: Gene expression is pivotal in genomic biology. As experimental methods for gene expression prediction are costly and labor-intensive, there is an urgent need to develop high-performance computational methods for gene expression prediction. As gene expression is mainly regulated by interactions between DNA and transcription factors (TFs), a type of protein with specialized functions, analysis of TF-DNA interactions may facilitate the prediction of gene expression. This thesis focuses on the analysis of protein-DNA interactions and gene expression. We attempt to address issues in four aspects of gene expression analysis: (1) protein secondary structure prediction, (2) DNA-binding residue prediction, (3) TF binding site (TFBS) prediction and (4) gene expression prediction. Our contribution mainly consists of four parts. For protein secondary structure prediction, we present a novel deep learning based prediction method, referred to as CNNH_PSS, which uses a multi-scale CNN with highways to capture both local context and longer-range dependencies. In CNNH_PSS, a specific part of the information is delivered by highways from the current layer to the output of the next one to keep local context, and the remaining information is delivered from the current layer to the input of the next one to capture dependencies among residues that are farther apart. Therefore, the feature space learned by CNNH_PSS contains both local context and long-range interdependencies.
For DNA-binding residue prediction, the research goal is to learn relationships among residues for the prediction of DNA-binding residues. In this thesis, four prediction methods are proposed to learn relationships among residues. The first method applies PSSM (Position Specific Score Matrix) distance transformation to encode local pairwise relationships between neighboring residues. The second method applies a Convolutional Neural Network to learn relationships among several neighboring residues. The third method applies Long Short-Term Memory to learn both local and long-range relationships among residues. The last method makes use of two sliding windows to learn sequence relationships and structure relationships, respectively.
For TF-binding site (TFBS) prediction, three prediction methods are proposed. First, a novel method is proposed to capture higher-order relationships among nucleotides by applying two CNNs to histone modifications and DNA sequence, respectively. Second, a multi-task framework is proposed specifically to address the data sparseness issue by leveraging the available cross-cell-type information. This method learns common features from multiple cell-types using a shared CNN and individual features using a private CNN for each cell-type. The last method is proposed for cross-TF TFBS prediction by learning TFBSs from other TFs in the training set. This method can further address the data unavailability issue in the current training data.
Current gene expression prediction methods can only be used for cell-types or tissues in which ChIP-seq datasets for most important TFs are labeled. However, for most cell-types or tissues in human beings, the ChIP-seq datasets for most TFs are not available. In this work, a novel prediction method is proposed that first predicts TFBSs using our cross-cell-type prediction method and cross-TF prediction method. The predicted TFBSs are then combined with histone modifications to learn feature representations for genes. The advantage of this method is that it can predict gene expression for any cell-type regardless of the availability of the TFBSs of the considered TFs. Our proposed method can automatically extract combinatorial relationships among histone modifications and TFBSs. These relationships and TFBSs play very important roles in regulating gene expression and facilitate the understanding of gene expression regulation in humans.
Subjects: Hong Kong Polytechnic University -- Dissertations
DNA-protein interactions
Gene expression
Pages: xviii, 229 pages : color illustrations
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.