Please use this identifier to cite or link to this item:
Title: Analysis on protein-DNA interaction and gene expression
Authors: Zhou, Jiyun
Advisors: Lu, Qin (COMP)
Keywords: DNA-protein interactions
Gene expression
Issue Date: 2019
Publisher: The Hong Kong Polytechnic University
Abstract: Gene expression is pivotal in genomic biology. As experimental methods for gene expression prediction are costly and labor-intensive, there is an urgent need to develop high-performance computational methods for gene expression prediction. As gene expression is mainly regulated by interactions between DNA and transcription factors (TFs), a class of proteins with a specialized function, analysis of TF-DNA interactions may facilitate the prediction of gene expression. This thesis focuses on the analysis of protein-DNA interactions and gene expression. We address four aspects of gene expression analysis: (1) protein secondary structure prediction, (2) DNA-binding residue prediction, (3) TF binding site (TFBS) prediction and (4) gene expression prediction. Our contribution consists of four corresponding parts.
For protein secondary structure prediction, we present a novel deep-learning-based prediction method, referred to as CNNH_PSS, which uses a multi-scale CNN with highways to capture both local context and longer-range dependencies. In CNNH_PSS, a specific part of the information is delivered from the current layer to the output of the next one by highways to keep local context, and the other parts of the information are delivered from the current layer to the input of the next one to capture dependencies among residues at longer distances. Therefore, the feature space learned by CNNH_PSS contains both local context and long-range interdependencies.
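The highway mechanism described for CNNH_PSS can be illustrated with a small sketch. The code below is not the thesis implementation: it assumes the standard highway-network gating form y = T(x) * H(x) + (1 - T(x)) * x, a single feature channel per residue, and arbitrary kernel weights. The function names `conv1d_same` and `highway_multiscale_layer` are hypothetical.

```python
import math


def conv1d_same(x, kernel):
    """Slide `kernel` over `x` with zero padding so the output has len(x) values."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the sequence count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def highway_multiscale_layer(x, kernels, gate_kernel, gate_bias=0.0):
    """One illustrative layer: multi-scale convolution plus a highway gate.

    x           -- one feature value per residue (single channel, an assumption)
    kernels     -- convolution kernels of different widths (the "multi-scale" part)
    gate_kernel -- kernel used to compute the transform gate T(x)
    """
    # H(x): average the outputs of the different kernel widths, then squash.
    branches = [conv1d_same(x, k) for k in kernels]
    h = [math.tanh(sum(b[i] for b in branches) / len(branches))
         for i in range(len(x))]
    # T(x): per-position gate in (0, 1).
    t = [sigmoid(g + gate_bias) for g in conv1d_same(x, gate_kernel)]
    # Highway mix: the gated share is transformed, the rest is carried unchanged.
    return [t[i] * h[i] + (1.0 - t[i]) * x[i] for i in range(len(x))]


if __name__ == "__main__":
    features = [0.1, 0.5, -0.3, 0.8, 0.0, -0.6]
    kernels = [[0.2, 0.5, 0.2], [0.1, 0.2, 0.4, 0.2, 0.1]]
    print(highway_multiscale_layer(features, kernels, [0.3, 0.3, 0.3]))
```

With a strongly negative `gate_bias` the layer passes its input through almost unchanged, which is the sense in which the highway preserves local context while the transformed share feeds longer-range features.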
For DNA-binding residue prediction, the research goal is to learn relationships among residues for the prediction of DNA-binding residues. In this thesis, four prediction methods are proposed to learn these relationships. The first method applies PSSM (Position Specific Score Matrix) distance transformation to encode local pairwise relationships between neighboring residues. The second method applies a Convolutional Neural Network to learn relationships among several neighboring residues. The third method applies Long Short-Term Memory to learn both local and long-range relationships among residues. The last method makes use of two sliding windows to learn sequence relationships and structure relationships, respectively.
For TF-binding site (TFBS) prediction, three prediction methods are proposed. First, a novel method is proposed to capture higher-order relationships among nucleotides by applying two CNNs to histone modifications and DNA sequence, respectively. Second, a multi-task framework is proposed specifically to address the data sparseness issue by leveraging the available cross-cell-type information. The method learns common features from multiple cell-types using a shared CNN and individual features using a private CNN for each cell-type. The last method is proposed for cross-TF TFBS prediction by learning TFBSs from other TFs in the training set. This method can further address the data unavailability issue in the current training data.
Current gene expression prediction methods can only be used for cell-types or tissues in which ChIP-seq datasets for most important TFs are labeled. However, for most cell-types or tissues in human beings, the ChIP-seq datasets for most TFs are not available. In this work, a novel prediction method is proposed that first predicts TFBSs using our cross-cell-type and cross-TF prediction methods. The predicted TFBSs are then combined with histone modifications to learn feature representations for genes. The advantage of this method is that it can predict gene expression for any cell-type regardless of the availability of the TFBSs of the considered TFs. Our proposed method can automatically extract combinatorial relationships among histone modifications and TFBSs. These relationships and TFBSs play very important roles in regulating gene expression and facilitate the understanding of gene expression regulation in humans.
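As an illustration of the first DNA-binding residue method mentioned above, the sketch below shows one simple way a PSSM could be turned into pairwise features for residues a fixed distance apart. The exact transformation used in the thesis is not given in this record, so the product-and-average form here is an assumption, and the function name `pssm_distance_transform` is hypothetical.

```python
def pssm_distance_transform(pssm, max_distance):
    """Encode pairwise relationships between residues d positions apart.

    pssm         -- list of rows, one per residue; each row holds one score
                    per amino-acid type (20 columns in a real PSSM)
    max_distance -- largest residue separation d to consider

    Returns a dict mapping (d, a, b) to the average over positions i of
    pssm[i][a] * pssm[i + d][b]. This product form is an assumed,
    simplified definition, not necessarily the one used in the thesis.
    """
    n = len(pssm)
    width = len(pssm[0])
    features = {}
    for d in range(1, max_distance + 1):
        pairs = n - d  # number of residue pairs separated by d
        for a in range(width):
            for b in range(width):
                if pairs <= 0:
                    features[(d, a, b)] = 0.0
                    continue
                total = sum(pssm[i][a] * pssm[i + d][b] for i in range(pairs))
                features[(d, a, b)] = total / pairs
    return features


if __name__ == "__main__":
    # Toy PSSM: 3 residues, 2 amino-acid columns (real PSSMs have 20).
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    for key, value in sorted(pssm_distance_transform(toy, 2).items()):
        print(key, value)
```

The resulting feature vector has a fixed length (`max_distance` times the squared number of columns) regardless of sequence length, which is what makes this kind of encoding convenient as input to a conventional classifier.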
Description: xviii, 229 pages : color illustrations
PolyU Library Call No.: [THS] LG51 .H577P COMP 2019 Zhou
Rights: All rights reserved.
Appears in Collections: Thesis

Files in This Item:
File | Description | Size | Format
991022210744403411_link.htm | For PolyU Users | 167 B | HTML
991022210744403411_pira.pdf | For All Users (Non-printable) | 5.65 MB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.