Cell state transitions are tightly controlled by numerous regulatory mechanisms to achieve cellular differentiation. Dysregulation of these mechanisms through the acquisition of somatic mutations and/or copy number changes can lead to oncogenic transformation. Binding of transcription factors (TFs) to regulatory elements is a primary mechanism controlling gene expression. TFs work in conjunction with chromatin to either activate or repress specific genes. miRNA-mediated degradation is another key regulatory mechanism, involved in post-transcriptional repression of genes. Genomics projects such as ENCODE, Roadmap Epigenomics, TCGA and others are generating rich datasets across cell lines, primary tissues and cancers. These datasets enable computational modeling of transcriptional and miRNA-mediated regulation. In this thesis, I will present our work on integrating multimodal datasets with DNA sequence information to decipher novel regulatory programs in human disease and differentiation.
First, we use the TCGA-generated glioblastoma (GBM) dataset as a case study to infer gene regulatory programs in disease. We model the gene expression change in GBM relative to normal brain as a function of the copy number of the gene and of TF and miRNA binding sites in the promoter and 3'UTR, respectively. We use regularized least squares regression to fit the expression change of all genes for each sample. This framework achieves significant accuracy relative to randomized gene expression values, and clustering of the regression models recapitulates expression subtypes. We then employ a multi-task learning framework to learn the regression models of all samples simultaneously and define a feature-scoring scheme to identify subtype-specific and common regulators. Combining these experiments with a literature search, we identified a core regulatory network centered on the REST repression complex in the proneural subtype of GBM.
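As a rough illustration of the per-sample regression described above, the following sketch fits a ridge (L2-regularized least squares) model that predicts expression change from copy number and binding-site counts. It is not the thesis implementation: the abstract does not specify the regularizer, and the feature names, synthetic data, true weights, and the penalty `lam` are all assumptions made for the example.

```python
import random

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination."""
    n, p = len(X), len(X[0])
    # Normal equations with the ridge penalty added on the diagonal.
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * p
    for j in range(p - 1, -1, -1):
        tail = sum(A[j][k] * w[k] for k in range(j + 1, p))
        w[j] = (b[j] - tail) / A[j][j]
    return w

# Hypothetical features for one tumor sample: one row per gene.
FEATURES = ["copy_number", "TF_A_promoter_sites",
            "TF_B_promoter_sites", "miR_X_3utr_sites"]
TRUE_W = [0.8, 0.5, -0.6, -0.4]  # assumed effects, used only to simulate data

random.seed(0)
X, y = [], []
for _ in range(200):  # 200 synthetic genes
    row = [random.gauss(0, 1),            # copy number change
           float(random.randint(0, 3)),   # TF A sites in the promoter
           float(random.randint(0, 3)),   # TF B sites in the promoter
           float(random.randint(0, 2))]   # miRNA X sites in the 3'UTR
    X.append(row)
    # Simulated expression change in tumor relative to normal, plus noise.
    y.append(sum(w * v for w, v in zip(TRUE_W, row)) + random.gauss(0, 0.1))

weights = ridge_fit(X, y, lam=1.0)
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.3f}")
```

In this toy setup a negative weight marks a putative repressor and a positive weight a putative activator. Fitting one such model per sample gives one weight vector per sample, which is the kind of object that can then be clustered or compared across subtypes.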
I will then present our work on characterizing regulatory changes in hematopoietic differentiation, primarily using DNase-seq enhancer maps from the Roadmap Epigenomics project. We first developed a tool, SeqGL, which demonstrates significantly greater sensitivity to the binding signals underlying enhancer maps than traditional motif discovery algorithms. We then characterize locus complexity, defined as the number of DNase peaks assigned to a gene, in the hematopoietic system and observe that high-complexity genes tend to be cell-type specific in expression and are enriched for functionally relevant ontologies. Furthermore, we observe extensive poising of enhancers in progenitor cells for function in differentiated cell types. We then use SeqGL scores to predict, with high accuracy, gene expression change in the transition from stem and progenitor cells to differentiated cell types, and identify a potentially novel mechanistic role for PU.1 in B cell and monocyte specification.
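The locus complexity defined above amounts to a per-gene count of assigned peaks. A minimal sketch follows, assuming the peak-to-gene assignment has already been made; the abstract does not say how peaks are assigned, and the peak and gene identifiers here are hypothetical.

```python
from collections import Counter

# Hypothetical (peak_id, gene) pairs from some prior assignment step.
peak_assignments = [
    ("peak_001", "GENE_A"),
    ("peak_002", "GENE_A"),
    ("peak_003", "GENE_B"),
    ("peak_004", "GENE_A"),
    ("peak_005", "GENE_C"),
    ("peak_006", "GENE_C"),
]

def locus_complexity(assignments):
    """Count the DNase peaks assigned to each gene."""
    return Counter(gene for _peak, gene in assignments)

complexity = locus_complexity(peak_assignments)
# Genes ordered from highest to lowest locus complexity.
ranked = [gene for gene, _count in complexity.most_common()]
print(complexity)
```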
|School:||Weill Medical College of Cornell University|
|School Location:||United States -- New York|
|Source:||DAI-B 76/01(E), Dissertation Abstracts International|
|Keywords:||Computational Biology, Gene Regulation, Hematopoietic System, Machine Learning, PU.1 Role, Transcriptional Regulation|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved