It is estimated that at least 10-20% of the mammalian genome is dedicated to regulating the 1-2% of the genome that codes for proteins. This non-coding regulatory layer is necessary for the development of complex organisms, but it is poorly understood compared to the genetic code used to translate coding DNA into proteins. In this dissertation, I discuss methods developed to better understand the gene regulatory layer. I begin, in Chapter 1, with a broad overview of gene regulation, the motivation for studying it, the state of the art in its historical context, and directions for future work.
In Chapter 2, I discuss a computational method developed to detect transcription factor (TF) complexes. The method compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were integrated to explore overlapping motif arrangements while ensuring the physical plausibility of each TF complex. Using this approach, I predicted 422 physically realistic TF complex motifs at an 18% false discovery rate (FDR). I found that this set is enriched in known TF complexes. Additionally, novel complexes were supported by chromatin immunoprecipitation sequencing (ChIP-seq) datasets. Analysis of the structural modeling revealed three cooperativity mechanisms and a tendency of TF pairs to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. The TF complexes and associated binding site predictions are available as a web resource at http://complex.stanford.edu.
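The idea of comparing motif spacings between conserved and unconserved sequence can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not name the statistical test, so the one-sided binomial test, the Benjamini-Hochberg correction, the function names, and the toy counts below are all assumptions chosen for illustration.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def spacing_enrichment(conserved, unconserved):
    """Map each spacing to a one-sided p-value for excess conserved co-occurrences.

    Assumed null model: every spacing has the same conserved fraction as the
    motif pair overall.
    """
    total_c = sum(conserved.values())
    total_u = sum(unconserved.values())
    p0 = total_c / (total_c + total_u)
    pvals = {}
    for s in sorted(set(conserved) | set(unconserved)):
        c = conserved.get(s, 0)
        n = c + unconserved.get(s, 0)
        pvals[s] = binom_upper_tail(c, n, p0)
    return pvals

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q-values keyed like the input dict."""
    items = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(items)
    qvals = {}
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        key, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        qvals[key] = running_min
    return qvals

# Hypothetical co-occurrence counts per spacing (in base pairs) for one motif pair.
conserved = {0: 5, 1: 6, 2: 40, 3: 4, 4: 5}
unconserved = {0: 50, 1: 55, 2: 45, 3: 48, 4: 52}

pvals = spacing_enrichment(conserved, unconserved)
qvals = benjamini_hochberg(pvals)
best = min(qvals, key=qvals.get)
```

In this toy input, spacing 2 is the only arrangement that is over-represented in conserved sequence, which is the kind of signal a rigid complex with a fixed motif spacing would be expected to leave.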
Next, in Chapter 3, I discuss how gene enrichment analysis can be applied to genome-wide conserved binding sites to infer regulatory functions for a given TF complex. A genomic screen predicted 732,568 combinatorial binding sites for 422 TF complex motifs. From these predictions, I inferred 2,440 functional roles, which are consistent with known functional roles of TF complexes. In these functional associations, I found interesting themes such as promiscuous partnering of TFs (such as ETS) in the same functional context (T cells). Additionally, functional enrichment identified two novel TF complex motifs associated with spinal cord patterning genes and mammary gland development genes, respectively. Based on these predictions, I discovered novel spinal cord patterning enhancers (5/9, 56% validation rate) and enhancers active in MCF7 cells (11/19, 53% validation rate). This set, replete with thousands of additional predictions, will serve as a powerful guide for future studies of regulatory patterns and their functional roles.
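As a rough illustration of what a gene enrichment test computes, the sketch below uses a hypergeometric upper-tail test on the genes located near a complex's predicted binding sites. This is an assumption made for illustration only: the abstract does not specify which enrichment statistic was used, and the function name and all numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    numerator = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return numerator / comb(N, n)

# Hypothetical numbers: 20,000 genes in total, 100 annotated with a function,
# 200 genes near predicted binding sites, 10 of which carry the annotation.
p_value = hypergeom_upper_tail(k=10, N=20000, K=100, n=200)
```

Only about one annotated gene would be expected by chance in this example, so observing ten yields a very small p-value. In practice such a test would be repeated over many annotations and corrected for multiple testing.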
Then, in Chapter 4, I outline a method developed to predict disease susceptibility due to gene mis-regulation. The method interrogates ensembles of conserved binding sites of regulatory factors disrupted by an individual's variants and then looks for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each genome reflects that individual's distinct medical history. These results suggest that erosion of gene regulation results in function-specific mutation loads that manifest as disease predispositions in a familial lineage. Additionally, this aggregate analysis method addresses a common problem: although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the contributing loci.
Finally, I conclude in Chapter 5 with a summary of my findings and the future research directions they suggest.
|School Location:||United States -- California|
|Source:||DAI-B 75/09(E), Dissertation Abstracts International|
|Subjects:||Genetics, Bioinformatics, Computer science|
|Keywords:||Comparative genomics, Gene cis-regulation, Non-coding DNA, Personal genomics, Regulatory function prediction, Transcription factor complexes|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved