The dysregulated activity of oncogenic transcription factors contributes to neoplastic transformation by promoting aberrant expression of target genes involved in regulating cell homeostasis. Therefore, characterization of the regulatory networks controlled by these transcription factors is a critical objective in understanding the molecular mechanisms of cell transformation. Modern high-throughput technologies are providing the first window into regulatory processes at genome scale, foretelling the ability of computational inference algorithms to produce models of regulatory networks that will revolutionize our understanding and treatment of cancer by (1) describing how genomic alterations cause functional disruptions in the network regulating cell homeostasis, leading to aberrant cell growth and cancer, and (2) predicting therapeutic interventions, in which critical components of the network can be targeted to revert the cancer phenotype.
This thesis develops methods that advance the current state of the art in inferring transcriptional regulatory networks from high-throughput data, with specific application to both gene expression and ChIP-on-chip data. Prior to this thesis, several methods had been proposed to infer regulatory networks from microarray data; however, these methods were applicable only to model organisms, such as yeast, due to high computational complexity. Moreover, all methods relied to some extent on assumptions that are not biologically realistic. Here, I develop a novel method, based on information theory, that overcomes these limitations: it has low computational complexity, allowing application to mammalian systems, and makes minimal assumptions about the structure of the network or about the type of statistical interaction between genes (e.g., linear models). I apply this method to reconstruct the first genome-wide regulatory network inferred from microarray data for mammalian cells, and further demonstrate how this method can be used to deduce regulatory interactions between subnetworks controlled by different oncogenes, using only microarray data. I extend this analysis, again using the tools of information theory, to consider inference of interactions involving more than two variables. To do so, I provide a rigorous definition of statistical dependency in the multivariate setting, which had not previously been done. I demonstrate that this framework effectively identifies groups of genes that interact in a pathway to jointly regulate a common set of targets.
While the microarray analysis methods are motivated by issues specific to inferring gene regulatory networks, the resulting algorithmic advances are novel from a purely mathematical and computational perspective. They should be generally applicable to reverse engineering networks from measurements of the interacting variables, a general problem both in other branches of systems biology (e.g., metabolic networks, neural networks) and in scientific applications outside systems biology (e.g., social networks, electrical networks).
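To make the information-theoretic idea concrete, the following is a minimal sketch, not the algorithm developed in the thesis. It assumes a simple plug-in mutual information estimate over equal-frequency bins and keeps gene pairs whose estimate exceeds a user-chosen cutoff. The function names, the number of bins, and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-frequency binning: assign each value a bin index by its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=3):
    """Plug-in estimate of I(X;Y) in bits from two expression profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum of p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed bin pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def infer_edges(profiles, threshold, n_bins=3):
    """Return (gene, gene, MI) for every pair with MI at or above threshold.

    `profiles` maps a gene name to its expression values across samples.
    The threshold is an illustrative assumption, not a calibrated cutoff.
    """
    genes = sorted(profiles)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            mi = mutual_information(profiles[g], profiles[h], n_bins)
            if mi >= threshold:
                edges.append((g, h, mi))
    return edges


if __name__ == "__main__":
    a = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
    toy = {
        "A": a,
        "B": [v * v for v in a],            # nonlinear function of A
        "C": [1, 4, 7, 2, 5, 8, 3, 6, 9],   # unrelated to A at this binning
    }
    for g, h, mi in infer_edges(toy, threshold=0.5):
        print(g, h, round(mi, 3))
```

In the toy data, B is a quadratic function of A, so their linear correlation is zero, yet the mutual information estimate is clearly positive. This is the kind of dependency a method restricted to linear models would miss. A practical estimator would also need a significance test for the threshold and a more careful treatment of ties and small sample sizes.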
In the second part of the thesis I consider the analysis of ChIP-on-chip experiments, a new technology that more directly measures transcription factor-chromatin interactions. I show that existing methods for analyzing these data cannot assign meaningful statistical significance scores (p-values) to bound promoters, due to a number of flawed assumptions. I then develop a data-driven method that accurately predicts the extent of TF-DNA binding and reveals an order of magnitude more interactions than previous methods. I demonstrate that this method, when combined with DNA sequence and gene expression data, can deduce regulatory networks of substantially greater complexity than previously appreciated. Moreover, I use this method to analyze the interaction between the regulatory networks controlled by two important proto-oncogenes (MYC and NOTCH1), which the gene expression-based analysis of the first part predicted to overlap to a statistically significant degree. This analysis reveals that these networks in fact overlap almost completely, with MYC and NOTCH1 jointly regulating several thousand targets.
Much additional work must be done in this new field, both computationally and technologically, to reach the goal of building predictive models able to describe the connection between genomic alterations and malignancies such as cancer. However, this thesis takes steps in this direction by developing computational methods to leverage cutting-edge genome-wide measurement technologies to understand the regulatory networks controlling cellular function and homeostasis. The resulting systems-level view of transcriptional regulation already reveals fundamentally more complexity than previously anticipated, altering the traditional view of genetic regulatory networks.
|School Location:||United States -- New York|
|Source:||DAI-B 69/01, Dissertation Abstracts International|
|Subjects:||Genetics, Bioinformatics, Artificial intelligence|
|Keywords:||Cancer cells, Computational inference, Genetic regulatory networks, Machine learning|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved