A mathematical model is an abstraction that distills quantifiable behaviors and properties into a well-defined formalism in order to learn or predict something about a system. Such models may be as light as pencil-and-paper calculations on the back of an envelope or heavy enough to require modern supercomputers. They may be as simple as predicting the trajectory of a baseball or as complex as forecasting the weather. Using macromolecular protein structures as substrates, this thesis aims to improve upon and leverage mathematical models in order to address what is both a growing challenge and a burgeoning opportunity in the age of next-generation sequencing. The rapidly growing volume of data produced by emerging deep sequencing technologies is enabling more in-depth analyses of protein conservation than were previously possible. Increasingly, deep sequencing is bringing to light many disease-associated loci and localized signatures of strong conservation. These signatures in sequence space are the "shadows" of selective pressures that have acted on proteins over long evolutionary time scales. However, despite the growing abundance of data on such signatures and the finer resolution with which they may be detected, an intuitive biophysical or functional rationale behind such genomic shadows is often missing. In other cases, such a rationale is provided by, for instance, the need to engage in protein-protein interactions, undergo post-translational modification, or achieve a close-packed hydrophobic core. Allostery may frequently provide the missing conceptual link. Allosteric mechanisms act through changes in the dynamic behavior of protein architectures. Because selective evolutionary pressures often act through processes that are intrinsically dynamic in nature, static renderings of a structure can fail to provide any plausible rationale for constraint.
In the work outlined here, models of protein conformational change are used to predict allosteric residues that either (a) act as essential cavities on the protein surface, serving as sources or sinks in allosteric communication, or (b) function as important information-flow bottlenecks within the allosteric communication pathways of the protein interior. Though most existing approaches entail computationally expensive methods (such as molecular dynamics, MD) or rely on less direct measures (such as sequence features), the framework discussed herein is both computationally tractable and fundamentally structural in nature – conformational change and topology are directly included in the search for allosteric residues – thereby enabling allosteric site prediction across the Protein Data Bank. Large-scale (i.e., general) properties of the predicted allosteric residues are then evaluated with respect to conservation. Multiple threads of evidence (using different sources of data and employing a variety of metrics) are used to demonstrate that the predicted allosteric residues tend to be significantly conserved across diverse evolutionary time scales. In addition, specific examples are discussed in which these residues can help to explain disease-associated variants that were previously poorly understood. Finally, a practical and computationally rapid software tool that enables users to perform this analysis on their own proteins of interest has been made available to the scientific public.
|School Location:||United States -- Connecticut|
|Source:||DAI-B 78/01(E), Dissertation Abstracts International|
|Subjects:||Applied Mathematics, Bioinformatics, Biophysics|
|Keywords:||Allostery, Mathematical Models, Next-Generation Sequencing|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved