High-throughput sequencing methods now provide extensive data on disease-related human genetic variants. New methods are required to make full use of these data for better understanding and treatment of human diseases. This dissertation describes my work on three aspects of this challenge: determining disease-causative variants; representing the mechanisms by which genetic variants cause disease phenotypes; and quantitatively analyzing genetic disease mechanisms.
First, I developed a variant prioritization algorithm, VarP, and objectively tested it in CAGI (Critical Assessment of Genome Interpretation). VarP was ranked best in the CAGI challenge on interpreting panel sequencing data for 106 patients, which required determining the disease class of each patient and the corresponding causative variant(s). VarP correctly identified the disease class for 36 cases, including 10 where the original clinical pipeline failed, and found seven cases with strong evidence of a disease other than the one tested. Over-reliance on pathogenicity annotations in the HGMD mutation database led to several incorrect assignments. Post hoc analysis showed that protein structure data could have helped to interpret the impact of many prioritized missense variants.
Next, I co-developed and implemented MecCog, a web-based graphical framework to represent mechanisms by which genetic variants cause disease phenotypes. A MecCog mechanism schema displays the propagation of system perturbations across stages of biological organization, using graphical notations to symbolize perturbed entities and activities, knowledge gaps, ambiguities and uncertainties, and hyperlinked evidence. The web platform enables a user to construct, store, publish, browse, query, and comment on schemas. MecCog facilitates better comprehension of disease mechanisms, identification of critical unanswered questions on causal relationships, and identification of possible new sites of therapeutic intervention.
Finally, I developed a framework to quantitatively represent and analyze mechanisms relating genetic variants to complex trait disease. It involves generating a computable circuit from MecCog schemas by assigning node functions and parameters to represent the behavior of the schema components. I demonstrate that such a circuit can be used to analyze the effect size of a variant contributing to disease risk as a function of the genetic background in an individual and the extent to which epistatic effects may be masked in population averages. I also show that the circuit functions and parameters can be learned in a data-driven manner using a hybrid neural network approach.
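The idea of a background-dependent effect size can be illustrated with a deliberately small sketch. The code below is not the dissertation's circuit: the two variants, the node functions (`pathway_flux`, `disease_risk`), and every parameter value are hypothetical, chosen only to show how a variant's effect on risk can be large in one genetic background, absent in another, and small when averaged over a population.

```python
import math

# Hypothetical node parameters for a toy two-variant circuit (illustrative only;
# none of these values come from the dissertation).
EXPRESSION = {0: 1.0, 1: 0.4}  # relative expression of gene A, keyed by variant A genotype
BACKUP = {0: 1.0, 1: 0.2}      # relative activity of a compensating pathway, keyed by variant B genotype
THRESHOLD = 0.75               # flux level below which risk rises sharply
STEEPNESS = 10.0               # slope of the logistic risk node


def pathway_flux(a, b):
    """Redundant (OR-like) node: flux drops only when both inputs are impaired."""
    return 1.0 - (1.0 - EXPRESSION[a]) * (1.0 - BACKUP[b])


def disease_risk(a, b):
    """Logistic node mapping reduced pathway flux to disease risk."""
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (THRESHOLD - pathway_flux(a, b))))


def effect_of_a(b):
    """Effect size of variant A on risk, given the genotype at variant B."""
    return disease_risk(1, b) - disease_risk(0, b)


def population_average_effect(freq_b):
    """Average effect of variant A when a fraction freq_b of individuals carry variant B."""
    return (1.0 - freq_b) * effect_of_a(0) + freq_b * effect_of_a(1)


if __name__ == "__main__":
    print("effect of A without B:", effect_of_a(0))
    print("effect of A with B:   ", effect_of_a(1))
    print("population average (10% carry B):", population_average_effect(0.1))
```

In this toy setting, variant A changes risk only in carriers of variant B, so the population-averaged effect is a small fraction of the effect seen in that background. A circuit derived from a real schema would have many more nodes, and its functions and parameters would be assigned or learned rather than hand-picked as here.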
|Committee:||Darden, Lindley; Elmqvist, Niklas; Mount, Stephen; Leiserson, Max|
|School:||University of Maryland, College Park|
|School Location:||United States -- Maryland|
|Source:||DAI-B 82/8(E), Dissertation Abstracts International|
|Subjects:||Bioinformatics, Computer science, Genetics|
|Keywords:||Disease mechanism, disease mechanism circuit, genetic variant, knowledge representation, mechanism architecture neural network, variant prioritization|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved