Dissertation/Thesis Abstract

Computational approaches to study the relation between genomic variations and phenotypes
by Huang, Hailiang, Ph.D., The Johns Hopkins University, 2012, 212; 3528554
Abstract (Summary)

Recent developments in high-throughput biology have enabled the systematic exploration of the relation between genomic variants and phenotypes. The immense amount of data generated by high-throughput experiments, however, poses challenges to researchers. New statistical and computational approaches are needed to use the data efficiently and to draw biologically meaningful conclusions. In this thesis, we developed new methods that take advantage of high-throughput biological data to tackle three important problems: analyzing genome-wide association studies between human genomic variants and human phenotypes, finding co-complexed proteins in protein interaction networks, and estimating the false-positive and false-negative rates of two-hybrid protein-protein interaction screens. We also present a database designed to compile, and perform preliminary analyses of, systematic mutations of yeast histones.

The new gene-based association test that we developed has improved power compared to previous methods because it merges multiple weak associations within a gene into a stronger combined signal. Applying the new approach to ECG traits recovered two additional genome-wide significant loci, beyond the four identified by traditional methods. Both new findings were validated in a meta-analysis of a larger population. Protein complexes are basic functional units in biological processes, and finding proteins that reside in the same complex can provide important information for understanding disease mechanisms. We reviewed current methods and proposed new methods for finding co-complexed proteins from 'seed' proteins using confidence-weighted protein physical interaction networks. We systematically evaluated all approaches and explored the effects of different confidence metrics on their performance.
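The gene-based test described above rests on the idea that several weak per-variant signals can add up to a strong gene-level signal. The abstract does not specify the combining statistic, so the sketch below substitutes Fisher's combined probability test, a classical stand-in and not the method developed in the thesis. Fisher's method assumes independent tests, which variants within one gene generally violate because of linkage disequilibrium, so this is an illustration of the principle only. The function names and the toy p-values are hypothetical.

```python
import math


def chi2_sf_even_dof(x, dof):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For dof = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (assumption: independence)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_dof(statistic, 2 * len(p_values))


if __name__ == "__main__":
    # Toy per-variant p-values for one hypothetical gene (invented numbers).
    variant_p = [1e-3, 1e-3, 1e-3, 1e-3, 1e-3]
    print("smallest single-variant p:", min(variant_p))
    print("combined gene-level p:   ", fisher_combined_p(variant_p))
```

With these toy inputs no single variant would be close to a genome-wide threshold, yet the combined p-value is several orders of magnitude smaller than any individual one. A practical gene-based test would also need to account for correlation among variants.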

To provide information for improving the protein physical interaction network, we extended capture-recapture theory to estimate protein-specific false-positive and false-negative rates in yeast two-hybrid screens. Analysis of yeast, worm and fly protein-protein interaction data indicated that 25% to 45% of the reported interactions are likely false positives. The overall false-negative rate ranges from 75% for worm to 90% for fly, and arises in part from a roughly 50% false-negative rate due to statistical under-sampling.
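For readers unfamiliar with capture-recapture, the sketch below shows the classical two-sample Lincoln-Petersen estimator applied to two interaction screens. It is not the extended, protein-specific model developed in the thesis. It assumes that the two screens sample the true interactions independently and that neither reports false positives, both of which are simplifications. The function names and the toy interaction sets are hypothetical.

```python
def lincoln_petersen(screen_a, screen_b):
    """Estimate the total number of true interactions from two independent screens.

    N_hat = |A| * |B| / |A intersect B|, the classical two-sample estimator.
    """
    a, b = set(screen_a), set(screen_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("screens share no interactions; estimator is undefined")
    return len(a) * len(b) / overlap


def false_negative_rate(screen, estimated_total):
    """Fraction of the estimated true interactions that a screen failed to report."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return 1.0 - len(set(screen)) / estimated_total


if __name__ == "__main__":
    # Toy data: integers stand in for interaction identifiers (invented numbers).
    screen_a = set(range(0, 40))    # 40 interactions reported by screen A
    screen_b = set(range(30, 80))   # 50 interactions reported by screen B, 10 shared with A

    total = lincoln_petersen(screen_a, screen_b)
    print("estimated true interactions:", total)
    print("false-negative rate, screen A:", false_negative_rate(screen_a, total))
    print("false-negative rate, screen B:", false_negative_rate(screen_b, total))
```

The smaller the overlap between two screens relative to their sizes, the larger the estimated pool of undetected interactions, and hence the higher each screen's estimated false-negative rate.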

Histones are the basic protein components of nucleosomes. They are among the most conserved proteins and are subject to a plethora of post-translational modifications. We designed a database of systematic histone mutations that combines histone mutant phenotypes with information about sequences, structures, post-translational modifications and evolutionary conservation. Preliminary analyses confirm that mutations at highly conserved residues and at modifiable residues are more likely to generate phenotypes.

Indexing (document details)
Advisor: Bader, Joel S.
Committee:
School: The Johns Hopkins University
School Location: United States -- Maryland
Source: DAI-B 74/01(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Systematic, Biomedical engineering, Bioinformatics
Keywords: Biological networks, Genomic variations, Histones
Publication Number: 3528554
ISBN: 9781267614025