Recent developments in high-throughput biology have enabled systematic exploration of the relationship between genomic variants and phenotypes. The immense amount of data generated by high-throughput experiments, however, poses challenges to researchers. New statistical and computational approaches are needed to use the data efficiently and to draw biologically meaningful conclusions. In this thesis research, we developed new methods that take advantage of high-throughput biological data to tackle important problems, including analyzing genome-wide association studies between human genomic variants and human phenotypes, finding co-complexed proteins from protein interaction networks, and estimating the false-positive and false-negative rates of two-hybrid protein-protein interaction screens. We also present a database designed to compile and perform preliminary analyses of systematic yeast histone mutations.
The new gene-based association test that we have developed has improved power compared to previous methods because it merges multiple weak associations within a gene into a stronger combined signal. Application of the new approach to ECG traits recovered two additional genome-wide significant loci, beyond the four genome-wide significant loci identified by traditional methods. The two new findings were validated in a meta-analysis using a larger population.
Protein complexes are basic functional units in biological processes. Finding proteins that reside in the same complex can provide important information for understanding disease mechanisms. We reviewed current methods and proposed new methods to find co-complexed proteins from 'seed' proteins using confidence-weighted protein physical interaction networks. We systematically evaluated all approaches and explored the effects of different confidence metrics on their performance.
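To make the seed-based search described above concrete, the following is a minimal sketch of one simple baseline, not the methods proposed or evaluated in the thesis. It assumes the network is given as (protein, protein, confidence) triples with confidences in [0, 1], treats each confidence as an independent probability that the interaction is real, and scores every non-seed protein by the probability that at least one of its edges to a seed is real. The function name `rank_candidates` and the example protein labels are hypothetical.

```python
from collections import defaultdict


def rank_candidates(edges, seeds):
    """Rank non-seed proteins by a noisy-OR score over their edges to seeds.

    edges: iterable of (protein_a, protein_b, confidence), confidence in [0, 1].
    seeds: iterable of proteins assumed to belong to the complex.
    Assumption (illustrative only): edge confidences are independent
    probabilities that the interaction is real.
    """
    seeds = set(seeds)
    # miss[p] = probability that none of p's edges to seed proteins is real
    miss = defaultdict(lambda: 1.0)
    for a, b, w in edges:
        # The network is undirected, so check both orientations of each edge.
        for u, v in ((a, b), (b, a)):
            if u in seeds and v not in seeds:
                miss[v] *= 1.0 - w
    scores = {protein: 1.0 - q for protein, q in miss.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy network: A and B are seeds; X, Y, Z are candidates.
toy_edges = [("A", "X", 0.9), ("B", "X", 0.5), ("A", "Y", 0.6), ("Y", "Z", 0.8)]
ranking = rank_candidates(toy_edges, seeds=["A", "B"])
```

In this toy network X ranks above Y because it has two supporting edges to seeds, while Z is not scored at all because it has no direct edge to a seed. Swapping in a different confidence metric changes only the weights, which is one way to explore how the choice of metric affects the ranking.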
To help improve the protein physical interaction network, we extended capture-recapture theory to estimate protein-specific false-positive and false-negative rates in yeast two-hybrid screens. Analysis of yeast, worm and fly protein-protein interaction data indicated that 25% to 45% of the reported interactions are likely false positives. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises in part from a roughly 50% false-negative rate due to statistical under-sampling.
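For readers unfamiliar with capture-recapture, the sketch below shows the classical two-sample form of the idea, not the protein-specific extension developed in the thesis. It assumes two independent screens of the same interaction space and uses the Chapman-corrected Lincoln-Petersen estimator to infer how many true interactions exist from the overlap between the screens; the shortfall of the observed union against that estimate gives a false-negative rate due to under-sampling alone. The function names and the counts in the example are hypothetical and are not data from the thesis.

```python
def chapman_estimate(n1: int, n2: int, overlap: int) -> float:
    """Chapman-corrected Lincoln-Petersen estimate of the total population.

    n1, n2: number of interactions reported by screen 1 and screen 2.
    overlap: number of interactions reported by both screens.
    Assumption (illustrative only): the two screens sample the same set of
    true interactions independently and report no false positives.
    """
    if overlap < 0 or overlap > min(n1, n2):
        raise ValueError("overlap must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (overlap + 1) - 1


def undersampling_false_negative_rate(n1: int, n2: int, overlap: int) -> float:
    """Fraction of the estimated true interactions missed by both screens."""
    total = chapman_estimate(n1, n2, overlap)
    observed = n1 + n2 - overlap  # size of the union of the two screens
    return 1.0 - observed / total


# Hypothetical counts: each screen reports 9 interactions, 4 of them shared.
estimated_total = chapman_estimate(9, 9, 4)
missed_fraction = undersampling_false_negative_rate(9, 9, 4)
```

With these made-up counts the estimator gives 19 true interactions, of which the two screens together observed 14. A small overlap relative to the screen sizes pushes the estimated total up, which is why sparse agreement between screens implies a high false-negative rate.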
Histones are the basic protein components of nucleosomes. They are among the most conserved proteins and are subject to a plethora of post-translational modifications. We designed a database of systematic histone mutations. This database combines histone mutant phenotypes with information about sequences, structures, post-translational modifications and evolutionary conservation. Preliminary analyses confirm that mutations at highly conserved residues and modifiable residues are more likely to generate phenotypes.
|Advisor:||Bader, Joel S.|
|School:||The Johns Hopkins University|
|School Location:||United States -- Maryland|
|Source:||DAI-B 74/01(E), Dissertation Abstracts International|
|Subjects:||Systematic, Biomedical engineering, Bioinformatics|
|Keywords:||Biological networks, Genomic variations, Histones|