Exponential growth in genomic data poses unique challenges for life science researchers. Characterizing the functions and interacting partners of proteins and nucleic acid sequences requires new solutions that increase efficiency and accuracy. To expedite laboratory methods, which are often time-consuming and costly, new high-throughput computational approaches have emerged.
This dissertation investigates computational solutions to problems such as protein functional classification, protein-protein interaction prediction, and regulatory sequence prediction. For protein functional classification, I have developed a supervised machine learning scheme that combines phylogenetic profiling, phylogenetic trees, and a new mapping kernel in an iterative transductive support vector machine. This method classifies proteins more accurately than previous state-of-the-art methods. For protein-protein interaction prediction, I have developed a novel mechanism that incorporates phylogenetic tree information to measure the degree of co-evolution between interacting proteins. I have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, whereas previous methods have mostly focused on the inter-matrix correlations of these same distance matrices. Both unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intra-matrix correlations. In the supervised case in particular, a better balance between sensitivity and specificity in the prediction of protein-protein interactions is achieved. For protein functional linkage, I have developed an improved inference method based on residue-level co-evolutionary information. Comparing residue phylogenetic vectors, with the correlation coefficient as the similarity measure, is shown to outperform previous approaches. In the realm of regulatory sequence prediction, I have constructed another supervised learning algorithm that takes into consideration long-range correlations and global structural motifs within antisense oligomeric RNA and its binding partners. Previous methods almost always relied on local sequence information in these strands. Efficacy prediction was improved in this case as well.
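As an illustration of the inter-matrix correlation that the abstract attributes to previous methods, the following is a minimal sketch, not the dissertation's own implementation. It assumes that each protein family is summarized by a square, symmetric matrix of pairwise distances between orthologs, with species in the same order in both matrices. The function names and the toy matrices are hypothetical. The tree-guided intra-matrix correlations introduced in the dissertation are not reproduced here.

```python
from math import sqrt


def upper_triangle(dist):
    """Return the entries strictly above the diagonal of a square matrix."""
    n = len(dist)
    if any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square")
    return [dist[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def inter_matrix_correlation(dist_a, dist_b):
    """Correlate corresponding pairwise distances of two protein families.

    Assumption: row/column k refers to the same species in both matrices.
    """
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy data (hypothetical): four species, distances for two protein families.
dist_a = [
    [0, 1, 4, 6],
    [1, 0, 3, 5],
    [4, 3, 0, 2],
    [6, 5, 2, 0],
]
# Family B evolves twice as fast but with the same tree shape.
dist_b = [[2 * v for v in row] for row in dist_a]

score = inter_matrix_correlation(dist_a, dist_b)
print(f"inter-matrix correlation: {score:.3f}")
```

A score near 1 indicates that the two families have accumulated distances in step across species, which is the co-evolution signal such methods look for. The same `pearson` helper could in principle be applied to any pair of equal-length numeric vectors, such as the residue phylogenetic vectors mentioned above, although how those vectors are built is not specified in this abstract.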
Overall, the investigations resulted in new algorithms that improved both functional prediction and interaction prediction for protein and regulatory nucleic acid sequences. The work also suggests new directions for future research.
|Committee:||Decker, Keith, Meyers, Blake, Shanker, Vijay|
|School:||University of Delaware|
|Department:||Department of Computer and Information Sciences|
|School Location:||United States -- Delaware|
|Source:||DAI-B 72/05, Dissertation Abstracts International|
|Subjects:||Bioinformatics, Computer science|
|Keywords:||Comparative genomics, Phylogenetic profiles, PPI prediction, Protein function prediction, Protein-protein interaction prediction, Support vector machines|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved