Dissertation/Thesis Abstract

Predicting functions and interacting partners of biological sequences: Kernel methods using structural and phylogenetic information
by Craig, Roger A., Jr., Ph.D., University of Delaware, 2010, 158; 3444293
Abstract (Summary)

Exponential growth in genomic data poses unique challenges for life science researchers. New solutions that increase efficiency and accuracy are required for the problems of characterizing the functions and interacting partners of proteins and nucleic acid sequences. To expedite laboratory methods, which are often time-consuming and costly, new high-throughput computational approaches have emerged.

In this dissertation, computational solutions to problems such as protein functional classification, protein-protein interaction prediction, and regulatory sequence prediction are investigated. For protein functional classification, I have developed a supervised machine learning scheme that combines phylogenetic profiling, phylogenetic trees, and a new mapping kernel into an iterative transductive support vector machine. This method is able to classify proteins more accurately than previous state-of-the-art methods. Regarding protein-protein interaction prediction, I have developed a novel mechanism to incorporate phylogenetic tree information to measure the degree of co-evolution between interacting proteins. I have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, whereas previous methods focus mostly on the inter-matrix correlations of these same distance matrices. Both unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intra-matrix correlations. Particularly in the supervised case, a better balance between sensitivity and specificity in the prediction of protein-protein interactions is achieved. For protein functional linkage, an improved method for inferring functional linkage based on residue-level co-evolutionary information has been developed. Comparing residue phylogenetic vectors, with the correlation coefficient as the similarity measure, has been shown to perform better than previous methods. In the realm of regulatory sequence prediction, I have constructed another supervised learning algorithm that takes into consideration long-range correlations and global structural motifs within antisense oligomeric RNA and its binding partners. Previous methods almost always relied on local sequence information in these strands. Efficacy prediction was improved in this case as well.
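
As an illustration of the inter-matrix correlation that the abstract attributes to previous methods, the following minimal Python sketch scores co-evolution between two protein families by correlating the corresponding entries of their distance matrices. It is an assumption-laden toy, not the dissertation's algorithm: the function names, the use of the Pearson coefficient over the upper triangles, and the example matrices are all hypothetical, and the matrices are assumed to list the same species in the same order.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def inter_matrix_correlation(dist_a, dist_b):
    """Correlate matching pairwise distances of two ortholog distance matrices.

    Assumes both matrices are square, symmetric, and indexed by the same
    species in the same order (an assumption of this sketch).
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("distance matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical distance matrices for two protein families across 4 species.
family_a = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
family_b = [[0, 2, 4, 6], [2, 0, 8, 10], [4, 8, 0, 12], [6, 10, 12, 0]]
score = inter_matrix_correlation(family_a, family_b)
```

A score near 1 indicates that the two families' pairwise distances rise and fall together across species, which such methods read as evidence of co-evolution. The intra-matrix, tree-guided correlations described in the abstract go beyond this sketch and are not reproduced here.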

Overall, the investigations resulted in new algorithms that improved both functional prediction and interaction prediction of protein and regulatory nucleic acid sequences. New directions for future research have also been generated as a result of this work.

Indexing (document details)
Advisor: Liao, Li
Committee: Decker, Keith; Meyers, Blake; Shanker, Vijay
School: University of Delaware
Department: Department of Computer and Information Sciences
School Location: United States -- Delaware
Source: DAI-B 72/05, Dissertation Abstracts International
Subjects: Bioinformatics, Computer science
Keywords: Comparative genomics, Phylogenetic profiles, PPI prediction, Protein function prediction, Protein-protein interaction prediction, Support vector machines
Publication Number: 3444293
ISBN: 978-1-124-51723-0
Copyright © 2020 ProQuest LLC. All rights reserved.