Dissertation/Thesis Abstract

Improving Multiple Sequence Alignments by Revising Sequence Families with Alignment Scoring Approaches
by Levchuk, Aleksandr Olegovich, M.S., University of California, Riverside, 2011, 72; 1505602
Abstract (Summary)

Characterizing the functional, structural, and evolutionary relationships of biological sequences is an important task in modern genomics and computational biology. Most of these applications involve assembling sequence families by similarity searching, then forming multiple sequence alignments (MSAs) and performing downstream phylogenetic analyses. MSAs play a central role in this modeling workflow, so their quality is critical to its success. In this study I present an approach to improve MSA quality through sequence family revision: false positive candidates are automatically removed from a sequence family, and an improved MSA is then recomputed. The approach can combine sequence-level scores from two MSA scoring methods, norMD and GUIDANCE, and it automatically selects an optimized score threshold for removing sequences from MSAs. To test the performance of this method, I developed several automated procedures that add controlled numbers of randomly selected nonmember sequences to curated MSAs from the CDD database. I then performed Receiver Operating Characteristic (ROC) analysis on the classification results, incorporating automatic threshold selection approaches. Surprisingly, the sequence-level scores provided by the two MSA scoring methods were less successful than a simple all-against-all BLAST-based pairwise alignment scoring approach. However, I was able to improve one of the MSA scoring methods by extending it with a dynamic threshold selection approach. The extended method outperformed the BLAST-based method in detecting false positives in sequence families.
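
The threshold selection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes each sequence already has one score (higher meaning more likely a true family member), that membership labels are known because the nonmembers were added deliberately, and that the threshold is chosen by maximizing Youden's J (true positive rate minus false positive rate), which is one common ROC-based criterion and an assumption here. The function names `roc_points`, `select_threshold`, and `revise_family` are hypothetical.

```python
def roc_points(scores, is_member):
    """Return (threshold, tpr, fpr) for each candidate threshold.

    A sequence is flagged for removal when its score is below the threshold.
    A "positive" is a flagged sequence, so:
      tpr = flagged nonmembers / all nonmembers
      fpr = flagged members    / all members
    """
    n_non = sum(1 for m in is_member if not m)
    n_mem = sum(1 for m in is_member if m)
    if n_non == 0 or n_mem == 0:
        raise ValueError("need at least one member and one nonmember")

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that "flag everything" is also considered.
    candidates = sorted(set(scores))
    candidates.append(candidates[-1] + 1.0)

    points = []
    for t in candidates:
        tp = sum(1 for s, m in zip(scores, is_member) if s < t and not m)
        fp = sum(1 for s, m in zip(scores, is_member) if s < t and m)
        points.append((t, tp / n_non, fp / n_mem))
    return points


def select_threshold(scores, is_member):
    """Pick the threshold maximizing Youden's J = tpr - fpr (assumed criterion)."""
    points = roc_points(scores, is_member)
    return max(points, key=lambda p: p[1] - p[2])[0]


def revise_family(names, scores, threshold):
    """Keep only sequences whose score reaches the threshold."""
    return [n for n, s in zip(names, scores) if s >= threshold]


# Made-up scores for illustration only; seqD and seqE are the added nonmembers.
names = ["seqA", "seqB", "seqC", "seqD", "seqE", "seqF"]
scores = [0.9, 0.8, 0.85, 0.2, 0.3, 0.7]
is_member = [True, True, True, False, False, True]

threshold = select_threshold(scores, is_member)
kept = revise_family(names, scores, threshold)
```

In this sketch the retained sequences in `kept` would then be realigned to produce the revised MSA. The scores themselves could come from any sequence-level scoring method; how norMD, GUIDANCE, or BLAST scores are computed is outside the scope of the example.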

Indexing (document details)
Advisor: Girke, Thomas
Committee: Keogh, Eamonn; Lonardi, Stefano
School: University of California, Riverside
Department: Computer Science
School Location: United States -- California
Source: MAI 50/04M, Masters Abstracts International
Source Type: DISSERTATION
Subjects: Bioinformatics, Computer science
Keywords: Free and open source software, Multiple sequence alignments, Protein primary structure, Sequence families, Threshold selection
Publication Number: 1505602
ISBN: 9781267132505
Copyright © 2019 ProQuest LLC. All rights reserved.