Dissertation/Thesis Abstract

Efficient algorithms for large data sets of genomic sequences in microbial community analysis
by Knox, David A., M.S., University of Colorado at Boulder, 2010, 71; 1477003
Abstract (Summary)

Microbial analysis of environmental samples uses high-throughput genomic sequencing to determine the diversity and quantity of microbial species. Current sequencing techniques can produce data sets too large for existing analysis applications to handle, necessitating the design of better approaches. This work presents three new applications: SeqCluster, ParsInsert, and PTreeView. SeqCluster groups sequences by similarity using a hierarchical clustering method and selects a representative sequence for each group to create operational taxonomic units (OTUs). SeqCluster also supports distance matrices larger than available local memory by using a custom memory management system. ParsInsert introduces an algorithm that exploits the knowledge in publicly available curated phylogenetic trees to efficiently produce both a phylogenetic tree and taxonomies for unknown sequences. PTreeView is a user-friendly visualization application with a broad range of functions and capabilities supporting very large trees. The applications presented here handle hundreds of thousands of sequences efficiently for data clustering, phylogenetic tree building, taxonomic classification, and tree visualization.
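
The clustering step described for SeqCluster can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not state the distance measure, the linkage rule, the cutoff, or how the representative is chosen, so the code below assumes a simple mismatch fraction over pre-aligned sequences, average linkage, a hypothetical `threshold` parameter (0.03 is a commonly used OTU cutoff, assumed here), and the medoid as representative. It holds the full distance matrix in memory and makes no attempt at the out-of-core memory management the abstract mentions.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Fraction of differing positions between two aligned, equal-length sequences.
    (Assumed distance measure; the abstract does not specify one.)"""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, threshold=0.03):
    """Average-linkage agglomerative clustering, stopped once the closest
    pair of clusters is farther apart than `threshold`.

    Returns a list of (representative_index, sorted_member_indices) pairs.
    """
    n = len(seqs)
    # Full in-memory distance matrix: fine for a sketch, not for large inputs.
    dist = [[mismatch_distance(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg_link(c1, c2):
        # Mean pairwise distance between members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        if avg_link(clusters[a], clusters[b]) > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    otus = []
    for members in clusters:
        # Assumed rule: the representative is the medoid, i.e. the member
        # with the smallest total distance to the rest of its cluster.
        rep = min(members, key=lambda i: sum(dist[i][j] for j in members))
        otus.append((rep, sorted(members)))
    return otus


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
    for rep, members in cluster_otus(reads, threshold=0.15):
        print("OTU representative:", reads[rep], "members:", members)
```

The pairwise search makes this sketch far too slow for the hundreds of thousands of sequences the abstract targets; it is meant only to show how a hierarchical merge followed by representative selection yields OTUs.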

Indexing (document details)
Advisor: Dowell, Robin
Committee: Goldberg, Debra; Sicker, Doug
School: University of Colorado at Boulder
Department: Computer Science
School Location: United States -- Colorado
Source: MAI 48/06M, Masters Abstracts International
Subjects: Bioinformatics, Computer science
Keywords: Clustering, Parsimony, Phylogenetic tree, Taxonomic classification, Visualization
Publication Number: 1477003
ISBN: 978-1-124-02265-9