Functional RNA elements do not encode proteins, but rather function directly as RNAs. Many different types of RNAs play important roles in a wide range of cellular processes, including protein synthesis, gene regulation, protein transport, splicing, and more. Because important sequence and structural features tend to be evolutionarily conserved, one way to learn about functional RNAs is through comparative sequence analysis - by collecting and aligning examples of homologous RNAs and comparing them.
Covariance models (CMs) are powerful computational tools for homology search and alignment that score both the conserved sequence and secondary structure of an RNA family. However, due to the high computational complexity of their search and alignment algorithms, searches against large databases and alignment of large RNAs like small subunit ribosomal RNA (SSU rRNA) are prohibitively slow. Large-scale alignment of SSU rRNA is of particular utility for environmental survey studies of microbial diversity which often use the rRNA as a phylogenetic marker of microorganisms.
In this work, we improve CM methods by making them faster and more sensitive to remote homology. To accelerate searches, we introduce a query-dependent banding (QDB) technique that makes scoring sequences more efficient by restricting the possible lengths of structural elements based on their probability given the model. We combine QDB with a complementary filtering method that quickly prunes away database subsequences deemed unlikely to receive high CM scores based on sequence conservation alone. To increase search sensitivity, we apply two model parameterization strategies from protein homology search tools to CMs. As judged by our benchmark, these combined approaches yield about a 250-fold speedup and significant increase in search sensitivity compared with previous implementations. To accelerate alignment, we apply a method that uses a fast sequence-based alignment of a target sequence to determine constraints for the more expensive CM sequence- and structure-based alignment. This technique reduces the time required to align one SSU rRNA sequence from about 15 minutes to 1 second with a negligible effect on alignment accuracy. Collectively, these improvements make CMs more powerful and practical tools for RNA homology search and alignment.
|Advisor:||Eddy, Sean R.|
|Commitee:||Brent, Michael, Buhler, Jeremy, Fay, Justin, Gordon, Jeff, Mitra, Rob, Stormo, Gary|
|School:||Washington University in St. Louis|
|Department:||Biology & Biomedical Sciences (Computational Biology)|
|School Location:||United States -- Missouri|
|Source:||DAI-B 70/09, Dissertation Abstracts International|
|Subjects:||Molecular biology, Bioinformatics, Computer science|
|Keywords:||Alignment, Covariance models, Homology search, Non-coding RNA|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
supplemental files is subject to the ProQuest Terms and Conditions of use.