RNA not only encodes protein sequences; it also functions through specific structures. Because RNA secondary structure is more stable than tertiary structure, it is feasible to study secondary structure independently. Comparative analysis can improve the accuracy of secondary structure prediction. It assumes that RNAs with conserved function usually evolve under structural constraints.
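As an illustration of the structural constraint that comparative analysis relies on, the following minimal Python sketch classifies how one proposed base pair changes between two aligned homologs. It is not taken from Dynalign: the function name `classify_pair`, the labels it returns, and the toy hairpin sequences are all assumptions made for this example.

```python
# Watson-Crick and G-U wobble pairs that can form in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def classify_pair(seq_a, seq_b, i, j):
    """Classify aligned columns i and j (0-based) of two aligned sequences.

    Hypothetical helper for illustration only; labels are not standard terms
    from any particular program.
    """
    a = (seq_a[i], seq_a[j])
    b = (seq_b[i], seq_b[j])
    if a not in PAIRS or b not in PAIRS:
        return "not conserved"      # at least one homolog cannot form the pair
    if a == b:
        return "identical"          # same pair in both homologs
    if a[0] != b[0] and a[1] != b[1]:
        return "compensating"       # both sides changed, pairing preserved
    return "consistent"             # one side changed, e.g. G-C to G-U

# Toy hairpins (made up): columns 1 and 10 change from G-C to C-G.
seq_a = "GGACUUCGGUCC"
seq_b = "GCACUUCGGUGC"
result = classify_pair(seq_a, seq_b, 1, 10)
```

Compensating changes such as the one at columns 1 and 10 are the kind of evidence that suggests a helix is maintained by selection even as the sequence drifts.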
Previous computational methods based on comparative analysis did not accommodate domain insertions, in which structural motifs are inserted into a sequence relative to its homologs. In this work, domain insertion was introduced into the program Dynalign, which takes two sequences as input and outputs their conserved structures. The updated program, Dynalign II, significantly improves prediction accuracy over Dynalign, especially for base pairs in inserted domains.
Computational comparative analysis methods for RNA structure prediction require parameters that quantify evolutionary constraints on RNA secondary structure and sequence alignment, e.g., parameters for the deletion, insertion, and mutation of base pairs and nucleotides. A machine learning method, the log-linear model, was used to estimate these parameters, with structural alignments of homologous RNAs as the training set. It was found that evolution favors structural conservation and disfavors structural mutation between homologous RNAs.
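The general form of a log-linear model can be sketched in a few lines of Python: each candidate structural alignment is described by feature counts, and its probability is proportional to the exponential of a weighted sum of those counts. The feature names, weights, and candidates below are made-up placeholders, not the parameters fitted in this work, and the training step that would adjust the weights to fit known structural alignments is not shown.

```python
import math

def log_linear_probs(candidates, weights):
    """Return P(y) = exp(w . f(y)) / Z for each candidate's feature counts."""
    scores = [
        sum(weights.get(name, 0.0) * count for name, count in feats.items())
        for feats in candidates
    ]
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)                            # partition function over the candidates
    return [e / z for e in exps]

# Hypothetical weights: positive values reward an event, negative values penalize it.
weights = {"conserved_pair": 1.0, "pair_indel": -1.5, "base_mismatch": -0.5}

# Two hypothetical candidate alignments, described only by event counts.
candidates = [
    {"conserved_pair": 4, "base_mismatch": 1},
    {"conserved_pair": 2, "pair_indel": 2, "base_mismatch": 1},
]
probs = log_linear_probs(candidates, weights)
```

With these placeholder weights, the candidate that conserves more base pairs receives the higher probability, which mirrors the qualitative finding that conservation is favored.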
Comparative analysis also helps identify functional non-coding RNAs (ncRNAs). Because homologous ncRNAs evolve under structural constraints, the predicted structural conservation of homologous genomic sequences can be used to identify ncRNAs. A new program, Multifind, was developed. It takes multiple sequences as input and uses a support vector machine (SVM) to assess the probability that they are homologous ncRNAs. One input to the SVM is based on the conservation of the input sequences' structures as predicted by Multilign, a program based on Dynalign that predicts conserved structures for multiple sequences. Multifind performs better than competing programs on testing sets constructed from the RNA database Rfam and detects unique ncRNAs in genomic data sets.
|Advisor:||Mathews, David H.|
|Committee:||Grossfield, Alan; Sharma, Gaurav; Turner, Douglas H.|
|School:||University of Rochester|
|Department:||School of Medicine and Dentistry|
|School Location:||United States -- New York|
|Source:||DAI-B 77/10(E), Dissertation Abstracts International|
|Subjects:||Molecular biology, Bioinformatics, Biophysics|
|Keywords:||Comparative sequence analysis, Non-coding RNA, RNA, RNA secondary structure|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved