Dissertation/Thesis Abstract

RNA Secondary Structure Comparative Analysis: Method Development and Application to Genomics
by Fu, Yinghan, Ph.D., University of Rochester, 2016, 165; 10109865
Abstract (Summary)

RNA not only encodes protein sequences but also functions through specific structures. Because RNA secondary structure is more stable than tertiary structure, it is feasible to study secondary structure independently. Comparative analysis can be used to improve the accuracy of secondary structure prediction. It assumes that RNAs with conserved function usually evolve under structural constraints.

Previous computational methods that use comparative analysis did not accommodate domain insertions, in which structural motifs are inserted in a sequence relative to its homologs. For this work, domain insertion was introduced into the program Dynalign, which takes two sequences as input and outputs their conserved structures. This update, Dynalign II, significantly improves prediction accuracy over Dynalign, especially for base pairs in inserted domains.

Computational comparative analysis methods for RNA structure prediction require parameters that quantify evolutionary constraints on RNA secondary structure and sequence alignment, e.g., parameters for the deletion, insertion, and mutation of base pairs and nucleotides. A machine learning method called a log-linear model was used to quantify these parameters, using structural alignments of homologous RNAs as the training set. It was found that evolution favors structural conservation and disfavors structural mutation between homologous RNAs.
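
As a rough illustration of what a log-linear model computes, the sketch below scores candidate structural alignments by a weighted sum of feature counts and normalizes the exponentiated scores into probabilities. The feature names, weights, and candidate counts are hypothetical and are not taken from the dissertation; the training step that would fit the weights is omitted.

```python
import math


def log_linear_probs(candidates, weights):
    """Return P(candidate) proportional to exp(sum_k weights[k] * count_k).

    candidates: list of dicts mapping feature name -> count
    weights:    dict mapping feature name -> weight (unknown features score 0)
    """
    scores = [
        sum(weights.get(name, 0.0) * count for name, count in feats.items())
        for feats in candidates
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: positive favors an event, negative disfavors it.
weights = {"pair_conserved": 1.2, "pair_mutated": -0.8, "nucleotide_indel": -0.5}

# Two hypothetical candidate alignments described by their feature counts.
candidates = [
    {"pair_conserved": 5, "nucleotide_indel": 1},
    {"pair_conserved": 3, "pair_mutated": 2, "nucleotide_indel": 1},
]

probs = log_linear_probs(candidates, weights)
```

With these illustrative weights, the candidate that conserves more base pairs receives the higher probability, which mirrors the qualitative finding that structural conservation is favored and structural mutation is disfavored.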

Comparative analysis also helps identify functional non-coding RNAs (ncRNAs). Since homologous ncRNAs evolve under structural constraints, predicted structural conservation of homologous genomic sequences can be used to identify ncRNAs. A new program, Multifind, was developed. It takes multiple sequences as input and assesses the probability that they are homologous ncRNAs using a support vector machine (SVM). One input to the SVM is based on the conservation of the input sequence structures predicted by Multilign, a multiple-sequence conserved structure prediction program based on Dynalign. Multifind performs better than competing programs on testing sets constructed from the RNA database Rfam and detects unique ncRNAs in genomic data sets.

Indexing (document details)
Advisor: Mathews, David H.
Committee: Grossfield, Alan; Sharma, Gaurav; Turner, Douglas H.
School: University of Rochester
Department: School of Medicine and Dentistry
School Location: United States -- New York
Source: DAI-B 77/10(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Molecular biology, Bioinformatics, Biophysics
Keywords: Comparative sequence analysis, Non-coding RNA, RNA, RNA secondary structure
Publication Number: 10109865
ISBN: 9781339731346
Copyright © 2019 ProQuest LLC. All rights reserved.