Because the function of a molecular sequence depends on its structure, analyzing RNA structures is essential for developing new drugs and understanding genetic diseases. Pseudoknots are one type of RNA structure that has attracted considerable interest in recent years, especially as it has become possible to address the computational complexity of modeling them. Pseudoknot structures are functionally important: they appear, for example, in viral genome RNAs and ribozyme active sites. For predicting RNA structures, computational methods are less expensive than experimental methods such as nuclear magnetic resonance and X-ray crystallography. A relatively new approach to structure analysis, the grammatical approach, has attracted the attention of many researchers because it can model long-range interactions. Grammars offer a natural and concise way to model DNA, RNA, and protein sequences. In this research, we aim to make grammatical models for RNA structure analysis easier for biologists to use by automating the grammar-building step. We focus on grammatical models capable of representing pseudoknots.
The main contribution of this research is the development of an RNA structure analysis framework, TAGRNAInf. The framework is capable of analyzing RNA structures that include pseudoknots. It currently addresses two RNA structure analysis problems, structure identification and RNA folding, and it can be extended to address other problems such as structural classification and motif search. The approach adopted in this solution is grammatical inference: at the core of its learning phase is a learning algorithm for a grammatical model capable of representing RNA pseudoknots (Tree Adjoining Grammars for RNA, TAG RNA). Previous research has applied the grammatical approach to RNA structure analysis including pseudoknots, building a specific model for a certain family of RNAs. However, there has been limited research on the use of grammatical inference for RNA structure analysis. TAGRNAInf is the first complete framework for RNA structure analysis including pseudoknots, based on grammatical inference, that has been experimentally tested and has yielded results competitive with other available methods.
As a part of this research, we also developed a new grammatical model, Linked Single Adjoining-Tree Adjoining Grammars (LSA–TAG), capable of representing pseudoknots. We have developed a grammatical inference algorithm for LSA–TAG that can learn the grammar for a family of RNA structures from example sequences. This inference algorithm has proven to be mainly of theoretical interest.
|Advisor:||Ammar, Reda A.; Rajasekaran, Sanguthevar|
|School:||University of Connecticut|
|School Location:||United States -- Connecticut|
|Source:||DAI-B 71/08, Dissertation Abstracts International|
|Subjects:||Bioinformatics, Computer science|
|Keywords:||Machine learning, Pseudoknots, RNA structure|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved