Dissertation/Thesis Abstract

StochHMM: A Flexible Hidden Markov Model Framework
by Lott, Paul Christian, Ph.D., University of California, Davis, 2013, 186 pages; 3602142
Abstract (Summary)

In the era of genomics, data analysis models and algorithms that reduce large, complex data sets to meaningful information are integral to furthering our understanding of complex biological systems. Hidden Markov models (HMMs) are one such data analysis technique and have become the basis of many bioinformatics tools. Their relative success is due primarily to their conceptual simplicity and robust statistical foundation. Although HMMs are among the most popular modeling techniques for classifying linear sequences of data, researchers have few software options for rapidly implementing the necessary modeling framework and algorithms. Most tools are still hand-coded because current implementation solutions do not provide the ease of use or flexibility required to implement models in non-traditional ways. I have developed a free hidden Markov model C++ library and application, called StochHMM, that gives researchers the flexibility to apply hidden Markov models to unique sequence analysis problems. It allows researchers to rapidly implement a model using a simple text file while retaining the flexibility to adapt the model in non-traditional ways. In addition, it provides many features that are not available in any current HMM implementation tool, such as stochastic sampling algorithms, the ability to link user-defined functions into the HMM framework, and multiple ways to integrate additional data sources to make better predictions. Using StochHMM, we have been able to rapidly implement models for R-loop prediction and for classification of methylation domains. The R-loop predictions uncovered the role of R-loops in epigenetic regulation at CpG promoters and in 3' transcription termination of protein-coding genes. Classification of methylation domains in multiple pluripotent tissues identified epigenetic gene tracks that will help inform our understanding of epigenetic diseases.
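The abstract lists stochastic sampling among the features that distinguish StochHMM. The general technique, often called stochastic traceback or forward-filtering backward-sampling, draws whole state paths in proportion to their posterior probability instead of returning only the single most likely (Viterbi) path. The C++ sketch below illustrates the idea for a hypothetical toy model. It is not StochHMM's implementation or API: the names `ToyHmm`, `forward_scaled`, and `stochastic_traceback` are illustrative, symbols are assumed to be pre-encoded as integers, and the model is assumed to have no explicit end state.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy HMM with discrete emissions (not StochHMM's data model).
struct ToyHmm {
    std::vector<double> init;                // init[k]: P(first state = k)
    std::vector<std::vector<double>> trans;  // trans[j][k]: P(next = k | current = j)
    std::vector<std::vector<double>> emit;   // emit[k][s]: P(symbol s | state k)
};

// Forward algorithm, normalising each column to sum to 1. Rescaling a column
// by a constant leaves the relative sampling weights below unchanged.
std::vector<std::vector<double>> forward_scaled(const ToyHmm& hmm,
                                                const std::vector<int>& obs) {
    const std::size_t n = obs.size(), K = hmm.init.size();
    std::vector<std::vector<double>> f(n, std::vector<double>(K, 0.0));
    for (std::size_t t = 0; t < n; ++t) {
        double total = 0.0;
        for (std::size_t k = 0; k < K; ++k) {
            double prior = 0.0;
            if (t == 0) {
                prior = hmm.init[k];
            } else {
                for (std::size_t j = 0; j < K; ++j) prior += f[t - 1][j] * hmm.trans[j][k];
            }
            f[t][k] = prior * hmm.emit[k][obs[t]];
            total += f[t][k];
        }
        assert(total > 0.0 && "sequence has zero probability under the model");
        for (double& v : f[t]) v /= total;
    }
    return f;
}

// Stochastic traceback: draw one state path from P(path | obs).
std::vector<int> stochastic_traceback(const ToyHmm& hmm, const std::vector<int>& obs,
                                      std::mt19937& rng) {
    assert(!obs.empty());
    const auto f = forward_scaled(hmm, obs);
    const std::size_t n = obs.size(), K = hmm.init.size();
    std::vector<int> path(n);
    // The final state is drawn in proportion to the last forward column.
    std::discrete_distribution<int> last(f[n - 1].begin(), f[n - 1].end());
    path[n - 1] = last(rng);
    for (std::size_t t = n - 1; t > 0; --t) {
        // The previous state j is drawn in proportion to f[t-1][j] * trans[j][path[t]].
        std::vector<double> w(K);
        for (std::size_t j = 0; j < K; ++j) w[j] = f[t - 1][j] * hmm.trans[j][path[t]];
        std::discrete_distribution<int> prev(w.begin(), w.end());
        path[t - 1] = prev(rng);
    }
    return path;
}
```

Calling `stochastic_traceback` repeatedly with the same generator yields a sample of paths from the posterior; tallying how often each position is assigned a given state gives an estimate of posterior state probabilities, which a single Viterbi path cannot convey.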

Indexing (document details)
Advisor: Korf, Ian F.
Committee: Chedin, Frederic L., LaSalle, Janine M., Segal, David J.
School: University of California, Davis
Department: Genetics
School Location: United States -- California
Source: DAI-B 75/03(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Biostatistics, Bioinformatics
Keywords: Hidden Markov model, Methylation domains, R-loops
Publication Number: 3602142
ISBN: 9781303539503