Genome structure is the order and orientation of the DNA segments that make up a genome, which encodes the information of life. With advances in DNA sequencing technology and the now-massive availability of sequence data, genome structure cannot easily be studied without efficient, purpose-built algorithms. In this dissertation, we study three problems related to genome structure: structural error correction of draft genome assemblies, inversion prediction, and operon prediction. Our work with draft genome assemblies explores a novel Maximum Alternating Path Cover (MAPC) model to improve genome correctness and downstream analysis. Our work on inversion prediction aims to predict and catalog inversions, using the well-known Range Maximum Query and Max-Cut models for what we call “global” inversions, and the novel Rectangle Clustering and Representative Rectangle Prediction models for more localized inversions. For operon prediction, we again apply the MAPC model (with improved algorithms and theoretical analysis), coupled with a novel Intro-Column Exclusive Clustering model, to predict and catalog operons in closely related species. Evaluated on both simulated and real genome data, our algorithms and implementations show substantial promise for accurate computational analysis of genome structure in significantly less time.
|Advisor:||Chen, Danny Z., Emrich, Scott J.|
|School:||University of Notre Dame|
|School Location:||United States -- Indiana|
|Source:||DAI-A 82/3(E), Dissertation Abstracts International|
|Subjects:||Computer science, Information science, Genetics|
|Keywords:||DNA sequencing, Maximum Alternating Path Cover, Operon prediction|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved