Dissertation/Thesis Abstract

Understanding Mechanisms of Translation and Transcription
by Yurovsky, Alisa, Ph.D., State University of New York at Stony Brook, 2020, 196; 28262145
Abstract (Summary)

Genomics has recently entered the realm of Big Data, and the last decade has seen an explosion in genome sequencing and assembly. The age of Big Data has also become synonymous with deep learning, and various deep network architectures have been developed to tackle genome annotation problems. At the same time, exciting new techniques have emerged that allow sequencing of only the portions of RNA being actively translated by ribosomes (ribosome profiling) and sequencing of RNA from individual cells (single-cell RNA sequencing, scRNA-seq).

This thesis takes advantage of recent advances in genomics. It describes new methods and algorithms to improve the understanding of translation and genetic encoding biases, as well as algorithms to improve annotation at the genome and single-cell levels. Our algorithm to determine the translation rates of codons from yeast ribosome profiling data generated the first in vivo measurement of the differential translation rates of all 61 sense codons. We developed several analytic approaches to demonstrate that prokaryotic coding regions show little specific depletion of Shine-Dalgarno motifs. We used highly conserved regions of the 16S rRNAs to develop an algorithm that corrects erroneous 16S rRNA 3'-end annotations in over twelve thousand prokaryotic organisms in NCBI GenBank. In our foray into gene annotation, we evaluated various DNA k-mer embeddings and developed DeepAnnotator, a deep learning architecture for genome annotation that achieved an F-score of 94%. We then turned to automatic annotation of cell phase in scRNA-seq data, describing Pre-Phaser, which established a general computational approach for precise cell phase assignment using k nearest neighbors. Finally, to pursue the goal of detecting novel transcripts and proteins, we developed a statistical framework to identify all likely frameshift positions in a genome, as well as a frameshift simulator for ribosome profiling data to verify our algorithm.
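The abstract describes Pre-Phaser only as an approach to cell phase assignment "using k nearest neighbors". As a rough illustration of that general idea, and not of Pre-Phaser itself, the sketch below assigns a phase to a cell by majority vote among the k closest labelled reference cells. The helper name `knn_assign_phase`, the toy two-gene expression vectors in `REFERENCE`, the Euclidean distance, and the tie-breaking rule are all hypothetical choices made for this example.

```python
from collections import Counter
from math import dist

# Hypothetical toy reference: (expression vector over two marker genes, known phase).
# Values are invented for illustration only.
REFERENCE = [
    ((5.0, 0.5), "G1"),
    ((4.5, 1.0), "G1"),
    ((1.0, 4.0), "S"),
    ((0.5, 5.0), "S"),
    ((3.0, 3.0), "G2/M"),
    ((2.8, 3.4), "G2/M"),
]


def knn_assign_phase(query, reference, k=3):
    """Return the majority phase among the k reference cells closest to `query`.

    Hypothetical helper: Euclidean distance and plain majority vote are
    assumptions for illustration, not the Pre-Phaser method itself.
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference cells")
    # Rank labelled cells by Euclidean distance to the query; keep the k closest.
    nearest = sorted(reference, key=lambda cell: dist(query, cell[0]))[:k]
    # Counter keeps first-seen order, so most_common() breaks count ties
    # in favour of the phase whose member appears closest to the query.
    votes = Counter(phase for _, phase in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_assign_phase((4.8, 0.8), REFERENCE))  # G1
```

In practice the vectors would be normalized expression profiles over many genes and the choice of k and distance metric would matter considerably; this toy version only shows the voting mechanics.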

Indexing (document details)
Advisor: Skiena, Steven
Committee: Balasubramanian, Niranjan, Patro, Rob, Futcher, Bruce
School: State University of New York at Stony Brook
Department: Computer Science
School Location: United States -- New York
Source: DAI-B 82/7(E), Dissertation Abstracts International
Subjects: Computer science, Systematic biology
Keywords: Bioinformatics, Computational biology, Computer science, Data science, Genomics, Statistics
Publication Number: 28262145
ISBN: 9798557094603
Copyright © 2021 ProQuest LLC. All rights reserved.