Dissertation/Thesis Abstract

Poisson multiscale methods for high-throughput sequencing data
by Xing, Zhengrong, Ph.D., The University of Chicago, 2016, 229; 10195268
Abstract (Summary)

In this dissertation, we focus on analyzing data from high-throughput sequencing experiments. With more capable hardware and more efficient software, sequencing data now provide information at unprecedented resolution. However, statistical methods developed for such data rarely analyze them at this full resolution, and they often rely on approximations that hold only under certain conditions.

We propose a model-based approach to analyzing such data, starting from a single sample. By accounting for the structure inherent in such data, our model accurately captures important genomic regions. We also formulate the model so that it extends easily to more complicated and biologically interesting scenarios.
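
The abstract does not spell out the "inherent structure", but the title points to a Poisson multiscale model. The sketch below shows a standard Haar-type decomposition of Poisson counts, assumed here for illustration rather than taken from the thesis. If two adjacent bins hold independent Poisson counts with rates λ_L and λ_R, then given their sum n the left count is Binomial(n, λ_L/(λ_L + λ_R)). Summing adjacent bins recursively therefore turns a count profile into a tree of binomial splits. The pseudocount in `split_proportions` mildly pulls each split toward 1/2 and only stands in for the shrinkage procedure the thesis uses, which is not described here. All function names are hypothetical.

```python
from typing import List


def multiscale_decompose(counts: List[int]) -> List[List[int]]:
    """Sum adjacent bins recursively; level 0 is the raw counts, the last level is the total.

    Assumes len(counts) is a power of two.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("number of bins must be a power of two")
    levels = [list(counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1] for i in range(len(prev) // 2)])
    return levels


def split_proportions(levels: List[List[int]], pseudo: float = 0.5) -> List[List[float]]:
    """Estimate each parent's left-child share p, finest level first.

    Under independent Poisson bins, left | parent ~ Binomial(parent, p).
    The pseudocount pulls every estimate toward 1/2 (a crude stand-in for real shrinkage).
    """
    props = []
    for child, parent in zip(levels[:-1], levels[1:]):
        props.append([(child[2 * i] + pseudo) / (parent[i] + 2 * pseudo)
                      for i in range(len(parent))])
    return props


def reconstruct(total_rate: float, props: List[List[float]]) -> List[float]:
    """Rebuild a per-bin intensity by splitting the total rate from the coarsest level down."""
    rates = [total_rate]
    for level in reversed(props):
        nxt: List[float] = []
        for r, p in zip(rates, level):
            nxt.extend([r * p, r * (1 - p)])
        rates = nxt
    return rates


if __name__ == "__main__":
    counts = [3, 1, 0, 4, 2, 2, 7, 5]
    levels = multiscale_decompose(counts)
    smoothed = reconstruct(float(sum(counts)), split_proportions(levels))
    print(levels)
    print([round(x, 2) for x in smoothed])
```

Because each split conserves its parent's rate, the reconstructed intensity always sums to the total; smoothing happens only through how the split proportions are estimated.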

Building on the single-sample model, we then turn to the statistical question of detecting differences between multiple samples. Such questions often arise with expression data, where much emphasis has been placed on detecting differential expression between two groups. By extending the single-sample framework to incorporate group covariates, our model provides a systematic approach to estimating and testing for such differences. We then apply our method to several empirical datasets and discuss its potential for further applications to other biological tasks.
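
As a rough illustration of how a group covariate can enter the multiscale framework, the sketch below pools counts within each of two groups and, at every node of the same pairwise-sum tree, compares the groups' left-child proportions on the logit scale with a Wald z-statistic. This node-wise Wald test is a simple technique swapped in for illustration; it is not the estimator or test developed in the thesis. It ignores between-sample overdispersion, and all names and data are hypothetical.

```python
import math
from typing import List, Sequence


def tree_levels(counts: Sequence[int]) -> List[List[int]]:
    """Pairwise-sum tree: level 0 is the raw counts, the last level is the total."""
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("number of bins must be a power of two")
    levels = [list(counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1] for i in range(len(prev) // 2)])
    return levels


def pooled_levels(samples: Sequence[Sequence[int]]) -> List[List[int]]:
    """Sum the trees of all samples in one group, node by node (equal bin counts assumed)."""
    trees = [tree_levels(s) for s in samples]
    return [[sum(node) for node in zip(*lvl)] for lvl in zip(*trees)]


def node_z_scores(group_a: Sequence[Sequence[int]], group_b: Sequence[Sequence[int]],
                  pseudo: float = 0.5) -> List[List[float]]:
    """Wald z for logit(p_b) - logit(p_a) at every parent node, finest level first."""
    la, lb = pooled_levels(group_a), pooled_levels(group_b)
    out = []
    for depth in range(len(la) - 1):
        zs = []
        for i in range(len(la[depth + 1])):
            logits, variances = [], []
            for lv in (la, lb):
                left = lv[depth][2 * i] + pseudo
                right = lv[depth][2 * i + 1] + pseudo
                logits.append(math.log(left / right))
                # Delta-method variance of an empirical log-odds.
                variances.append(1.0 / left + 1.0 / right)
            zs.append((logits[1] - logits[0]) / math.sqrt(sum(variances)))
        out.append(zs)
    return out


if __name__ == "__main__":
    # Two hypothetical groups of two samples each, eight bins per sample.
    group_a = [[4, 5, 6, 5, 3, 4, 5, 4], [6, 5, 4, 6, 5, 3, 4, 5]]
    group_b = [[4, 6, 5, 5, 12, 1, 4, 5], [5, 4, 6, 4, 10, 2, 5, 4]]
    for depth, zs in enumerate(node_z_scores(group_a, group_b)):
        print(depth, [round(z, 2) for z in zs])
```

Working with split proportions means each node's comparison does not depend on overall sequencing depth, since scaling both children by the same factor leaves their ratio unchanged.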

We also address a different statistical question, in which the goal is exploratory analysis to uncover hidden structure in the data. We incorporate the single-sample framework into a commonly used clustering scheme and show that the enhanced approach outperforms the original clustering approach in many respects. We then apply our clustering method to a few empirical datasets and discuss our findings.

Finally, we apply the shrinkage procedure used in the single-sample model to a completely different statistical problem: nonparametric regression with heteroskedastic Gaussian noise. We propose an algorithm that accurately recovers both the mean and variance functions from a single set of observations, and we demonstrate its advantages over state-of-the-art methods through extensive simulation studies.
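
The abstract does not give the algorithm's details. The sketch below shows one common way to estimate both functions from a single series, assuming a Haar wavelet basis and a power-of-two number of observations: denoise the data with per-coefficient thresholds, smooth the squared residuals to estimate the variance, and alternate. For an orthonormal Haar basis, the noise variance of a detail coefficient equals the average of the pointwise variances over its support, which makes heteroskedastic thresholds cheap to compute. Soft thresholding at the universal level sqrt(2 log n) is swapped in here for the thesis's shrinkage procedure, and all function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(x: List[float]) -> Tuple[float, List[List[float]]]:
    """Orthonormal Haar transform of a power-of-two-length signal; details finest-first."""
    approx, details = list(x), []
    while len(approx) > 1:
        half = range(len(approx) // 2)
        details.append([(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in half])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in half]
    return approx[0], details


def haar_inverse(scale: float, details: List[List[float]]) -> List[float]:
    """Invert haar_forward, rebuilding from the coarsest level down."""
    approx = [scale]
    for level in reversed(details):
        nxt: List[float] = []
        for a, d in zip(approx, level):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = nxt
    return approx


def haar_denoise(y: List[float], var: List[float]) -> List[float]:
    """Soft-threshold each detail at sqrt(2 log n) times that coefficient's noise SD."""
    scale, details = haar_forward(y)
    mult = math.sqrt(2.0 * math.log(len(y)))
    pooled, shrunk = list(var), []
    for level in details:
        # Noise variance of a Haar detail = mean of the pointwise variances over its support.
        pooled = [(pooled[2 * i] + pooled[2 * i + 1]) / 2 for i in range(len(level))]
        shrunk.append([math.copysign(max(abs(c) - mult * math.sqrt(v), 0.0), c)
                       for c, v in zip(level, pooled)])
    return haar_inverse(scale, shrunk)


def estimate_mean_and_variance(y: List[float], iterations: int = 3,
                               floor: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Alternate between smoothing y (given var) and smoothing squared residuals (given mu)."""
    _, details = haar_forward(y)
    # Constant starting variance: median absolute finest-scale detail / 0.6745 (MAD rule).
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    var = [sigma ** 2] * len(y)
    for _ in range(iterations):
        mu = haar_denoise(y, var)
        sq_resid = [(yi - mi) ** 2 for yi, mi in zip(y, mu)]
        # For Gaussian noise Var(r^2) = 2 * sigma^4; use that as the noise level for r^2.
        var = [max(v, floor) for v in haar_denoise(sq_resid, [2.0 * v ** 2 for v in var])]
    return haar_denoise(y, var), var


if __name__ == "__main__":
    random.seed(1)
    n = 256
    true_mu = [0.0 if i < n // 2 else 2.0 for i in range(n)]
    true_sd = [0.2 if i < n // 2 else 1.0 for i in range(n)]
    y = [m + s * random.gauss(0.0, 1.0) for m, s in zip(true_mu, true_sd)]
    mu_hat, var_hat = estimate_mean_and_variance(y)
    mse = sum((a - b) ** 2 for a, b in zip(mu_hat, true_mu)) / n
    print(f"mean MSE: {mse:.4f}")
    left = sum(var_hat[: n // 2]) / (n // 2)
    right = sum(var_hat[n // 2:]) / (n // 2)
    print(f"average variance estimate, left half: {left:.3f}, right half: {right:.3f}")
```

Flooring the variance keeps every threshold well defined; a data-adaptive shrinkage rule, as the abstract describes, would replace the fixed universal threshold.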

Indexing (document details)
Advisor: Stephens, Matthew
Committee: Lafferty, John; Nicolae, Dan
School: The University of Chicago
Department: Statistics
School Location: United States -- Illinois
Source: DAI-B 78/05(E), Dissertation Abstracts International
Subjects: Biostatistics, Genetics, Statistics
Keywords: Clustering, Multiscale, Poisson, Sequencing
Publication Number: 10195268
ISBN: 9781369438291