We study an ensemble of urns with unknown compositions inferred from initial samples with replacement from each urn. This model fits diverse situations. For instance, in microbial ecology studies each urn represents an environment, each ball within an urn corresponds to an individual bacterium, and a ball's color represents its taxonomic label. In a different context, each urn could represent a random RNA pool and each colored ball a possible solution to a particular binding site problem over that pool.
The main parameter of this study is dissimilarity, which we define as the probability that a draw from one urn is not seen in a sample of size k from a possibly different urn. We estimate this parameter with a U-statistic, shown to be the uniformly minimum variance unbiased estimator (UMVUE) of dissimilarity over a range for k determined by initial sample sizes. Furthermore, despite the non-Markovian nature of our estimator when applied sequentially over k, we provide conditions that guarantee uniformly consistent estimates of variances via a jackknife method, and show uniform convergence in probability as well as approximately normal marginal distributions.
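To make the definition concrete, here is a minimal sketch of one natural unbiased estimate of dissimilarity for k no larger than the second urn's sample size. It assumes the U-statistic averages, over every draw in the first sample and every size-k subset of the second sample, the indicator that the draw's color is absent from the subset. The abstract does not spell out the kernel, so this form, the function name `dissimilarity_estimate`, and its parameters are illustrative assumptions rather than the dissertation's exact construction.

```python
from collections import Counter
from math import comb


def dissimilarity_estimate(sample_x, sample_y, k):
    """Estimate P(a draw from urn X is not seen in k draws from urn Y).

    Assumed form (not taken from the source): average, over each draw in
    sample_x and each size-k subset of sample_y, of the indicator that the
    draw's color is missing from the subset.
    """
    n_x, n_y = len(sample_x), len(sample_y)
    if n_x == 0:
        raise ValueError("sample_x must be non-empty")
    if not 0 <= k <= n_y:
        raise ValueError("k must satisfy 0 <= k <= len(sample_y)")

    counts_x = Counter(sample_x)
    counts_y = Counter(sample_y)
    n_subsets = comb(n_y, k)  # number of size-k subsets of sample_y

    estimate = 0.0
    for color, a in counts_x.items():
        b = counts_y.get(color, 0)
        # comb(n_y - b, k) counts the size-k subsets of sample_y that avoid
        # this color; math.comb returns 0 when k exceeds n_y - b.
        estimate += (a / n_x) * comb(n_y - b, k) / n_subsets
    return estimate


if __name__ == "__main__":
    x = ["a", "a", "b", "c"]  # hypothetical labels drawn from urn X
    y = ["a", "b", "b", "d"]  # hypothetical labels drawn from urn Y
    print(dissimilarity_estimate(x, y, k=2))
```

The restriction of k to at most the size of the second sample mirrors the abstract's remark that the estimator is available only over a range determined by the initial sample sizes; larger k requires the extrapolation step.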
We apply our U-statistics and a restricted exponential regression to extrapolate dissimilarity over a range beyond that determined by initial sample sizes, which we use to identify an allocation of draws for subsequent sampling that minimizes a measure of pair-wise dissimilarities over the whole ensemble. This is motivated by the challenge faced by microbiome projects worldwide to effectively allocate additional samples for a more robust and reliable estimation of UniFrac distances between pairs of environments. Similar methods are applied to measures of sample quality of the ensemble derived from alpha-diversity and coverage. We test our methods against simulated data, where we compare optimal and inferred draw allocations when considering these three measures, and analyze 16S ribosomal RNA data from the Human Microbiome Project.
|Advisor:||Lladser, Manuel E.|
|Committee:||Corcoran, Jem; Dukic, Vanja; Knight, Rob; Restrepo, Juan|
|School:||University of Colorado at Boulder|
|School Location:||United States -- Colorado|
|Source:||DAI-B 74/02(E), Dissertation Abstracts International|
|Subjects:||Ecology, Applied Mathematics, Statistics|
|Keywords:||Alpha-diversity, Coverage, Dissimilarity, Urn ensembles, Urn models|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved