Nonrandomly mating populations, referred to as structured populations, are commonly encountered in genetic studies. A common characteristic of structured populations is that separate subpopulations differ systematically in their genetic attributes. In a global sample of unrelated individuals, for example, allele frequencies typically differ between geographically defined subpopulations. Two analytical goals when studying datasets exhibiting population structure are: (i) characterizing population structure and (ii) identifying causal gene-trait relationships in its presence. This work comprises two complementary projects, one for each of these goals.
In the first project, we introduce a computationally efficient algorithm for fitting the admixture model of population structure. The central strategy of our algorithm, which we call ALStructure, is to first estimate the latent linear subspace of admixture components and then search for models within this subspace that satisfy the probabilistic constraints of the admixture model. We find that ALStructure typically outperforms preexisting methods in both accuracy and speed across a wide array of simulated and real datasets.
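To make the admixture model's constraints concrete, the following is a minimal sketch of the standard formulation, which is assumed here rather than taken from the abstract: individual-specific allele frequencies factor as F = PQ, each column of Q (an individual's admixture proportions) is nonnegative and sums to 1, and genotypes are Binomial(2, F) draws. The function name `simulate_admixture` and the toy matrices are hypothetical. This is not the ALStructure algorithm; it only illustrates the model such an algorithm would fit. Under this factorization, every row of F lies in the row span of Q, which is the kind of latent linear subspace the abstract refers to.

```python
import random

def simulate_admixture(P, Q, seed=0):
    """Simulate genotypes under an assumed admixture model.

    P: m x d ancestral allele frequencies (entries in [0, 1]).
    Q: d x n admixture proportions (each column sums to 1).
    Returns (F, X) where F = P Q and X[i][j] ~ Binomial(2, F[i][j]).
    """
    rng = random.Random(seed)
    m, d, n = len(P), len(Q), len(Q[0])

    # Probabilistic constraints: allele frequencies are probabilities,
    # and each individual's admixture proportions form a distribution.
    assert all(0.0 <= p <= 1.0 for row in P for p in row)
    for j in range(n):
        col = [Q[k][j] for k in range(d)]
        assert all(q >= 0.0 for q in col) and abs(sum(col) - 1.0) < 1e-9

    # Individual-specific allele frequencies: F = P Q.
    F = [[sum(P[i][k] * Q[k][j] for k in range(d)) for j in range(n)]
         for i in range(m)]

    # Each genotype counts successes in two Bernoulli(F[i][j]) draws.
    X = [[sum(rng.random() < F[i][j] for _ in range(2)) for j in range(n)]
         for i in range(m)]
    return F, X

# Toy example: 3 SNPs, 2 ancestral populations, 4 individuals.
P = [[0.1, 0.9], [0.5, 0.2], [0.8, 0.3]]
Q = [[1.0, 0.5, 0.0, 0.25],
     [0.0, 0.5, 1.0, 0.75]]
F, X = simulate_admixture(P, Q)
```

Because F is a convex combination of the columns of P, its entries stay in [0, 1] automatically whenever the two constraints above hold.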
In the second project, we show how the random process of meiosis can be leveraged as a form of experimental randomization capable of uncovering causal relationships between genes and traits in the presence of population structure. We introduce novel tests based on parent-child trio data, developed within the causal framework of potential outcomes. Additionally, we evaluate the causal properties of the popular transmission-disequilibrium test (TDT). We describe and assess the feasibility of assumptions under which each of these procedures is a test of a causal property, which we define as causal linkage. To enable this project, we first provide a detailed discussion of the connection between causality and measure-theoretic probability by constructing causal models on probability spaces.
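For readers unfamiliar with the TDT mentioned above, the classical statistic can be sketched as follows. This is the textbook McNemar-type form of the test, not the novel trio-based tests introduced in this work; the function name `tdt` and the example counts are hypothetical. Among heterozygous parents, `b` counts transmissions of the allele of interest to an affected child and `c` counts transmissions of the other allele; under the null hypothesis, meiosis makes the two equally likely.

```python
import math

def tdt(b, c):
    """Classical TDT statistic and p-value.

    b: heterozygous parents transmitting the allele of interest.
    c: heterozygous parents transmitting the other allele.
    Returns (chi2, p) with chi2 = (b - c)^2 / (b + c), referred to a
    chi-square distribution with 1 degree of freedom.
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need nonnegative counts with b + c > 0")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Illustrative counts only, not data from the dissertation.
chi2, p = tdt(60, 40)
```

Only heterozygous parents contribute to the statistic, since a homozygous parent's transmission carries no information about preferential transmission.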
|Advisor:||Storey, John D|
|Committee:||Fan, Jianqing; Engelhardt, Barbara; Akey, Joshua|
|Department:||Applied and Computational Mathematics|
|School Location:||United States -- New Jersey|
|Source:||DAI-B 81/8(E), Dissertation Abstracts International|
|Subjects:||Statistics, Genetics, Biostatistics|
|Keywords:||Causal inference, Latent variables, Population genetics, Population structure, Statistical genetics, Statistics|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved