Dissertation/Thesis Abstract

Statistical Methods for Analysis of Genetic and Survival Data with Latent Heterogeneity
by Han, Xiaoxia, Ph.D., New York University, 2017, 172; 10286970
Abstract (Summary)

In part one, we address the effective identification of genetic variants that underlie common human disorders. Common, complex human diseases usually have a heterogeneous etiology involving the interplay of multiple genetic and environmental factors, which leads to latent population substructure. In addition, these complex diseases typically have incomplete and heterogeneous penetrance. Qian and Shao recently developed a novel likelihood ratio test under genetic heterogeneity (LRT-H) to identify common genetic variants underlying complex human disorders. The method is computationally efficient and is aimed at genome-wide association studies (GWAS), where latent genetic heterogeneity is widespread. The original paper presents only power simulation results showing that the new method has power competitive with existing GWAS tests; it gives no direct evidence that the test can identify additional important genetic variants that the commonly used GWAS tests miss. In the first part, we apply the LRT-H of Qian and Shao to identify common variants associated with late-onset Alzheimer's disease using the publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) GWAS dataset. We also compare LRT-H with commonly used GWAS association tests, such as the Cochran-Armitage trend test and the case-only Hardy-Weinberg disequilibrium test. Further, motivated by the LRT-H, we propose a likelihood ratio test under transmission heterogeneity for linkage analysis using family data with two affected siblings. We derive a closed-form formula for the LRT statistic and provide its explicit asymptotic null distribution. We conduct extensive simulation studies of type I error and power. In addition, we develop an R package, gLRTH, that implements the proposed method for genome-wide linkage analysis as well as Qian and Shao's LRT-H for GWAS.
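
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the standard Cochran-Armitage trend test for a 2 x 3 case-control genotype table. It is not code from the dissertation or from the gLRTH package; the function name, the additive weights (0, 1, 2), and the example counts are assumptions made purely for illustration.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k table (hypothetical helper).

    cases, controls: counts per genotype category (e.g. 0, 1, 2 risk alleles).
    weights: scores per category; (0, 1, 2) assumes an additive model.
    Returns (z, two_sided_p) using the normal approximation.
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")

    n_cases = sum(cases)
    n_controls = sum(controls)
    n = n_cases + n_controls
    col_totals = [r + s for r, s in zip(cases, controls)]

    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tc = sum(t * c for t, c in zip(weights, col_totals))
    sum_t2c = sum(t * t * c for t, c in zip(weights, col_totals))

    # T = sum_i t_i * (r_i * n_controls - s_i * n_cases), written via totals.
    t_stat = n * sum_tr - n_cases * sum_tc
    # Null variance of T under the usual large-sample approximation.
    variance = n_cases * n_controls * (n * sum_t2c - sum_tc ** 2) / n
    if variance <= 0:
        raise ValueError("degenerate table: variance is not positive")

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    z, p = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
    print(f"z = {z:.3f}, p = {p:.3g}")
```

The trend test assumes a single dose-response relationship across genotypes, which is the kind of homogeneity assumption that a heterogeneity-aware test such as LRT-H is meant to relax.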

In the second part, we address statistical issues in measuring and comparing two correlated C-indices for censored survival data. As new biomarkers and diagnostic/prognostic procedures are developed rapidly, methods to evaluate and compare the performance of two biomarkers or two risk scoring algorithms are of great current interest in precision medicine. However, measuring and comparing predictive accuracy in survival models with unknown censoring distributions has been statistically challenging. In particular, Kang et al. proposed a one-shot nonparametric approach to compare the predictive accuracy of two correlated survival models based on the difference of two Harrell's C-statistics. They also developed a publicly available R package, compareC, to implement the proposed method. However, the general validity of this approach has not been investigated systematically. In part two, we provide two counterexamples, a change point Cox proportional hazards (PH) model and a threshold model, both arising from survival models with heterogeneity, and demonstrate via simulation studies that this approach is biased and cannot attain the nominal Type I error rate. We further propose a necessary and sufficient condition under which the one-shot nonparametric approach is asymptotically valid. As Cox PH models are the most widely used models for survival analysis, it is of interest to investigate methods that work well for them. We numerically compare the performance of three commonly used concordance measures, i.e., Harrell's C-statistic, the inverse-probability-of-censoring weighted (IPCW) C-statistic, and the concordance probability estimate (also known as the K-index) proposed by Gonen and Heller, within a Cox PH model framework. Based on simulation studies and theoretical considerations, we suggest using the K-index when the proportional hazards assumption is satisfied. Finally, we provide a comprehensive case study: we develop an Alzheimer's disease risk prediction model among mild cognitive impairment subjects using a Cox PH model with the ADNI dataset and evaluate the model's performance using the K-index. (Abstract shortened by ProQuest.)
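
As a point of reference for the concordance measures discussed in part two, the following is a minimal sketch of Harrell's C-statistic for right-censored data. It is not the dissertation's code and not the compareC implementation; the function name and the toy data are hypothetical, and the handling of ties (pairs with tied times are skipped, tied risk scores count as one half) is one common convention assumed here.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C-statistic for right-censored data (hypothetical helper).

    times: observed follow-up times.
    events: 1 if the event was observed, 0 if the time is censored.
    risk_scores: higher score means higher predicted risk (shorter survival).
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumed convention for tied scores

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, for illustration only.
    times = [1.0, 2.0, 3.0, 4.0]
    events = [1, 1, 0, 1]
    scores = [4.0, 3.0, 2.0, 1.0]
    print(harrell_c(times, events, scores))
```

Because only pairs whose shorter time is an observed event are counted, the set of usable pairs depends on the censoring pattern. This dependence is one reason alternatives such as the IPCW C-statistic and the K-index are considered alongside Harrell's C.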

Indexing (document details)
Advisor: Shao, Yongzhao; Goldberg, Judith D.
Committee: Fang, Yixin; Liu, Mengling; Troxel, Andrea B.
School: New York University
Department: Environmental Health Science
School Location: United States -- New York
Source: DAI-B 79/02(E), Dissertation Abstracts International
Subjects: Biostatistics, Statistics
Keywords: Alzheimer's disease, Concordance probability, Cure models, Genome-wide studies, Latent class analysis, Survival analysis
Publication Number: 10286970
ISBN: 978-0-355-40701-3