For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within health-care providers that use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and pre-existing (prevalent) disease is left-censored. Because prevalent disease is not always diagnosed immediately, some disease diagnosed at later visits is actually undiagnosed prevalent disease. I treat prevalent disease as a point mass at time zero for clinical applications where there is no interest in the time of prevalent disease onset. I demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. I propose a general family of mixture models that I call prevalence-incidence models. Parameters of parametric prevalence-incidence models, such as the logistic regression and Weibull survival (logistic-Weibull) model, are estimated by direct likelihood maximization or the EM algorithm. Non-parametric methods are proposed to calculate cumulative risks when there are no covariates. I compare naive Kaplan-Meier, logistic-Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan-Meier methods provided poor estimates, while the logistic-Weibull model closely matched the non-parametric estimates. My findings support the use of logistic-Weibull models over Kaplan-Meier methods to develop risk estimates for informing U.S. risk-based cervical cancer screening guidelines.

Harrell's C index is widely used to measure accuracy in predicting univariate survival outcomes. However, survival outcomes relating to a disease of interest may appear in multiple endpoints of interest. I propose two extensions of Harrell's C index for composite survival outcomes that account for the frequencies of occurrence and the severity or importance of the outcomes.
A weighted C index is proposed for a disease process with multiple equally important endpoints, and a most severe comparable C index is proposed for a disease process with a rare primary outcome and a correlated secondary outcome. Asymptotic properties are derived based on theorems for U-statistics. In simulation studies, my extensions gain efficiency and power in identifying true prognostic variables. I illustrate these novel concordance indices using the Epidemiology of Diabetes Interventions and Complications (EDIC) and the Diabetes Prevention Program (DPP) trials. In EDIC, the prognosis of diabetes patients at risk for multiple equally important microvascular complications is evaluated using the weighted C index. In DPP, patients with impaired glucose regulation (IGR) may either progress to type II diabetes or regress to normal glucose regulation (NGR). The proposed most severe comparable index better evaluates the accuracy in predicting diabetes risk with the help of auxiliary NGR outcomes.
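To make the prevalence-incidence idea from the first part of the abstract concrete, the following is a minimal sketch of the cumulative risk implied by a logistic-Weibull mixture: a point mass of prevalent disease at time zero plus a Weibull distribution for incident disease. The function names and the shape/scale parameterization of the Weibull are assumptions made for illustration; the dissertation's own parameterization and estimation code are not shown in the abstract.

```python
import math


def logistic_prevalence(linear_predictor):
    """Probability of prevalent disease from a logistic model (hypothetical helper)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def cumulative_risk(t, prevalence, shape, scale):
    """Cumulative risk at time t >= 0 under a prevalence-incidence mixture.

    prevalence   -- point mass at time zero (probability of prevalent disease)
    shape, scale -- assumed Weibull parameters for time to incident disease
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    # Weibull CDF for incident disease among those disease-free at time zero.
    incident_cdf = 1.0 - math.exp(-((t / scale) ** shape))
    # Mixture: prevalent cases count from t = 0, incident cases accrue over time.
    return prevalence + (1.0 - prevalence) * incident_cdf
```

At `t = 0` the function returns the prevalence itself, which is the jump at time zero that the abstract describes as being missed by the naive Kaplan-Meier estimator.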
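For readers unfamiliar with the baseline being extended, here is a minimal sketch of the standard univariate Harrell's C index for right-censored data. It is not the weighted or most severe comparable index proposed in the dissertation, whose definitions are not given in the abstract. The function name is hypothetical, and the sketch assumes that a higher score means higher predicted risk and that tied scores count as half-concordant.

```python
def harrell_c(times, events, scores):
    """Standard Harrell's C for right-censored data (illustrative sketch).

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    scores -- predicted risk scores (assumed: higher means earlier event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.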
|Advisor:||Pan, Qing; Katki, Hormuzd A.|
|Committee:||Gastwirth, Joseph L.; Wang, Huixia J.|
|School:||The George Washington University|
|School Location:||United States -- District of Columbia|
|Source:||DAI-B 78/08(E), Dissertation Abstracts International|
|Keywords:||Electronic health records, Kaplan-Meier methods, Measures of discrimination, Multivariate survival analysis|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved