In long-term clinical trials, missing data are an area of concern, especially when the rate at which data are missing depends on the treatment group. Typically, some effort is spent identifying why the data are missing so that appropriate assumptions and analytic approaches can be applied. When data are missing by design, certain measurements are discontinued once a subject reaches an endpoint, possibly because of ethical or financial constraints. Subjects who reach this absorbing barrier may stop data collection on some variables but may still have time-varying covariates available from continued follow-up. In this dissertation, we developed an Imputation-Estimation (IE) algorithm under an auxiliary missing at random assumption to assess whether the additional information from these time-varying covariates can be used to improve estimation. The quality of the estimates of the parameters of interest is evaluated in terms of bias, variance and coverage. We contrast this method with other missing data approaches such as multiple imputation and available case analysis.
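The benefit of an always-observed auxiliary variable can be illustrated with an estimator far simpler than the dissertation's IE algorithm. The following is a minimal sketch under stated assumptions, not the author's method: a single visit, a hypothetical linear model linking an outcome `y` to an auxiliary variable `x`, and a hypothetical rule that `y` stops being collected once `x` reaches 110. It contrasts the available case mean with a single regression (conditional-mean) imputation fitted on complete cases; it does not model repeated measures and does not address variance estimation or coverage.

```python
import numpy as np


def simulate(n=20_000, threshold=110.0, seed=2013):
    """Simulate one visit of a missing-by-design outcome (hypothetical model).

    x: auxiliary variable, always observed (plays the role of fasting glucose).
    y: outcome of interest (plays the role of 30-minute glucose).
    y stops being collected once x reaches `threshold`, so missingness
    depends only on the observed x (missing at random given x).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(100.0, 10.0, size=n)
    y_full = 50.0 + 1.0 * x + rng.normal(0.0, 5.0, size=n)
    observed = x < threshold
    y_obs = np.where(observed, y_full, np.nan)  # what the analyst actually sees
    return x, y_full, y_obs, observed


def available_case_mean(y_obs, observed):
    """Mean of the outcome among subjects who still have it measured."""
    return float(y_obs[observed].mean())


def regression_imputation_mean(x, y_obs, observed):
    """Fit E[y | x] on complete cases, fill missing y with predictions, average."""
    slope, intercept = np.polyfit(x[observed], y_obs[observed], deg=1)
    y_filled = np.where(observed, y_obs, intercept + slope * x)
    return float(y_filled.mean())


if __name__ == "__main__":
    x, y_full, y_obs, observed = simulate()
    print(f"fraction missing:          {1 - observed.mean():.3f}")
    print(f"full-data mean (unknown):  {y_full.mean():.2f}")
    print(f"available case mean:       {available_case_mean(y_obs, observed):.2f}")
    print(f"regression imputation:     {regression_imputation_mean(x, y_obs, observed):.2f}")
```

Because missingness depends only on the observed `x`, the complete cases give an unbiased fit of E[y | x] when the linear model is correct, so averaging observed and predicted values recovers the full-data mean, while the available case mean is pulled toward subjects with low `x`. The imputed values extrapolate the fitted line beyond the observed range of `x`, which is safe here only because the simulated model is linear by construction.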
We illustrate this method using data from the Diabetes Prevention Program (DPP). The DPP was a diabetes prevention study that showed reductions of 58% and 31% in diabetes risk with the intensive lifestyle and metformin interventions, respectively, compared with placebo. According to the DPP protocol, the oral glucose tolerance test (OGTT) is discontinued after diabetes diagnosis. The 30-minute glucose and insulin values from the OGTT are used to calculate surrogate measures of insulin secretion such as the Insulin Glucose Ratio (IGR = (30-min insulin - fasting insulin)/(30-min glucose - fasting glucose)) and the corrected insulin response (CIR). Because the metformin and lifestyle interventions significantly reduced diabetes incidence, the rates of missing IGR and CIR differ among the treatment groups. This differential discontinuation results in informative, monotone missingness in the 30-minute glucose and insulin values. Fasting glucose is collected at all time points and is associated with 30-minute glucose. The IE algorithm is applied to estimate the mean 30-minute glucose, using fasting glucose as auxiliary information. In this example, fasting glucose is also the source of the discontinuation, since diabetes diagnosis is based on the fasting and 2-hour glucose values from the OGTT. Because of the strong dependence between fasting and 30-minute glucose measured at the same visit, the estimates from the IE algorithm using the complete vector were similar to those from multiple imputation. Because the placebo group had a higher diabetes incidence, the difference between available case analysis and the regression-based imputations was greater in the placebo group than in the lifestyle group.
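The IGR definition above can be written as a small helper. This is a hypothetical illustration: the function name, the use of NaN for undefined values, and the example numbers are assumptions rather than DPP analysis code. It returns NaN when a 30-minute value is missing, which is what happens for participants whose OGTT was discontinued after diagnosis, and when the glucose increment is zero.

```python
import math


def insulin_glucose_ratio(insulin_30, insulin_0, glucose_30, glucose_0):
    """IGR = (30-min insulin - fasting insulin) / (30-min glucose - fasting glucose).

    Returns NaN if any input is missing (None or NaN) or if the glucose
    increment is zero, since the ratio is undefined in those cases.
    """
    values = (insulin_30, insulin_0, glucose_30, glucose_0)
    if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
        return math.nan
    delta_glucose = glucose_30 - glucose_0
    if delta_glucose == 0:
        return math.nan
    return (insulin_30 - insulin_0) / delta_glucose


# Made-up values: insulin rises by 50 and glucose rises by 50, so IGR = 1.0.
print(insulin_glucose_ratio(60.0, 10.0, 150.0, 100.0))
# A participant whose OGTT was discontinued has no 30-minute values.
print(insulin_glucose_ratio(None, 10.0, None, 100.0))
```

Propagating NaN rather than raising an error keeps discontinued participants in the data set, so their missing IGR can later be handled by an imputation method instead of being silently dropped.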
|Advisor:||Lachin, John M.|
|Committee:||Bura, Efstathia, Cook, Nancy, Larsen, Micheal, Pan, Qing|
|School:||The George Washington University|
|School Location:||United States -- District of Columbia|
|Source:||DAI-B 74/01(E), Dissertation Abstracts International|
|Keywords:||Auxiliary information, Conditional means, Linear mixed models, Missing by design, Missing data, Multiple imputation|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved