Dissertation/Thesis Abstract

Exploration of Selection Bias, the Heckman Correction and Missing Data Imputation
by Machiorlatti, Michael George, Ph.D., The University of Oklahoma Health Sciences Center, 2019, 231; 27547609
Abstract (Summary)

Heckman: The Heckman selection model proposed by Heckman (1979) has been used extensively in economics and the social sciences to correct for selection bias. It is closely related to nonignorable nonresponse problems. Inference based on the Heckman selection model with complex survey data has not been rigorously studied. Results from the Heckman model, which are obtained by assuming simple random sampling, can be biased and misleading when applied to complex survey data. We study the properties of estimates from the Heckman model when sampling design features are incorporated. In addition, we propose weight smoothing approaches to further improve the efficiency of the estimates under informative sampling. Simulation studies show the benefits of our proposed methods: the proposed estimators have smaller bias and mean squared error than the design-based estimator. An empirical study using NHANES 2015-16 data then applies our techniques to examine their implications in a real data application.
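For readers unfamiliar with the correction term at the heart of the Heckman (1979) two-step approach, the following is a minimal sketch of the inverse Mills ratio and of the textbook conditional mean of the outcome among selected units under joint normality. It is not the dissertation's survey-weighted estimator. The function and parameter names (`inverse_mills_ratio`, `selected_outcome_mean`, `xb`, `z_index`) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def inverse_mills_ratio(z: float) -> float:
    """Return phi(z) / Phi(z), the selection-correction term.

    Note: for very negative z the CDF can underflow to 0.0, in which case
    this simple sketch raises ZeroDivisionError.
    """
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)


def selected_outcome_mean(xb: float, rho: float, sigma: float, z_index: float) -> float:
    """Textbook E[y | selected] = x'beta + rho * sigma * lambda(z'gamma).

    xb      : linear predictor of the outcome equation (assumed given)
    rho     : correlation between outcome and selection errors
    sigma   : standard deviation of the outcome error
    z_index : linear predictor of the probit selection equation
    """
    return xb + rho * sigma * inverse_mills_ratio(z_index)
```

When `rho` is zero the correction term vanishes and the selected-sample mean equals the unconditional linear predictor, which is the case in which selection can be ignored.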

Doubly Robust Imputation: Missing data is an issue across all fields that involve data analysis. Methods for missing data analysis have been explored in broad settings in order to understand how missing data influence the efficiency and bias of certain estimators and techniques. In this research, we explore parametric and nonparametric imputation procedures in order to understand their effectiveness. We then employ a doubly robust algorithm that combines model and propensity score information, using a purely nonparametric approach based on regression trees, to explore the benefits of this method over a single imputation approach. Specifically, we explore how regression imputation, multiple imputation, predictive mean matching, hot deck imputation, nearest neighbor imputation, and classification trees perform in a simulated missingness scenario. Our study departs from traditional imputation papers in that we simulate missingness from the NHANES 2015–16 survey in order to evaluate the procedures in a real data scenario with a complex survey design.
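As an illustration of how model and propensity score information can be combined, the sketch below implements the standard augmented inverse-probability-weighted form of a doubly robust mean. It is a generic textbook form, not necessarily the algorithm used in the dissertation. The predictions `m_hat` (outcome model) and `pi_hat` (response propensity) are assumed to come from some fitted model such as a regression tree, and the optional `weights` argument stands in for survey weights; all names are hypothetical.

```python
from typing import Optional, Sequence


def doubly_robust_mean(
    y: Sequence[Optional[float]],
    observed: Sequence[bool],
    m_hat: Sequence[float],
    pi_hat: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of m_hat[i] + observed[i] * (y[i] - m_hat[i]) / pi_hat[i].

    y        : outcomes; entries for nonrespondents are ignored (may be None)
    observed : True where y[i] was observed
    m_hat    : outcome-model prediction for every unit
    pi_hat   : estimated response probability for every unit (must be > 0)
    weights  : optional survey weights; defaults to equal weights
    """
    n = len(observed)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for i in range(n):
        term = m_hat[i]
        if observed[i]:
            # Residual correction, inflated by the inverse response probability.
            term += (y[i] - m_hat[i]) / pi_hat[i]
        total += weights[i] * term
    return total / sum(weights)
```

The estimate remains consistent if either the outcome model or the propensity model is correctly specified, which is the sense in which it is doubly robust. Setting `m_hat` to zero everywhere reduces the expression to a plain inverse-probability-weighted mean.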

Indexing (document details)
Advisor: Vesely, Sara, Chen, Sixia
Committee: Ding, Kai, Campbell, Janis, George, James
School: The University of Oklahoma Health Sciences Center
Department: Biostatistics
School Location: United States -- Oklahoma
Source: DAI-B 81/7(E), Dissertation Abstracts International
Subjects: Biostatistics
Keywords: Complex survey, Heckman model, Missing data, Nonparametric, Random forest, Weight smoothing
Publication Number: 27547609
ISBN: 9781392817360
Copyright © 2021 ProQuest LLC. All rights reserved.