An observational study is an empirical investigation of treatment effects when randomized experimentation is not ethical or feasible (Rosenbaum 2009). Observational studies are common in practice for several reasons: a) randomization is not feasible for ethical or financial reasons; b) data are collected from surveys or other sources for which the objective and design of the study were not determined in advance (e.g., a retrospective study using administrative records); c) little is known about the subject area, so preliminary studies of observational data are conducted to formulate hypotheses to be tested in subsequent experiments. When statistical analyses are carried out on observational data, the following issues need to be considered: a) the lack of randomization may lead to selection bias; b) the sample may not be representative with respect to the problem under consideration (e.g., a study of factors influencing a rare disease that uses a survey that is nationally representative with respect to race, income, and gender but not with respect to the rare disease). We will use the following example to illustrate the challenges of observational studies and possible mitigation measures.
Our example is based on the study by Lalonde (1986), which evaluated the impact of job training on the earnings of low-skilled workers in the 1970s (this data set is discussed in more detail in Paper 1, Section 1.5.2). The treatment effect estimated from the observational study was quite different from the one obtained from the baseline randomized "National Supported Work (NSW) Experiment" carried out in the mid-1970s. Here the treatment effect is the impact of job training. Selection bias may contaminate the estimated treatment effect; in other words, workers who receive job training may be fundamentally different from those who do not. Furthermore, the control group that Lalonde selected for the observational study may not be representative of the control group in the original NSW experiment.
In this study, we address the lack of randomization by applying a new matching algorithm, Outlier First Matching (OFM), which can be used in conjunction with Propensity Score Analysis (PSA) or similar methods to obtain credible treatment effect estimates in observational studies.
This dissertation consists of three papers.
Paper 1 proposes a new "Stepwise Matching Framework" (SMF) and justifies its use in causal inference studies, especially PSA studies using observational data. Within the SMF, a new matching algorithm, Outlier First Matching (OFM), is introduced. Its performance is compared with that of other well-known matching algorithms using cross-sectional data.
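To make the idea of matching on a propensity score concrete, here is a minimal sketch of greedy 1:1 nearest-neighbor matching without replacement, one commonly used matching algorithm. It is not OFM, whose details are not given in this abstract. The function name `greedy_match`, the optional `caliper` parameter, and the example scores are all illustrative assumptions.

```python
def greedy_match(treated_scores, control_scores, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    Illustrative only: this is a generic baseline, not the OFM algorithm.
    Each treated subject, in input order, takes the closest control that
    is still unmatched; a match is dropped if it exceeds the caliper.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, t_score in enumerate(treated_scores):
        if not available:
            break  # no controls left to match
        # Closest remaining control; ties are broken by the smaller index.
        j = min(available, key=lambda k: (abs(t_score - control_scores[k]), k))
        if caliper is None or abs(t_score - control_scores[j]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores for two treated and four control subjects.
treated_scores = [0.80, 0.30]
control_scores = [0.25, 0.75, 0.60, 0.10]
pairs = greedy_match(treated_scores, control_scores)
print(pairs)
```

Because the greedy procedure processes treated subjects in input order, the result can depend on that order, which is one reason alternative matching algorithms are worth comparing.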
Paper 2 extends the methods of Paper 1 to correlated data, especially longitudinal data. With correlated data, in addition to the selection bias present in cross-sectional observational data, repeated measures introduce between-subject and within-subject correlation. Repeated measures can also give rise to missing values and rolling enrollment. These challenges complicate the data structure and need to be addressed with more complex models and methodology. Our methodology calculates the time-varying propensity score (p-score) of each control subject at each time point and computes the p-score difference between each control subject and every treatment subject at the treatment subject's time point. These p-score differences are then summarized to create the distance matrix used in the next step of the analysis. As in Paper 1, the performance of OFM and other well-established matching algorithms is compared side by side, and conclusions are drawn from simulation and real-data applications.
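The distance-matrix construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the abstract does not say how the p-score differences are summarized, so the absolute difference is assumed here, and a control with no p-score at the treated subject's time point is assumed to get an infinite distance. The function name and the data layout are hypothetical.

```python
import math


def pscore_distance_matrix(treated, controls):
    """Build a treated-by-control distance matrix from p-scores.

    treated  : list of (treatment_time, p_score_at_that_time) tuples
    controls : list of dicts mapping time point -> p-score at that time

    Assumption: distance is the absolute p-score difference evaluated at
    the treated subject's time point; math.inf marks a control that has
    no p-score at that time point.
    """
    matrix = []
    for t_time, t_score in treated:
        row = []
        for c_scores in controls:
            c_score = c_scores.get(t_time)
            row.append(math.inf if c_score is None else abs(t_score - c_score))
        matrix.append(row)
    return matrix


# Hypothetical data: two treated subjects (treated at times 1 and 2) and
# two controls, the second of which is not observed at time 2.
treated = [(1, 0.50), (2, 0.70)]
controls = [{1: 0.40, 2: 0.65}, {1: 0.55}]
distances = pscore_distance_matrix(treated, controls)
for row in distances:
    print(row)
```

A matrix of this shape can then be passed to any matching algorithm that operates on pairwise distances.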
Paper 3 addresses the missing-value problem in longitudinal data. As mentioned in Paper 2, the complex structure of longitudinal data often comes with missing data. Traditional imputation methods tend to ignore between-subject and within-subject correlation, which may lead to biased or inefficient imputation of missing data. To account for both correlations, we adopt the multiple imputation (MI) strategy introduced by Schafer and Yucel (2002), implemented in the R package "pan". The imputed complete data sets are analyzed with methodology similar to that of Paper 2, and the MI results are then combined using Rubin's rules (1987). Conclusions are drawn from a simulation study and compared with the findings of the complete longitudinal data study in Paper 2.
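For reference, Rubin's rules combine the m per-imputation results as follows: the pooled estimate is the mean of the m point estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below illustrates only this pooling step. The function name and the example numbers are hypothetical, and the degrees-of-freedom calculation used for interval estimates is omitted.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool point estimates and their variances across m imputed data sets.

    estimates : point estimate from each imputed data set
    variances : squared standard error of each of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputations")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b   # total variance
    return q_bar, total


# Hypothetical treatment effect estimates from three imputed data sets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```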
In the last section, we conclude the dissertation with a discussion of preliminary results and of the strengths and limitations of the present research. We also point out directions for future study and offer suggestions for practice.
|Advisor:||Yucel, Recai M., Pruzek, Robert M.|
|Committee:||DiRienzo, Gregory A., Lu, Tao|
|School:||State University of New York at Albany|
|Department:||Biometry and Statistics|
|School Location:||United States -- New York|
|Source:||DAI-B 76/01(E), Dissertation Abstracts International|
|Keywords:||Causal inference, Generalized estimating equation, Longitudinal analysis, Matching, Missing data analysis, Propensity score analysis|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved