Dissertation/Thesis Abstract

Some Statistical Methods on Design, Modeling and Analysis of High-Dimensional Data
by Shang, Shulian, Ph.D., New York University, 2012, 120; 3524267
Abstract (Summary)

In modern research, advancing technologies and the combination of multi-aspect data sources frequently generate massive high-dimensional data, which pose new challenges to statisticians. This thesis addresses several aspects of high-dimensional data, including study design, multiple hypothesis testing, nonlinear regression modeling, and the development of risk models.

When a large number of hypotheses are tested simultaneously, controlling the traditional type I error rate for each test at 5% leads to an excessive number of false positives. The false discovery proportion (FDP), defined as the proportion of incorrect rejections among all rejections, is a direct measure of the abundance of false positive findings. We propose a sample size calculation method that controls the FDP and ensures the overall power of a study. In addition, an accurate prediction interval for the FDP is highly desirable. We propose a formula-based prediction interval for weak dependence between test statistics and a permutation-based prediction interval for strong dependence.
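The FDP definition above can be illustrated with a minimal Python sketch. It is not the thesis's sample size method or prediction interval; the function names and the toy inputs are hypothetical, and the convention that the FDP equals 0 when nothing is rejected is an assumption adopted here.

```python
from typing import Sequence


def expected_false_positives(num_true_nulls: int, per_test_alpha: float) -> float:
    """Expected number of false positives when each true null hypothesis
    is tested separately at level per_test_alpha."""
    return num_true_nulls * per_test_alpha


def false_discovery_proportion(
    rejected: Sequence[bool], is_true_null: Sequence[bool]
) -> float:
    """Proportion of incorrect rejections among all rejections.

    Assumption for this sketch: the FDP is taken to be 0.0 when no
    hypothesis is rejected.
    """
    if len(rejected) != len(is_true_null):
        raise ValueError("inputs must have the same length")
    total_rejections = sum(1 for r in rejected if r)
    false_rejections = sum(
        1 for r, null in zip(rejected, is_true_null) if r and null
    )
    if total_rejections == 0:
        return 0.0
    return false_rejections / total_rejections


# Toy example (made-up labels): three rejections, one of them a true null.
toy_rejected = [True, True, False, True]
toy_is_true_null = [True, False, False, False]
toy_fdp = false_discovery_proportion(toy_rejected, toy_is_true_null)
```

In practice the truth labels are unknown, so the FDP of a given study is a random quantity that cannot be observed directly, which is why a prediction interval for it is of interest.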

Developing flexible and parsimonious nonlinear models that achieve dimension reduction is important for practical implementation and interpretation. Motivated by an ovarian cancer epidemiologic study, we consider the application and inference of a partially linear single index proportional hazards model, which includes a linear component and a nonparametric single index component. Polynomial spline approximation is used to estimate the nonlinear component, and asymptotic properties of the resulting estimators are established.
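For orientation, a model of this kind is commonly written in the following generic form. The notation is an assumption made here for illustration (the abstract does not give the thesis's own symbols or identifiability constraints): Z denotes the covariates entering linearly, X the covariates entering through the single index, and B_k a polynomial spline basis.

```latex
% Generic form of a partially linear single index proportional hazards model
% (notation assumed for illustration, not taken from the thesis).
\lambda(t \mid X, Z) = \lambda_0(t)\,
  \exp\!\left\{ \beta^{\top} Z + \psi\!\left(\alpha^{\top} X\right) \right\},
  \qquad \lVert \alpha \rVert = 1,

% Polynomial spline approximation of the unknown link function \psi:
\psi(u) \approx \sum_{k=1}^{K} \gamma_k B_k(u).
```

Here \lambda_0(t) is an unspecified baseline hazard, \beta^{\top} Z is the linear component, and \psi(\alpha^{\top} X) is the nonparametric single index component, which reduces the covariates in X to one dimension.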

We also develop a relative risk prediction model for cancer recurrence in primary melanoma patients. We identify a microRNA signature from hundreds of candidate variables, use it to build the risk prediction model, and evaluate the model in an independent cohort.

In summary, we have conducted multiple projects to develop statistical methods to cope with problems in study design, as well as modeling and analysis of high-dimensional data.

Indexing (document details)
Advisor: Shao, Yongzhao, Liu, Mengling
Committee: Fang, Yixin, Hayes, Richard B., Hernando-Monge, Eva M., Zhou, Baiyu
School: New York University
Department: Environmental Health Science
School Location: United States -- New York
Source: DAI-B 74/01(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Biostatistics
Keywords: False discovery proportion, High-dimensional data, Multiple testing, Nonparametric regression, Risk prediction model, Single index model
Publication Number: 3524267
ISBN: 9781267585615