In modern research, advancing technologies and the combination of data from multiple sources frequently generate massive high-dimensional data, which pose new challenges to statisticians. This thesis addresses several of these challenges: study design, multiple hypothesis testing, nonlinear regression modeling, and the development of risk models for high-dimensional data.
When a large number of hypotheses are tested simultaneously, controlling the traditional type I error rate at 5% for each test leads to an excessive number of false positives: with 10,000 true null hypotheses, about 500 are expected to be rejected by chance alone. The false discovery proportion (FDP), defined as the proportion of incorrect rejections among all rejections, directly measures the abundance of false positive findings. We propose a sample size calculation method that controls the FDP while ensuring the overall power of a study. It is also highly desirable to have an accurate prediction interval for the FDP; we propose a formula-based prediction interval for weak dependence between test statistics and a permutation-based prediction interval for strong dependence.
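To make the FDP definition concrete, here is a minimal sketch that computes the FDP as V / max(R, 1), where V is the number of false rejections and R the total number of rejections, and simulates independent one-sided z-tests, each run at the 5% level. It is not the thesis's sample size method or either of its prediction intervals; the function names, the proportion of true nulls `pi0`, and the effect size are illustrative assumptions.

```python
import random
from statistics import NormalDist


def fdp(rejected, is_null):
    """False discovery proportion: false rejections / all rejections (0 if nothing is rejected)."""
    r = sum(rejected)
    v = sum(1 for rej, null in zip(rejected, is_null) if rej and null)
    return v / max(r, 1)


def simulate_fdp(m=10_000, pi0=0.95, effect=3.0, alpha=0.05, seed=1):
    """Hypothetical helper: test m independent one-sided z-statistics, each at level alpha.

    A fraction pi0 of hypotheses are true nulls (mean 0); the rest have mean `effect`.
    Returns (V, R, FDP) for one simulated study.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value, about 1.645
    is_null = [rng.random() < pi0 for _ in range(m)]
    z = [rng.gauss(0.0 if null else effect, 1.0) for null in is_null]
    rejected = [zi > crit for zi in z]
    v = sum(1 for rej, null in zip(rejected, is_null) if rej and null)
    return v, sum(rejected), fdp(rejected, is_null)


v, r, f = simulate_fdp()
print(f"false rejections V={v}, rejections R={r}, FDP={f:.3f}")
```

With the default 10,000 tests and 95% true nulls, about 0.05 × 9,500 ≈ 475 false rejections are expected, so the realized FDP is typically around one half even though each individual test controls its type I error at 5%.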
Developing flexible and parsimonious nonlinear models that achieve dimension reduction is important for practical implementation and interpretation. Motivated by an ovarian cancer epidemiologic study, we consider the application of, and inference for, a partially linear single-index proportional hazards model, which includes a linear component and a nonparametric single-index component. The nonlinear component is estimated by polynomial spline approximation, and asymptotic properties of the resulting estimators are established.
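A common way to write such a model, assumed here rather than taken from the thesis, is λ(t | X, Z) = λ0(t) exp{g(βᵀX) + γᵀZ}, with ‖β‖ = 1 for identifiability, g an unknown smooth function of the single index βᵀX, and γᵀZ the linear component. The sketch below approximates g by a cubic truncated-power spline, g(u) ≈ B(u)ᵀθ, and evaluates the Cox negative log partial likelihood for given (β, θ, γ). The knot placement, the function names, and the Breslow handling of ties are illustrative choices; the thesis's actual estimation procedure and asymptotic theory are not reproduced.

```python
import numpy as np


def spline_basis(u, knots, degree=3):
    """Truncated-power polynomial spline basis for g(u), with no intercept column.

    Columns are u, u^2, ..., u^degree followed by (u - k)_+^degree for each knot k.
    """
    u = np.asarray(u, dtype=float)
    cols = [u ** d for d in range(1, degree + 1)]
    cols += [np.clip(u - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def linear_predictor(X, Z, beta, theta, gamma, knots):
    """Hypothetical helper: eta_i = g(beta'X_i) + gamma'Z_i, with g(u) ~ spline_basis(u) @ theta."""
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)  # identifiability constraint ||beta|| = 1
    index = X @ beta  # single index beta'X
    return spline_basis(index, knots) @ theta + Z @ gamma


def neg_log_partial_likelihood(eta, time, event):
    """Negative Cox log partial likelihood, with Breslow handling of tied times."""
    eta, time = np.asarray(eta, float), np.asarray(time, float)
    event = np.asarray(event, bool)
    order = np.argsort(-time, kind="stable")  # sort by decreasing time
    eta_s, time_s, event_s = eta[order], time[order], event[order]
    # running log(sum exp(eta)) over subjects seen so far, i.e. those with later (or equal) times
    log_cum = np.logaddexp.accumulate(eta_s)
    # with ties, the risk set must contain every tied subject: use the last tied position
    last = np.searchsorted(-time_s, -time_s, side="right") - 1
    return -np.sum(eta_s[event_s] - log_cum[last][event_s])
```

The spline basis omits a constant column because any constant shift in g is absorbed by the baseline hazard λ0(t). Estimates could then be obtained by minimizing this objective over (β, θ, γ) with a general-purpose optimizer, although practical fitting would also need knot selection and care with the constraint on β.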
We also develop a relative risk prediction model for recurrence in patients with primary melanoma: we identify a microRNA signature from hundreds of candidate variables, use it to build a risk prediction model for melanoma recurrence, and evaluate the model in an independent cohort.
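The abstract does not state which metric was used to evaluate the model in the independent cohort. As one common choice (an assumption here), Harrell's concordance index measures how well predicted relative risks rank observed recurrence times. A minimal pure-Python sketch follows; the function and argument names are hypothetical.

```python
from itertools import combinations


def harrell_c_index(risk, time, event):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter observed time had an event;
    it is concordant when that subject also has the higher predicted risk.
    Ties in predicted risk count as 1/2.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        # order the pair so that `a` has the shorter observed time
        a, b = (i, j) if time[i] < time[j] else (j, i)
        if time[a] == time[b] or not event[a]:
            continue  # tied times or censored earlier subject: skipped (simplification)
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Higher predicted risk for earlier recurrence gives perfect concordance.
print(harrell_c_index([0.9, 0.5, 0.2, 0.1], [2, 5, 7, 10], [1, 1, 0, 1]))
```

The pairwise loop is O(n²), which is adequate for a validation cohort of moderate size; a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.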
In summary, across several projects we develop statistical methods for study design and for the modeling and analysis of high-dimensional data.
Advisor: Shao, Yongzhao; Liu, Mengling
Committee: Fang, Yixin; Hayes, Richard B.; Hernando-Monge, Eva M.; Zhou, Baiyu
School: New York University
Department: Environmental Health Science
School Location: United States, New York
Source: DAI-B 74/01(E), Dissertation Abstracts International
Keywords: False discovery proportion, High-dimensional data, Multiple testing, Nonparametric regression, Risk prediction model, Single index model
Copyright in each Dissertation and Thesis is retained by the author. All rights reserved.