Traditionally, the prognosis of cancer has relied on the AJCC staging system, which is based on three anatomic factors: tumor size, lymph nodes, and distant metastasis. The Ensemble Algorithm for Clustering Cancer Data (EACCD) was developed to create cancer prognostic systems that can use prognostic information from additional factors and generate prognostic groups analogous to the staging groups of the AJCC system. As a new statistical learning algorithm, EACCD has considerable room for further development. In particular, EACCD needs 1) more appropriate measures to quantify the difference between survival functions, 2) methods and criteria to evaluate and compare the systems it produces, and 3) strategies to optimize the number of prognostic groups. This dissertation aims to improve EACCD in these three respects.
To better quantify the difference between survival functions in EACCD, we develop effect sizes based on (weighted) logrank test statistics and the Mann-Whitney parameter. The proposed effect sizes can be expressed as weighted differences in hazards. To assess and compare prognostic systems, we propose evaluation methods based on Kaplan-Meier curves and the C-index. To optimize the number of prognostic groups generated by EACCD, we introduce an optimal selection scheme based on the C-index curve. We apply the improved version of the EACCD algorithm to data on thyroid cancer, colorectal cancer, ovarian cancer, lung cancer, and melanoma of the skin. For each cancer site, the effect size employed in EACCD is chosen according to the survival condition of the disease.
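To make the C-index evaluation concrete, the following is a minimal sketch of a Harrell-type concordance index applied to prognostic group labels. It is not the dissertation's implementation: the function name `harrell_c_index`, the convention that a higher group number means a worse prognosis, and the choice to skip pairs with tied observed times are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Harrell-type C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the time is censored
    risk   : risk score per patient (assumption: higher = worse prognosis,
             e.g. the index of the prognostic group)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # let `a` be the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # higher risk died first: concordant
        elif risk[a] == risk[b]:
            concordant += 0.5  # same group: counted as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: six patients in three prognostic groups.
times = [2, 4, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
groups = [3, 3, 2, 2, 1, 1]
c = harrell_c_index(times, events, groups)
```

Under these assumptions, computing this value for prognostic systems with different numbers of groups would give one way to trace a C-index curve of the kind described above.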
The results show that the proposed effect sizes help EACCD generate highly accurate prognostic systems for cancers of different sites. The survival curves of the generated prognostic groups are correctly ordered and well separated. The C-index is successfully used to measure the performance of EACCD prognostic systems and to suggest the optimal number of groups.
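As an illustration of how the survival curve of a single prognostic group can be estimated, here is a small sketch of the Kaplan-Meier product-limit estimator. The function names `kaplan_meier` and `survival_at` and the toy data are hypothetical and are not taken from the dissertation.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability).

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the time is censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # gather all patients sharing the same observed time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk  # product-limit step
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored cases leave the risk set
    return curve


def survival_at(curve, t):
    """Evaluate the step function returned by kaplan_meier at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical toy data for one prognostic group.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Estimating one such curve per group and comparing the values of `survival_at` at common time points is one simple way to check whether the groups are ordered as expected.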
The proposed effect sizes, defined as weighted differences in hazards, adequately measure the difference between survival functions and are therefore well suited to EACCD. The study of the C-index provides theoretical support for evaluating the performance of prognostic systems and optimizing the number of prognostic groups. The applications of the improved version of EACCD demonstrate its ability to adapt to a variety of diseases and its broad prospects for future use in disease prognosis.
|Advisor:||Chen, Dechang; Pan, Qing|
|Committee:||Liang, Hua; Gastwirth, Joseph L.; Kundu, Subrata; Wang, Lin; Lu, Jun|
|School:||The George Washington University|
|School Location:||United States -- District of Columbia|
|Source:||DAI 81/11(E), Dissertation Abstracts International|
|Keywords:||Big data, C-index, Effect size, Machine learning, Mann-Whitney parameter, Survival analysis|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved