Dissertation/Thesis Abstract

Development of Prognostic Systems for Cancer Patients
by Wang, Huan, Ph.D., The George Washington University, 2020, 102; 27832139
Abstract (Summary)

Traditionally, the prognosis of cancer has depended on the AJCC staging system, which is based on three anatomic factors: tumor size, lymph node involvement, and distant metastasis. The Ensemble Algorithm for Clustering Cancer Data (EACCD) was developed to create cancer prognostic systems that can use prognostic information from additional factors and generate prognostic groups similar to the stage groups of the AJCC system. As a new statistical learning algorithm, EACCD leaves considerable room for further development. In particular, EACCD needs 1) more appropriate measures to quantify the difference between survival functions, 2) methods and criteria to evaluate and compare the systems it produces, and 3) strategies to optimize the number of prognostic groups. This dissertation aims to improve EACCD in each of these three respects.

To better quantify the difference between survival functions in EACCD, we develop effect sizes based on (weighted) log-rank test statistics and the Mann-Whitney parameter. The proposed effect sizes can be expressed as weighted differences in hazards. To assess and compare prognostic systems, we propose evaluation methods based on Kaplan-Meier curves and the C-index. To optimize the number of prognostic groups generated by EACCD, we introduce an optimal selection scheme based on the C-index curve. We apply the improved EACCD algorithm to data on thyroid cancer, colorectal cancer, ovarian cancer, lung cancer, and melanoma of the skin. For each cancer site, the effect size used in EACCD is chosen according to the survival characteristics of the disease.
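The claim that a log-rank-based effect size "can be expressed as weighted differences in hazards" rests on a standard identity: the two-sample log-rank score U = sum over event times of (d1 - Y1*d/Y) equals the sum of (Y1*Y2/Y)*(d1/Y1 - d2/Y2), a weighted difference of Nelson-Aalen hazard increments. The Python sketch below checks that identity numerically and computes Harrell's C-index with prognostic group labels used as risk scores. It is an illustration under stated assumptions, not the dissertation's EACCD code: the function names `logrank_numerator` and `harrell_c`, the toy data, and the candidate groupings are hypothetical, and the dissertation's exact effect-size normalizations, its weighted and Mann-Whitney variants, and its rule for choosing the number of groups from the C-index curve are not reproduced.

```python
"""Sketch: log-rank score as a weighted hazard difference, and Harrell's C-index.

All names and data here are hypothetical illustrations, not EACCD code.
"""


def logrank_numerator(times, events, groups):
    """Return the two-sample log-rank score U computed two equivalent ways.

    groups[k] is 1 or 2. At each distinct event time s, with y1, y2 at risk
    and d1, d2 events in groups 1 and 2 (y = y1 + y2, d = d1 + d2):
      observed-minus-expected form:    d1 - y1 * d / y
      weighted hazard-difference form: (y1 * y2 / y) * (d1 / y1 - d2 / y2)
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    u_oe = 0.0
    u_weighted = 0.0
    for s in event_times:
        at_risk = [g for t, g in zip(times, groups) if t >= s]
        failed = [g for t, e, g in zip(times, events, groups) if t == s and e]
        y1, y2 = at_risk.count(1), at_risk.count(2)
        d1, d2 = failed.count(1), failed.count(2)
        y, d = y1 + y2, d1 + d2
        u_oe += d1 - y1 * d / y
        # If one group has nobody at risk, both forms contribute zero here.
        if y1 > 0 and y2 > 0:
            u_weighted += (y1 * y2 / y) * (d1 / y1 - d2 / y2)
    return u_oe, u_weighted


def harrell_c(times, events, risk):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is usable when times[i] < times[j] and subject i had an
    event. It is concordant when risk[i] > risk[j]; tied risks count 1/2.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical toy data: follow-up time, event indicator (1 = death), and a
# stage-like group label where a higher label means a worse expected prognosis.
times = [2, 3, 4, 6, 7, 9, 10, 12, 15, 18]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]
stage = [3, 3, 2, 2, 3, 1, 2, 1, 1, 1]

# Log-rank score comparing group 1 (stage 3) with group 2 (stages 1 and 2).
two = [1 if s == 3 else 2 for s in stage]
u_oe, u_weighted = logrank_numerator(times, events, two)
print(f"U (observed - expected):  {u_oe:.4f}")
print(f"U (weighted hazard diff): {u_weighted:.4f}")

# C-index of candidate groupings with k = 1, 2, 3 groups (a toy "C-index curve").
candidates = {
    1: [1] * len(stage),
    2: [2 if s == 3 else 1 for s in stage],
    3: stage,
}
for k, labels in candidates.items():
    print(f"k = {k}: C = {harrell_c(times, events, labels):.3f}")
```

With labels ordered from best to worst prognosis, a single group gives C = 0.5 because every usable pair is tied, and on these toy data C increases as the grouping becomes finer. Turning such a curve into a choice of the number of groups requires a selection rule, which this sketch deliberately leaves out.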

The results show that the proposed effect sizes can help EACCD generate highly accurate prognostic systems for cancers of different sites. The survival curves of the generated prognostic groups are correctly ordered and well separated. The C-index is successfully used to measure the performance of EACCD prognostic systems and to suggest the optimal number of groups.

The proposed effect sizes, defined as weighted differences in hazards, can adequately measure the difference between survival functions and are therefore well suited to EACCD. The study of the C-index provides theoretical support for evaluating the performance of prognostic systems and for optimizing the number of prognostic groups. The applications of the improved EACCD demonstrate its ability to adapt to a variety of diseases and its broad prospects for disease prognosis in the future.

Indexing (document details)
Advisor: Chen, Dechang, Pan, Qing
Committee: Liang, Hua, Gastwirth, Joseph L., Kundu, Subrata, Wang, Lin, Lu, Jun
School: The George Washington University
Department: Biostatistics (CCAS/SPHHS)
School Location: United States -- District of Columbia
Source: DAI 81/11(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Biostatistics
Keywords: Big data, C-index, Effect size, Machine learning, Mann-Whitney parameter, Survival analysis
Publication Number: 27832139
ISBN: 9798645447762
Copyright © 2020 ProQuest LLC. All rights reserved.