Dissertation/Thesis Abstract

Using Supervised Machine Learning for Early Lung Cancer Detection
by Vo, Jessica, M.S., California State University, Long Beach, 2020, 76; 27736286
Abstract (Summary)

In 2018, the three most common cancer types in the United States were, in descending order, breast cancer, lung cancer, and prostate cancer. In 2019, lung cancer accounted for approximately 13% of new cancer cases. Most late-diagnosed cases are caused by hidden genetic variants and other subjective factors, such as smoking. In this study, we apply supervised machine learning techniques (logistic regression, random forest, gradient boosting, extreme gradient boosting, support vector machine, and Bayesian additive regression trees) to microarray gene expression data in order to detect the inherited factors most strongly correlated with lung cancer development in the Caucasian smoking population. The model validation metrics are the area under the receiver operating characteristic curve, accuracy, sensitivity (recall), specificity, and precision, each reported as a percentage. The most effective model was gradient boosting, which gave the highest predictive power (97.9%), with a recall of 90.9% and a precision of 90.9%.
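The validation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, and the sketch assumes binary labels (1 = case, 0 = control) with both classes present.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (same quantity as recall), specificity, and precision."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    pairs in which the positive sample gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not results from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity and recall are the same quantity, so a single entry covers both. The area under the curve is computed from the predicted scores rather than the thresholded labels, which is why it can differ from accuracy on the same data.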

Indexing (document details)
Advisor: Korosteleva, Olga
Committee: Suaray, Kagba; Zhou, Tianni
School: California State University, Long Beach
Department: Mathematics and Statistics
School Location: United States -- California
Source: MAI 81/11(E), Masters Abstracts International
Subjects: Mathematics, Bioinformatics
Publication Number: 27736286
ISBN: 9798645443023
Copyright © 2021 ProQuest LLC. All rights reserved.