Dissertation/Thesis Abstract

Multi-Class Computational Evolution: Development, Benchmark Comparison, and Application to RNA-Seq Biomarker Discovery
by Crabtree, Nathaniel Mark, Ph.D., University of Arkansas at Little Rock, 2017, 106; 10620232
Abstract (Summary)

A computational evolution system (CES) is a knowledge-discovery engine that constructs and evolves classifiers with a small number of features to identify subtle, synergistic relationships among features and to discriminate groups in high-dimensional data analysis. CESs have previously been designed to analyze only binary (two-class) datasets. In this work, the CES method has been expanded to accommodate multi-class data.

The multi-class CES was compared to three common classification and feature selection methods: random forest, random k-nearest neighbor, and support vector machines. The four classifiers were evaluated on three real RNA sequencing (RNA-Seq) datasets. Performance was evaluated via cross-validation to assess classification accuracy, number of features selected, stability of the selected feature sets, and run time.
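As a rough illustration of what "stability of the selected feature sets" can mean in a cross-validation setting, the sketch below scores stability as the mean pairwise Jaccard index between the feature sets selected in each fold. The abstract does not state which stability measure was used, so the Jaccard index, the function names, and the gene identifiers here are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def selection_stability(fold_selections):
    """Mean pairwise Jaccard index over the feature sets chosen in each fold."""
    pairs = list(combinations(fold_selections, 2))
    if not pairs:
        raise ValueError("need at least two folds")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene identifiers selected in three cross-validation folds.
folds = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB", "geneD"},
    {"geneA", "geneB", "geneC"},
]

stability = selection_stability(folds)             # 1.0 means identical sets in every fold
mean_size = sum(len(f) for f in folds) / len(folds)  # average number of features selected
```

A value near 1 means the method picks nearly the same features regardless of which samples are held out; a value near 0 means the selections barely overlap.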

The three common classification and feature selection methods were originally designed for microarray data, which is fundamentally different from RNA-Seq data. To preprocess RNA-Seq count data for classification, the data were normalized and then transformed via a variance stabilizing transformation to remove the variance-mean relationship that is commonly observed in RNA-Seq count data.
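The idea of removing a variance-mean relationship can be sketched as follows. The abstract does not name the normalization or the specific transformation used, so this example makes its own assumptions: library-size scaling for normalization, a negative binomial model with variance mu + dispersion * mu^2, and a single fixed dispersion value. Under that model the closed-form stabilizing transform is (2 / sqrt(dispersion)) * asinh(sqrt(dispersion * x)). All names and numbers are hypothetical.

```python
import math


def size_factors(samples):
    """Library-size factors: each sample's total count over the mean total."""
    totals = [sum(s) for s in samples]
    avg = sum(totals) / len(totals)
    return [t / avg for t in totals]


def nb_vst(count, dispersion):
    """Stabilize variance for counts with variance = mu + dispersion * mu**2."""
    if dispersion <= 0:
        return 2.0 * math.sqrt(count)  # Poisson limit of the same transform
    return 2.0 / math.sqrt(dispersion) * math.asinh(math.sqrt(dispersion * count))


# Hypothetical counts: one row per sample, one column per gene.
# The second sample was sequenced twice as deeply as the first.
samples = [
    [10, 200, 0, 50],
    [20, 400, 0, 100],
]
DISPERSION = 0.1  # assumed value; real pipelines estimate this from the data

factors = size_factors(samples)
transformed = [
    [nb_vst(c / f, DISPERSION) for c in row]
    for row, f in zip(samples, factors)
]
```

After scaling, the two samples yield the same transformed profile, and the transform compresses large counts so that highly expressed genes no longer dominate the variance.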

Compared to the three competing methods, the multi-class CES selected far fewer features. The identified features are potential biomarkers that may be more relevant than the longer lists of features identified by the competing methods. The CES performed best on the dataset with the smallest sample size, indicating a particular advantage in small-sample settings, where most classification algorithms lose accuracy.

The CES identified numerous potentially important biomarkers in each of the three real datasets that are validated by previous research and worthy of additional investigation. The CES was especially helpful in identifying important features in the rat blood RNA-Seq dataset. Subsequent ontological analysis of these selected features revealed protein folding as an important process in that dataset. The other contribution of this research was to extend the applicability of CES to biomarker discovery in multi-class settings. New software algorithms based on CES have already been developed, and the multi-class modifications presented here are directly applicable to, and would also benefit, the newer software.

Indexing (document details)
Advisor: Bowyer, John F.
Committee: George, Nysia I., Jennings, Steven F., Kane, Cynthia JM, Moore, Jason H., Patterson, Tucker A.
School: University of Arkansas at Little Rock
Department: Information Science
School Location: United States -- Arkansas
Source: DAI-B 79/01(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Biology, Biostatistics, Bioinformatics
Keywords: Artificial intelligence, Evolutionary algorithm, Machine learning, Multi-class, RNA sequencing, Toxicology
Publication Number: 10620232
ISBN: 9780355197846