Dissertation/Thesis Abstract

Roughened Random Forests for Binary Classification
by Xiong, Kuangnan, Ph.D., State University of New York at Albany, 2014, 136; 3624962
Abstract (Summary)

Binary classification plays an important role in many decision-making processes. Random forests can build a strong ensemble classifier by combining weaker classification trees that are de-correlated. The strength and correlation among individual classification trees are the key factors that contribute to the ensemble performance of random forests. We propose roughened random forests, a new set of tools which show further improvement over random forests in binary classification. Roughened random forests modify the original dataset for each classification tree and further reduce the correlation among individual classification trees. This data modification process is composed of artificially imposing missing data that are missing completely at random and subsequent missing data imputation.
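The data modification step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `roughen` and `roughened_copies`, the use of column-mean single imputation, and the missing rate shown are all assumptions chosen for illustration. Each tree would then be trained on its own roughened copy of the data.

```python
import random

def roughen(rows, missing_rate, rng):
    """Return a copy of `rows` with MCAR-masked cells replaced by column means.

    Column-mean imputation is an assumed, simple stand-in; the abstract does
    not say which imputation method is used.
    """
    n_cols = len(rows[0])
    # MCAR: every cell is masked with the same probability, independent of the data.
    mask = [[rng.random() < missing_rate for _ in range(n_cols)] for _ in rows]
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row, m in zip(rows, mask) if not m[j]]
        # Fall back to the full column if every cell in it was masked.
        values = observed or [row[j] for row in rows]
        col_means.append(sum(values) / len(values))
    # Keep observed cells as they are; fill masked cells with the column mean.
    return [
        [col_means[j] if m[j] else row[j] for j in range(n_cols)]
        for row, m in zip(rows, mask)
    ]

def roughened_copies(rows, n_trees, missing_rate, seed=0):
    """Build one independently roughened dataset per tree (hypothetical helper)."""
    rng = random.Random(seed)
    return [roughen(rows, missing_rate, rng) for _ in range(n_trees)]

if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for copy in roughened_copies(data, n_trees=3, missing_rate=0.3, seed=42):
        print(copy)
```

Because each copy is masked with fresh random draws, the trees see slightly different versions of the same covariates, which is the mechanism the abstract credits for reducing correlation among trees.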

Through this dissertation we aim to answer a few important questions in building roughened random forests: (1) What is the ideal rate of missing data to impose on the original dataset? (2) Should we impose missing data on both the training and testing datasets, or only on the training dataset? (3) What are the best missing data imputation methods to use in roughened random forests? (4) Do roughened random forests share the same ideal number of covariates selected at each tree node as the original random forests? (5) Can roughened random forests be used in medium- to high-dimensional datasets?

Indexing (document details)
Advisor: Yucel, Recai M.
Committee: DiRienzo, A. Gregory, Lu, Tao
School: State University of New York at Albany
Department: Biometry and Statistics
School Location: United States -- New York
Source: DAI-B 75/10(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Biostatistics, Statistics, Computer science
Keywords: Binary classification, High-dimensional microarray data, Multiple imputation, Random forests, Single imputation, Variable selection
Publication Number: 3624962
ISBN: 9781303992742