Dissertation/Thesis Abstract

Supervised Learning and Outlier Detection for High-dimensional Data Using Principal Components
by Ding, Lei, Ph.D., Indiana University, 2020, 127; 27744358
Abstract (Summary)

High-dimensional problems such as regression, classification, and outlier detection play important roles in fields such as genomics and neuroscience. Methods based on principal components have been widely used for solving such problems because principal components construct a linear manifold in lower dimensions that accounts for unknown grouping structures of the features while preserving a large amount of information from the raw data. In linear prediction problems, feature selection has long been a key goal in these fields, alongside accurate prediction and fast computation. Existing sparse methods utilizing principal components have been stunningly successful, but they can fail to select the correct predictors, predict poorly relative to non-sparse alternatives, or have poor computational tractability. In outlier detection problems, current methods based on principal components depend heavily on the raw data, neglecting the underlying structure of outliers, and are thus underpowered. In this dissertation, I develop two methods, AIMER and SuffPCR, to handle high-dimensional prediction tasks including regression and classification, especially in the context of sparse linear manifolds. While AIMER employs a linear algebraic technique for the approximation of principal components, SuffPCR is an extension of sparse PCA. I also create a new method, PCATF, for solving high-dimensional outlier detection problems utilizing the estimation of principal components with constant trend filtering. These new methods generally perform well on a variety of simulated, semi-simulated, and real genomics and neuroimaging data and are accompanied by some theoretical guarantees.

Indexing (document details)
Advisor: McDonald, Daniel
Committee: Fukuyama, Julia; Tang, Haixu; Trosset, Michael
School: Indiana University
Department: Statistics
School Location: United States -- Indiana
Source: DAI-B 81/10(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Statistics
Keywords: Classification, Gene expression data, High-dimensional, Outlier detection, Principal components, Regression
Publication Number: 27744358
ISBN: 9798641838458
Copyright © 2020 ProQuest LLC. All rights reserved.