High-dimensional problems such as regression, classification, and outlier detection play important roles in fields such as genomics and neuroscience. Methods based on principal components are widely used for these problems because principal components construct a lower-dimensional linear manifold that accounts for unknown grouping structures among the features while preserving much of the information in the raw data. In linear prediction problems, feature selection has long been a key goal in these fields, alongside accurate prediction and fast computation. Existing sparse methods that use principal components have been highly successful, but they can fail to select the correct predictors, predict poorly relative to non-sparse alternatives, or suffer from poor computational tractability. In outlier detection problems, current methods based on principal components depend heavily on the raw data and neglect the underlying structure of the outliers, and are thus underpowered. In this dissertation, I develop two methods, AIMER and SuffPCR, to handle high-dimensional prediction tasks including regression and classification, especially in the context of sparse linear manifolds. AIMER employs a linear algebraic technique to approximate principal components, whereas SuffPCR is an extension of sparse PCA. I also create a new method, PCATF, for solving high-dimensional outlier detection problems by estimating principal components with constant trend filtering. These new methods generally work well on a variety of simulated, semi-simulated, and real genomics and neuroimaging data, and are accompanied by some theoretical guarantees.
|Committee:||Fukuyama, Julia; Tang, Haixu; Trosset, Michael|
|School Location:||United States -- Indiana|
|Source:||DAI-B 81/10(E), Dissertation Abstracts International|
|Keywords:||Classification, Gene expression data, High-dimensional, Outlier detection, Principal components, Regression|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved