This thesis presents an efficient, adaptive multi-scale algorithm for analyzing measurement data that falls into two categories: a regular set of measurements that can be described by a dominant geometry, and a set of "outliers", i.e., measurements that deviate from this underlying geometry. The algorithm uses a stopping-time construction to identify local regions of different sizes and shapes in which the data is concentrated around local lines (or d-planes), while excluding a local percentage of putative outliers that lie outside such regions. It can thus efficiently construct a description of the dominant geometry in terms of a curve (or d-dimensional graph), and it then uses the local geometric properties to detect the outliers. The approach makes no assumption about the distribution of the noise and is robust to both noise and outliers. Furthermore, the running time of the algorithm is linear in the size of the data, and it can handle high-dimensional data without a blow-up in computational cost. Genomic expression data is one application that fits well within this framework. The thesis reports experimental results on such data and describes some of the mathematical underpinnings of the algorithm and its properties.
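The abstract does not specify the stopping-time construction, so the following is only a minimal sketch of the general idea, under loud assumptions: 2-D data, dyadic splitting along the x-axis, ordinary least-squares lines, and a single trimming pass. The names `fit_line`, `robust_fit`, `multiscale_fit`, `find_outliers` and the parameters `tol`, `trim`, `min_pts`, `max_depth` are hypothetical and are not taken from the thesis. The sketch does not attempt to reproduce the linear running time claimed above.

```python
def fit_line(pts):
    """Ordinary least-squares line y = a*x + b through pts, a list of (x, y)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def robust_fit(pts, trim):
    """Fit, set aside the `trim` fraction with the largest residuals, refit.

    Returns (a, b, worst), where `worst` is the largest absolute residual
    among the points that were kept.
    """
    a, b = fit_line(pts)
    ranked = sorted(pts, key=lambda p: abs(p[1] - (a * p[0] + b)))
    kept = ranked[: max(2, int(round(len(pts) * (1 - trim))))]
    a, b = fit_line(kept)
    worst = max(abs(y - (a * x + b)) for x, y in kept)
    return a, b, worst


def multiscale_fit(pts, lo, hi, tol=0.1, trim=0.1, min_pts=8,
                   depth=0, max_depth=10):
    """Split the half-open cell [lo, hi) until a local line fits well.

    Assumed stopping rule (not the thesis's): stop when, after trimming,
    every kept point lies within `tol` of the local line, or when the cell
    is too small or too deep. Returns a list of (lo, hi, a, b) cells.
    """
    cell = [p for p in pts if lo <= p[0] < hi]
    if len(cell) < 2:
        return []  # nothing to fit; points here stay unassigned
    a, b, worst = robust_fit(cell, trim)
    if worst <= tol or len(cell) < min_pts or depth >= max_depth:
        return [(lo, hi, a, b)]
    mid = (lo + hi) / 2
    return (multiscale_fit(pts, lo, mid, tol, trim, min_pts, depth + 1, max_depth)
            + multiscale_fit(pts, mid, hi, tol, trim, min_pts, depth + 1, max_depth))


def find_outliers(pts, cells, tol):
    """Flag points farther than `tol` from the line of the cell containing them."""
    flagged = []
    for x, y in pts:
        for lo, hi, a, b in cells:
            if lo <= x < hi:
                if abs(y - (a * x + b)) > tol:
                    flagged.append((x, y))
                break
    return flagged
```

On a V-shaped point set with a few displaced points, calling `multiscale_fit(pts, 0.0, 1.0, tol=0.02)` and then `find_outliers(pts, cells, tol=0.02)` would be expected to split once at the kink and flag the displaced points. A single trimming pass is a simplification; it can fail when outliers are numerous enough to dominate the initial fit.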
|School:||State University of New York at Stony Brook|
|School Location:||United States -- New York|
|Source:||DAI-B 69/11, Dissertation Abstracts International|
|Keywords:||Data exploration, Outlier detection|