Dissertation/Thesis Abstract

A multiscale, geometric algorithm for non-parametric data exploration with an application to genomic data
by McQuown, Joseph, Ph.D., State University of New York at Stony Brook, 2007, 105; 3337616
Abstract (Summary)

This thesis presents an efficient, adaptive multiscale algorithm for analyzing measurement data composed of two categories: a regular set of measurements that can be described by a dominant geometry, and a set of "outliers", i.e., measurements that deviate from this underlying geometry. The algorithm uses a stopping-time construction to identify local regions of different sizes and shapes in which the data are concentrated around local lines (or d-planes), while excluding a local percentage of putative outliers that lie outside such regions. It can thus efficiently construct a description of the dominant "geometry" in terms of a curve (or d-dimensional graph). Using the local geometric properties, it then detects the outliers. The approach does not require any assumption about the distributional properties of the noise, and it is robust against noise and outliers. Furthermore, the running time of the algorithm is linear in the size of the data, and it can handle high-dimensional data without a blow-up in computational expense. Genomic expression data is one application that can be analyzed well within this framework. The thesis explores experimental results on such data and describes some of the mathematical underpinnings of the algorithm and its various properties.
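The following Python sketch illustrates the general idea described in the abstract; it is not the thesis's algorithm. It assumes 2-D data, dyadic splitting along the x-axis, total-least-squares line fits, and a simple stopping rule (stop once the trimmed residual is small relative to the cell width). The abstract specifies none of these choices. All function names and parameters (`fit_line`, `residual`, `explore`, `trim`, `tol`, `min_pts`, `max_depth`) are hypothetical. The sketch also makes no attempt at the linear running time claimed for the actual method.

```python
import math

def fit_line(pts):
    """Total-least-squares line through 2-D points: returns (centroid, angle)."""
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts)
    syy = sum((y - cy) ** 2 for _, y in pts)
    sxy = sum((x - cx) * (y - cy) for x, y in pts)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # principal-axis angle
    return (cx, cy), theta

def residual(p, centre, theta):
    """Perpendicular distance from point p to the line (centre, theta)."""
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    return abs(-dx * math.sin(theta) + dy * math.cos(theta))

def explore(pts, idx, lo, hi, trim=0.1, tol=0.05, min_pts=8, max_depth=6, depth=0):
    """Return (segments, outlier_indices) for points pts[i], i in idx, lo <= x < hi.

    Assumed stopping rule: accept a cell when, after setting aside the worst
    `trim` fraction of its points, every remaining point lies within
    tol * (cell width) of the refitted line. Otherwise halve the cell.
    """
    if len(idx) < 2:
        # Too few points to fit a line; flagging them is an arbitrary choice here.
        return [], set(idx)
    centre, theta = fit_line([pts[i] for i in idx])
    ranked = sorted(idx, key=lambda i: residual(pts[i], centre, theta))
    n_keep = len(idx) - int(trim * len(idx))
    kept, dropped = ranked[:n_keep], ranked[n_keep:]
    centre, theta = fit_line([pts[i] for i in kept])  # refit without trimmed points
    band = tol * (hi - lo)
    worst = max(residual(pts[i], centre, theta) for i in kept)
    if worst <= band or len(idx) < min_pts or depth >= max_depth:
        # Only trimmed points that really lie off the local line are flagged.
        outliers = {i for i in dropped if residual(pts[i], centre, theta) > band}
        return [(lo, hi, centre, theta)], outliers
    mid = 0.5 * (lo + hi)
    segments, outliers = [], set()
    for a, b in ((lo, mid), (mid, hi)):
        sub = [i for i in idx if a <= pts[i][0] < b]
        s, o = explore(pts, sub, a, b, trim, tol, min_pts, max_depth, depth + 1)
        segments += s
        outliers |= o
    return segments, outliers

# Synthetic example: a V-shaped curve with three points shifted off it.
pts = [(i / 200, abs(i / 200 - 0.5)) for i in range(200)]
for j in (30, 90, 150):
    pts[j] = (pts[j][0], pts[j][1] + 0.3)
segments, outliers = explore(pts, list(range(len(pts))), 0.0, 1.0)
print(len(segments), sorted(outliers))
```

On this synthetic input a single line cannot describe the V shape, so the sketch splits the unit interval once, fits one line per arm, and flags the three shifted points. The scale-relative tolerance is what makes the region sizes adapt to the data. Because a plain total-least-squares fit is not itself robust, this toy version only copes with mild contamination.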

Indexing (document details)
School: State University of New York at Stony Brook
School Location: United States -- New York
Source: DAI-B 69/11, Dissertation Abstracts International
Subjects: Statistics
Keywords: Data exploration, Outlier detection
Publication Number: 3337616
ISBN: 978-0-549-91412-9
Copyright © 2020 ProQuest LLC. All rights reserved.