Dissertation/Thesis Abstract

A multiscale, geometric algorithm for non-parametric data exploration with an application to genomic data
by McQuown, Joseph, Ph.D., State University of New York at Stony Brook, 2007, 105 pages; Publication Number 3337616
Abstract (Summary)

This thesis presents an efficient, adaptive multiscale algorithm for analyzing measurement data composed of two categories: a regular set of measurements that can be described by a dominant geometry, and a set of "outliers", i.e., measurements that deviate from this underlying geometry. The algorithm uses a stopping-time construction to identify local regions of different sizes and shapes in which the data is concentrated around local lines (or d-planes), while excluding a local percentage of putative outliers that lie outside such regions. It is thus able to construct efficiently a description of the dominant "geometry" in terms of a curve (or d-dimensional graph). Using the local geometric properties, it then detects the outliers. The approach does not require any assumption about the distributional properties of the noise, and it is robust against both noise and outliers. Furthermore, the running time of the algorithm is linear in the size of the data, and it can handle high-dimensional data without a blow-up in computational expense. Genomic expression data is an application that can be analyzed quite well within this framework. This thesis presents experimental results on such data and describes some of the mathematical underpinnings of the algorithm and its various properties.
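
The general idea in the abstract (split the data into local regions until each is well described by a line, set aside a fixed local fraction of suspect points while fitting, then flag points far from their local line) can be illustrated with a small sketch. This is not the algorithm from the thesis, whose details are not given here. It assumes 2-D data, ordinary least squares, dyadic splitting along x as a stand-in for the stopping-time construction, and a fixed tolerance; all function names and parameters (fit_line, trimmed_fit, local_lines, find_outliers, tol, trim) are hypothetical, and the sketch makes no attempt to reach the linear running time claimed for the real method.

```python
def fit_line(pts):
    """Ordinary least-squares line y = a*x + b through pts."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def trimmed_fit(pts, trim):
    """Fit, drop the `trim` fraction with the largest residuals, then refit.

    Returns the refitted line and the largest residual among the kept points.
    """
    a, b = fit_line(pts)
    ranked = sorted(pts, key=lambda p: abs(p[1] - (a * p[0] + b)))
    keep = ranked[: max(2, int(len(pts) * (1 - trim)))]
    a, b = fit_line(keep)
    width = max(abs(y - (a * x + b)) for x, y in keep)
    return a, b, width


def local_lines(pts, lo, hi, tol, trim=0.1, min_pts=8, depth=0, max_depth=10):
    """Halve the interval [lo, hi) until the trimmed data lies within tol of a line.

    Assumed stopping rule: stop when the fit is good enough, when too few
    points remain, or when the depth limit is reached.
    """
    inside = [p for p in pts if lo <= p[0] < hi]
    if len(inside) < 2:
        return []
    a, b, width = trimmed_fit(inside, trim)
    if width <= tol or len(inside) < min_pts or depth >= max_depth:
        return [(lo, hi, a, b)]
    mid = (lo + hi) / 2
    left = local_lines(inside, lo, mid, tol, trim, min_pts, depth + 1, max_depth)
    right = local_lines(inside, mid, hi, tol, trim, min_pts, depth + 1, max_depth)
    return left + right


def find_outliers(pts, regions, tol):
    """Flag points farther than tol from the line of the region containing them."""
    flagged = []
    for x, y in pts:
        for lo, hi, a, b in regions:
            if lo <= x < hi:
                if abs(y - (a * x + b)) > tol:
                    flagged.append((x, y))
                break
    return flagged


# Synthetic data: a V-shaped curve y = |x - 0.5| with three planted outliers.
pts = [(i / 200, abs(i / 200 - 0.5)) for i in range(200)]
for i in (20, 90, 150):
    pts[i] = (pts[i][0], pts[i][1] + 1.0)

regions = local_lines(pts, 0.0, 1.0, tol=0.05)
outliers = find_outliers(pts, regions, tol=0.05)
print(len(regions), "local lines;", len(outliers), "points flagged")
```

On this synthetic input, a single line cannot describe the V shape, so the interval is split once and each arm gets its own line; the planted points are then flagged because they sit far from the line of their region. The trimming fraction plays the role of the "local percentage of putative outliers" mentioned above, under the stated assumptions.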

Indexing (document details)
Advisor:
Committee:
School: State University of New York at Stony Brook
School Location: United States -- New York
Source: DAI-B 69/11, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Statistics
Keywords: Data exploration, Outlier detection
Publication Number: 3337616
ISBN: 978-0-549-91412-9