Dissertation/Thesis Abstract

Significant distinct branches of hierarchical trees: A framework for statistical analysis and applications to biological data
by Sun, Guoli, Ph.D., State University of New York at Stony Brook, 2014, 91 pages; publication number 3685086
Abstract (Summary)

One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. With each of the five datasets, there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.
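As a rough illustration of the two ingredients described above, the sketch below computes a tightness score for each non-root branch of a toy tree and an empirical p-value against a sample of null values. The abstract says only that tightness is a rational function of the heights of the branch and its parent; the specific form used here (the relative drop in height from parent to branch), the toy tree, and the function names `tightness` and `empirical_p_value` are assumptions made for illustration, not the definitions or the randomization scheme used in the dissertation or in the TBEST R package.

```python
def tightness(branch_height, parent_height):
    """Assumed form: relative drop in height from the parent to the branch.

    Close to 1 when the branch merges far below its parent (a tight branch),
    close to 0 when the two heights are nearly equal.
    """
    if parent_height <= 0:
        raise ValueError("parent height must be positive")
    return (parent_height - branch_height) / parent_height


def empirical_p_value(observed, null_values):
    """Monte Carlo p-value with the usual +1 correction.

    null_values stands in for tightness values obtained from randomized data;
    how those randomizations are generated is not specified here.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (1 + exceed) / (1 + len(null_values))


# Hypothetical toy tree: node -> (merge height, parent node or None for the root).
tree = {
    "root": (10.0, None),
    "A": (2.0, "root"),
    "B": (8.0, "root"),
}

# Tightness of every non-root branch, using the parent's merge height.
scores = {
    node: tightness(height, tree[parent][0])
    for node, (height, parent) in tree.items()
    if parent is not None
}
```

Under this assumed form, branch A, which merges well below the root, scores higher than branch B, which merges almost at the root's height. A branch would then be called significantly distinct when its empirical p-value falls below a chosen threshold.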

One of the datasets uses Cores Of Recurrent Events (CORE) to select features. CORE was developed with my participation in the course of this work. An R language implementation of CORE is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/CORE/index.html.

Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/TBEST/index.html.

Indexing (document details)
Advisor: Krasnitz, Alexander
Committee: Finch, Stephen; Yoon, Seungtai; Zhu, Wei
School: State University of New York at Stony Brook
Department: Applied Mathematics and Statistics
School Location: United States -- New York
Source: DAI-B 76/07(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Statistics, Bioinformatics
Keywords: Clustering, Hierarchical, Randomizations, TBEST
Publication Number: 3685086
ISBN: 978-1-321-60684-3
Copyright © 2019 ProQuest LLC. All rights reserved.