Dissertation/Thesis Abstract

Bayesian Biclustering on Discrete Data: Variable Selection Methods
by Guo, Lei, Ph.D., Harvard University, 2013, 143; 3600176
Abstract (Summary)

Biclustering is a technique for clustering the rows and columns of a data matrix simultaneously. Over the past few years, it has been applied in biology-related fields as well as in many data mining projects. Unlike classical clustering methods, biclustering groups objects that are similar on only a subset of variables. Many biclustering algorithms for continuous data have emerged over the last decade. In this dissertation, we focus on two Bayesian biclustering algorithms that we developed for discrete data, specifically categorical data and ordinal data.

The International HapMap Project has made available the single-nucleotide polymorphism (SNP) data of thousands of individuals across the world. We analyzed the SNP data with our biclustering algorithm for categorical data and described the similarities between human populations. In contrast to existing methods, our method can locate the SNPs that are specific to subpopulation groups. This can provide insight into genome-wide association studies (GWAS) by eliminating SNPs that are common to different ethnic groups. We also identified a number of SNPs that are linked to disease, and this thesis describes their behavior in different subpopulations. The biclustering process can be used as a variable selection step prior to existing population inference procedures.

Indexing (document details)
Advisor: Liu, Jun S.
Committee: Dasgupta, Tirthankar; Harrington, David P.
School: Harvard University
Department: Statistics
School Location: United States -- Massachusetts
Source: DAI-B 75/02(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Biostatistics, Statistics
Keywords: Biclustering, Categorical data, HapMap, Ordinal data, Population structure
Publication Number: 3600176
ISBN: 9781303502309