Biclustering is a technique for clustering the rows and columns of a data matrix simultaneously. Over the past few years, it has been applied in biology-related fields as well as in many data mining projects. In contrast to classical clustering methods, biclustering groups objects that are similar only on a subset of variables. Many biclustering algorithms for continuous data have emerged over the last decade. In this dissertation, we focus on two Bayesian biclustering algorithms that we developed for discrete data, specifically categorical data and ordinal data.
The International HapMap Project has made available the single-nucleotide polymorphism (SNP) data of thousands of individuals across the world. We analyzed the SNP data with our biclustering algorithm for categorical data and described the similarities between human populations. In contrast to existing methods, our method can locate the SNPs that are specific to subpopulation groups. This can provide insight into genome-wide association studies (GWAS) by eliminating SNPs that are common to different ethnic groups. We also identified a number of SNPs that are linked to disease, and this thesis describes their behavior in different subpopulations. The biclustering process can be used as a variable selection step prior to existing population inference procedures.
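The idea of a bicluster on categorical data, and of variables that are specific to one subgroup, can be illustrated with a small sketch. This is not the Bayesian algorithm developed in the dissertation; it is a deterministic toy check under stated assumptions. The genotype codes in `data` are made up (not HapMap data), and the helper names `column_agreement`, `coherent_columns`, and `specific_columns` are hypothetical.

```python
from collections import Counter

# Toy categorical matrix: rows are individuals, columns are SNP-like variables.
# The genotype codes are invented for illustration only.
data = [
    ["AA", "AG", "CC", "TT", "AG"],
    ["AA", "AG", "CT", "TT", "GG"],
    ["AA", "AG", "TT", "TT", "AA"],
    ["AG", "GG", "CC", "TT", "AG"],
    ["GG", "AA", "CT", "CC", "GG"],
]


def column_agreement(matrix, rows, col):
    """Fraction of the selected rows that share the most common category in col."""
    values = [matrix[r][col] for r in rows]
    return Counter(values).most_common(1)[0][1] / len(values)


def coherent_columns(matrix, rows, threshold=1.0):
    """Columns on which the selected rows agree at least `threshold` of the time."""
    n_cols = len(matrix[0])
    return [c for c in range(n_cols)
            if column_agreement(matrix, rows, c) >= threshold]


def specific_columns(matrix, rows, threshold=1.0):
    """Coherent columns whose shared category never appears in the other rows."""
    others = [r for r in range(len(matrix)) if r not in rows]
    result = []
    for c in coherent_columns(matrix, rows, threshold):
        top = Counter(matrix[r][c] for r in rows).most_common(1)[0][0]
        if all(matrix[r][c] != top for r in others):
            result.append(c)
    return result


group = [0, 1, 2]                               # a candidate row subset
bicluster_cols = coherent_columns(data, group)  # columns where the group agrees
group_specific = specific_columns(data, group)  # agreeing columns unique to the group
```

Here the three selected rows agree only on a subset of the columns, which is what distinguishes a bicluster from a classical cluster defined over all variables. Filtering further to columns whose shared category is absent elsewhere mimics, in a very crude way, the selection of subgroup-specific variables described above.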
|Advisor:||Liu, Jun S.|
|Committee:||Dasgupta, Tirthankar, Harrington, David P.|
|School Location:||United States -- Massachusetts|
|Source:||DAI-B 75/02(E), Dissertation Abstracts International|
|Keywords:||Biclustering, Categorical data, HapMap, Ordinal data, Population structure|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved