Clustering is a data analysis method used in a wide variety of research fields. Many clustering algorithms exist, and none can be considered universally better than the others. We describe several clustering methods, including hierarchical clustering and k-means clustering. We also describe topological data analysis, showing how topology can be used to infer structural information about a data set. We discuss how to assess the validity of clusters and how to choose an optimal clustering method, and conclude with an application of various clustering methods to transcriptome data from the ciliate Oxytricha trifallax. For that application, we describe the structure of the data set, how an optimal clustering was chosen, how the validity of the clusters was confirmed, and how biological information can be extracted using gene ontology.
|Committee:||Saito, Masahiko; Molla, Theodore|
|School:||University of South Florida|
|Department:||Mathematics and Statistics|
|School Location:||United States -- Florida|
|Source:||MAI 81/11(E), Masters Abstracts International|
|Keywords:||Ciliates, Gene ontology, Hierarchical, K-means, Topology|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved.