Dissertation/Thesis Abstract

Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset
by Peterson, Angela R., M.S., Kutztown University of Pennsylvania, 2009, 59; 1462396
Abstract (Summary)

This thesis examines the use of parallel coordinate (PC) plots for visual data mining. It concentrates on graphing multidimensional data sets with PC plots. The concepts of the “polyline” and the parallel axis, the basic building blocks for graphing a parallel coordinate plot, are defined. Visualization problems with parallel coordinate plots typically involve ambiguity and clutter. These two issues are addressed by using the technique of “clustering and color”. The use of color in a parallel coordinate plot reduces the problem of ambiguity. Separating the data set into natural groups, or clusters, reduces clutter. A methodology is outlined that describes how to cluster and color a multidimensional data set. The K-means clustering algorithm is introduced, and its application to produce clusters of polylines in a PC plot is shown. The ‘K’ in K-means is the number of clusters, a value chosen by the user. In the spirit of graphical visualization, the “distortion plot” is introduced as a way to select the “best” value for K. Once the methodology of graphing a meaningful parallel coordinate plot is outlined, it is illustrated with an analysis of a real multidimensional data set. The thesis finishes with a summary of the effectiveness and applications of visual data mining using a series of PC plots with clustering and color.
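
The cluster-and-color idea summarized above can be sketched in a few lines. This is a minimal illustration and not the thesis's own code (the keywords indicate the thesis worked in R). The toy data set, the color palette, the function names, and the deterministic farthest-point seeding are all assumptions made for this example. "Distortion" is taken here to mean the total squared distance from each record to its assigned cluster center, which is one common definition; the thesis may define it differently.

```python
# Hypothetical sketch: K-means on a small 4-dimensional data set, with the
# cluster labels mapped to polyline colors and a distortion value per K.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def initial_centroids(points, k):
    """Assumed seeding: start at the first record, then repeatedly add the
    record farthest from all centroids chosen so far (deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, iterations=100):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

def distortion(points, labels, centroids):
    """Total squared distance from each record to its cluster centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

# Toy data: each record would be drawn as one polyline across four parallel axes.
DATA = [
    (1, 1, 1, 1), (1, 2, 1, 2), (2, 1, 2, 1),
    (9, 9, 9, 9), (9, 8, 9, 8), (8, 9, 8, 9),
    (1, 9, 1, 9), (2, 9, 2, 8), (1, 8, 2, 9),
]
PALETTE = ["red", "green", "blue"]  # assumed colors, one per cluster

labels, centroids = kmeans(DATA, 3)
polyline_colors = [PALETTE[lab] for lab in labels]

# Values for a distortion plot: distortion as a function of K.
distortions = {}
for k in range(1, 5):
    k_labels, k_centroids = kmeans(DATA, k)
    distortions[k] = distortion(DATA, k_labels, k_centroids)

for k, value in distortions.items():
    print(k, round(value, 2))
```

Plotting the values in `distortions` against K gives a curve of the kind a distortion plot shows; a K beyond which the curve stops dropping sharply is a reasonable candidate for the "best" number of clusters. The `polyline_colors` list supplies one color per record for whatever plotting tool draws the parallel coordinate plot.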

Indexing (document details)
Advisor: Kaplan, Randy
Committee: Day, Linda; Spiegel, Daniel
School: Kutztown University of Pennsylvania
Department: Computer and Information Science
School Location: United States -- Pennsylvania
Source: MAI 47/05M, Masters Abstracts International
Source Type: DISSERTATION
Subjects: Statistics, Computer science
Keywords: Ambiguity, Clutter, Distortion plot, Hartigan's index, Lattice plot, R programming language
Publication Number: 1462396
ISBN: 9781109073072
Copyright © 2019 ProQuest LLC. All rights reserved.