Dissertation/Thesis Abstract

Efficient genetic k-means clustering algorithm and its application to data mining on different domains
by Alsayat, Ahmed Mosa, D.Sc., Bowie State University, 2016, 107; 10239708
Abstract (Summary)

Because of the massive increase in the number of data streams available and being produced, data mining and machine learning have become increasingly popular as companies, organizations, and industries seek optimal methods and techniques for processing these large data sets. Machine learning is a branch of artificial intelligence that involves creating programs that autonomously apply data mining techniques when exposed to data streams. This study evaluates two very different domains in an effort to provide a better-optimized clustering method than those currently in use. We examine the use of data mining in healthcare as well as in the social media domain. Testing the proposed technique on these two drastically different domains offers valuable insights into its performance across domains.

This study reviews existing clustering methods and presents an enhanced k-means clustering algorithm that uses a novel method called Optimize Cluster Distance (OCD), applied to the social media domain. OCD maximizes the distance between clusters through pair-wise re-clustering to improve cluster quality. For the healthcare domain, k-means was applied together with a Self-Organizing Map (SOM) to obtain an optimal number of clusters. The risk of k-means starting from poor centroid positions was addressed by applying a genetic algorithm to k-means in both the social media and healthcare domains, and OCD was then applied again to enhance the quality of the resulting clusters. In both domains, the analysis shows that, compared with conventional k-means, the proposed k-means is accurate and achieves better clustering performance, along with valuable insights for each cluster. The approach is unsupervised and scalable, and it can be applied to various domains.
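The abstract does not specify how the genetic algorithm is combined with k-means: the chromosome encoding, fitness function, and operators are not given. The sketch below shows one common way to do it, under the following assumptions:

- Each chromosome is a set of k candidate centroids, initialized from randomly chosen data points.
- Fitness is the within-cluster sum of squared errors (SSE).
- The genetic loop uses tournament selection, uniform crossover, and random-point mutation.
- The fittest chromosome then seeds a standard Lloyd k-means refinement.

All function names and parameters (`genetic_kmeans`, `pop_size`, `mutation_rate`, and so on) are hypothetical. OCD and the SOM step are not shown.

```python
import random

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data: list[Point], centroids: list[Point]) -> list[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in data]


def sse(data: list[Point], centroids: list[Point]) -> float:
    """Within-cluster sum of squared errors (assumed fitness; lower is better)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)


def lloyd(data: list[Point], centroids: list[Point],
          max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Standard k-means (Lloyd) refinement starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(data, centroids)
        new = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # Keep the old centroid if its cluster goes empty.
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else old)
        if new == centroids:  # assignments stable, so centroids no longer move
            break
        centroids = new
    return centroids, assign(data, centroids)


def genetic_kmeans(data: list[Point], k: int, pop_size: int = 20,
                   generations: int = 30, mutation_rate: float = 0.1,
                   seed: int = 0) -> tuple[list[Point], list[int]]:
    """Hypothetical GA-seeded k-means: evolve initial centroids, then run Lloyd."""
    rng = random.Random(seed)
    # Chromosome: k centroids, initially k distinct data points.
    population = [rng.sample(data, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ch: sse(data, ch))
        next_gen = population[:2]  # elitism: keep the two fittest unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = min(rng.sample(population, 3), key=lambda ch: sse(data, ch))
            p2 = min(rng.sample(population, 3), key=lambda ch: sse(data, ch))
            # Uniform crossover: each centroid slot comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: occasionally replace a centroid with a random data point.
            child = [rng.choice(data) if rng.random() < mutation_rate else c
                     for c in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=lambda ch: sse(data, ch))
    return lloyd(data, best)
```

For example, on three well-separated groups of five points each, `genetic_kmeans(data, 3)` should place one centroid at the mean of each group. The genetic step only chooses the starting centroids, which is how it guards against the poor initial positions that can trap plain k-means. Lloyd's iterations still do the final fitting.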

Indexing (document details)
Advisor: El-Sayed, Hoda
Committee: Jackson, Lethia, Stone, Daryl, Yan, Jie, Yesha, Yelena
School: Bowie State University
Department: Computer Science
School Location: United States -- Maryland
Source: DAI-B 78/06(E), Dissertation Abstracts International
Subjects: Computer science
Keywords: Data mining, Genetic algorithm, Health care, K-means, Self-organizing map, Social media
Publication Number: 10239708
ISBN: 978-1-369-43150-6
Copyright © 2021 ProQuest LLC. All rights reserved.