Because of the massive increase in the data streams being produced and made available, data mining and machine learning have become increasingly popular as companies, organizations and industries seek optimal methods and techniques for processing these large data sets. Machine learning is a branch of artificial intelligence that involves creating programs that autonomously perform different data mining techniques when exposed to data streams. This study evaluates two very different domains in an effort to provide a better, more optimized and more widely applicable method of clustering than is currently in use. We examine the use of data mining in healthcare, as well as the use of these techniques in the social media domain. Testing the proposed technique on these two drastically different domains offers valuable insight into its performance across domains.
This study reviews existing clustering methods and presents an enhanced k-means clustering algorithm that uses a novel method called Optimize Cluster Distance (OCD), applied to the social media domain. The OCD method maximizes the distance between clusters by pair-wise re-clustering to enhance the quality of the clusters. For the healthcare domain, k-means was applied along with a Self-Organizing Map (SOM) to obtain an optimal number of clusters. The risk of poorly positioned centroids in k-means was addressed by applying a genetic algorithm to k-means in both the social media and healthcare domains. OCD was then applied again to enhance the quality of the resulting clusters. In both domains, the analysis shows that, compared to conventional k-means, the proposed k-means is accurate and achieves better clustering performance, along with valuable insights for each cluster. The approach is unsupervised, scalable and can be applied to various domains.
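The abstract does not spell out how OCD carries out its pair-wise re-clustering, so the following is only a minimal sketch of one possible reading, not the dissertation's implementation. It assumes that each pair of clusters is merged, re-split with 2-means, and that the new split is kept only when it moves the two centroids further apart. The function names (`kmeans`, `pairwise_refine`, `mean`) are hypothetical, and the genetic-algorithm and SOM steps are omitted.

```python
import math
import random
from itertools import combinations


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns a list of k clusters (lists of points)."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return clusters


def pairwise_refine(clusters, rng):
    """Assumed reading of pair-wise re-clustering: for each cluster pair,
    re-run 2-means on the union and keep the new split only if it
    increases the distance between the two centroids."""
    clusters = [list(c) for c in clusters]
    for i, j in combinations(range(len(clusters)), 2):
        a, b = clusters[i], clusters[j]
        if not a or not b:
            continue
        new_a, new_b = kmeans(a + b, 2, rng)
        if not new_a or not new_b:
            continue
        if math.dist(mean(new_a), mean(new_b)) > math.dist(mean(a), mean(b)):
            clusters[i], clusters[j] = new_a, new_b
    return clusters
```

A call such as `pairwise_refine(kmeans(points, k, random.Random(0)), random.Random(1))` would run the refinement on an initial k-means result. Because accepting a new split for one pair can change that cluster's distance to the others, this sketch only guarantees a non-decreasing centroid distance for the pair being processed.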
|Committee:||Jackson, Lethia; Stone, Daryl; Yan, Jie; Yesha, Yelena|
|School:||Bowie State University|
|School Location:||United States -- Maryland|
|Source:||DAI-B 78/06(E), Dissertation Abstracts International|
|Keywords:||Data mining, Genetic algorithm, Health care, K-means, Self-organizing map, Social media|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved