Dissertation/Thesis Abstract

Modeling content lifespan in Online Social Networks using data mining
by Gibbons, John, Ph.D., University of Kansas, 2014, 111; 3625219
Abstract (Summary)

Online Social Networks (OSNs) are integrated into business, entertainment, politics, and education; indeed, they are integrated into nearly every facet of everyday life. They have played essential roles in events ranging from milestones for humanity, such as the social revolutions in certain countries, to day-to-day activities, such as streaming entertaining or educational materials. Not surprisingly, social networks are a subject of study not only for computer scientists but also for economists, sociologists, political scientists, and psychologists, among others. In this dissertation, we build a model that is used to classify content on the OSNs Reddit, 4chan, Flickr, and YouTube according to the type of lifespan the content has and the popularity tier that the content reaches. The proposed model is evaluated with 10-fold cross-validation using the data mining techniques Sequential Minimal Optimization (SMO), which is a support vector machine algorithm, Decision Table, Naïve Bayes, and Random Forest. The runtimes and accuracies are compared across OSNs, models, and data mining algorithms.

The peak/death category of Reddit content can be classified with 64% accuracy. The peak/death category of 4chan content can be classified with 76% accuracy. The peak/death category of Flickr content can be classified with 65% accuracy. We also used 10-fold cross-validation to measure the accuracy with which the popularity tier of content can be classified. The popularity tier of content on Reddit can be classified with 84% accuracy. The popularity tier of content on 4chan can be classified with 70% accuracy. The popularity tier of content on Flickr can be classified with 66% accuracy. The popularity tier of content on YouTube can be classified with only 48% accuracy.

Our experiments compared the runtimes and accuracies of SMO, Naïve Bayes, Decision Table, and Random Forest in classifying the lifespan of content on Reddit, 4chan, and Flickr, as well as in classifying the popularity tier of content on Reddit, 4chan, Flickr, and YouTube. The experimental results indicate that SMO is capable of outperforming the other algorithms in runtime across all OSNs. Decision Table has the longest observed runtimes, in some cases failing to complete its analysis before the system crashed. The statistical analysis indicates, with 95% confidence, that there is no statistically significant difference in accuracy between the algorithms across all OSNs. Reddit was shown, with 95% confidence, to be the OSN whose content is least likely to be misclassified. All other OSNs were shown to have no statistically significant difference in how likely their content is to be misclassified when compared pairwise with each other.
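
The 10-fold cross-validation protocol referred to above can be sketched as follows. This is a minimal illustration in plain Python, not the dissertation's implementation: the function names (`k_fold_indices`, `cross_validate`, `train_majority`) and the majority-class baseline are hypothetical stand-ins for the SMO, Decision Table, Naïve Bayes, and Random Forest classifiers, and the sketch assumes there are at least as many samples as folds.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Split the indices 0..n-1 into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Return the mean accuracy over k folds; each fold is held out once.

    Assumes len(labels) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        # train_fn returns a prediction function fitted on the training folds.
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def train_majority(train_x, train_y):
    """Hypothetical baseline: always predict the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda x: majority
```

Any classifier can be substituted for `train_majority` as long as it takes training features and labels and returns a prediction function; the per-fold accuracies are averaged to give a single figure comparable to the percentages reported above.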

Indexing (document details)
Advisor: Agah, Arvin
Committee: Dhar, Prajna; Grzymala-Busse, Jerzy; Miller, James; Perry, Alexander
School: University of Kansas
Department: Electrical Engineering and Computer Science
School Location: United States -- Kansas
Source: DAI-A 75/10(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Web Studies, Information science
Keywords: Content, Data mining, Lifespan, Online social network
Publication Number: 3625219
ISBN: 9781303999529