Cyberbullying and cyberharassment are growing problems that strain the resources of human moderation teams. This is leading to an increase in suicide among affected teens who are unable to get away from the harassment. Using n-grams and support vector machines, this research classified YouTube comments with an overall accuracy of 81.8%. Accuracy rose to 83.9% with retraining, in which the misclassified comments were added to the training set. These results were obtained with a balanced training set of 350 comments, the 7% of length-3 n-grams with the highest entropy, and a polynomial kernel with a C error factor of 1, a degree of 2, and a Coef0 of 1, using the LibSVM implementation of the support vector machine algorithm. The 350 comments were also trimmed with a k-nearest neighbor algorithm in which k was set to 4% of the training set size. The algorithm was designed to be heavily multi-threaded and capable of running across multiple servers, and the system achieved this accuracy while classifying 3 comments per second on consumer-grade hardware over Wi-Fi.
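The parameters named in the abstract can be made concrete with a small sketch. This is not the author's implementation. It assumes word-level n-grams (the abstract does not say whether n-grams are over words or characters), a kernel gamma of 1 (not stated in the abstract), and a hypothetical `top_fraction` helper that keeps the highest-scoring 7% of n-grams given some precomputed score; how the entropy score itself is computed is not specified in the abstract and is left out. The kernel formula is the standard LibSVM polynomial form, (gamma * u·v + coef0)^degree.

```python
def word_ngrams(text, n=3):
    """Return the word n-grams of a comment as tuples (word-level is an assumption)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_fraction(scores, fraction=0.07):
    """Hypothetical helper: keep the highest-scoring share of n-grams.

    `scores` maps each n-gram to a precomputed score (for example an
    entropy value); the scoring method itself is not defined here.
    """
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


def poly_kernel(u, v, gamma=1.0, coef0=1.0, degree=2):
    """LibSVM-style polynomial kernel: (gamma * u.v + coef0) ** degree.

    degree=2 and coef0=1 follow the abstract; gamma=1 is an assumption.
    """
    dot = sum(a * b for a, b in zip(u, v))
    return (gamma * dot + coef0) ** degree


def knn_k(training_size, fraction=0.04):
    """k for the k-nearest-neighbor trimming step as a share of the training set."""
    return round(training_size * fraction)
```

With the 350-comment training set from the abstract, `knn_k(350)` gives k = 14, and two feature vectors with a dot product of 3 yield a kernel value of (3 + 1)^2 = 16.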
|Committee:||Hamel, Lutz, Thoma, Lubos|
|School:||University of Rhode Island|
|School Location:||United States -- Rhode Island|
|Source:||DAI-B 78/09(E), Dissertation Abstracts International|
|Keywords:||Cyberbullying, Cyberharassment, Machine learning|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved