Dissertation/Thesis Abstract

Advances in online learning-based spam filtering
by Sculley, D., Ph.D., Tufts University, 2008, 212; 3320103
Abstract (Summary)

The low cost of digital communication has given rise to the problem of email spam, which is unwanted, harmful, or abusive electronic content. In this thesis, we present several advances in the application of online machine learning methods for automatically filtering spam. We detail a sliding-window variant of Support Vector Machines that yields state-of-the-art results for the standard online filtering task. We explore a variety of feature representations for spam data. We reduce human labeling cost through the use of efficient online active learning variants. We give practical solutions to the one-sided feedback scenario, in which users only give labeling feedback on messages predicted to be non-spam. We investigate the impact of class label noise on machine learning-based spam filters, showing that previous benchmark evaluations rewarded filters prone to overfitting in real-world settings, and we propose several modifications for combating these negative effects. Finally, we investigate the performance of these filtering methods on the more challenging task of abuse filtering in blog comments. Together, these contributions enable more accurate spam filters to be deployed in real-world settings, with greater robustness to noise, lower computational cost, and lower human labeling cost.
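
The one-sided feedback scenario described in the abstract can be illustrated with a minimal sketch. This is not the method developed in the thesis; it is a plain perceptron-style online filter, chosen here only as an assumption to show the protocol. The names `features`, `OneSidedFilter`, and `process` are hypothetical, as is the whitespace-token feature map. The point it shows is that the true label reaches the learner only when a message is predicted to be non-spam ("ham").

```python
from collections import defaultdict


def features(text):
    """Hypothetical feature map: lowercase whitespace-token counts."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    return counts


class OneSidedFilter:
    """Perceptron-style online filter under one-sided feedback (illustrative only)."""

    def __init__(self, learning_rate=1.0):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate
        self.labels_seen = 0  # number of messages for which feedback arrived

    def score(self, x):
        # Positive scores indicate spam; unseen tokens contribute zero.
        return sum(self.weights.get(t, 0.0) * v for t, v in x.items())

    def predict(self, x):
        return "spam" if self.score(x) > 0 else "ham"

    def process(self, text, true_label):
        """Classify one message, then learn only if feedback is available."""
        x = features(text)
        predicted = self.predict(x)
        if predicted == "ham":
            # The message is delivered, so the user can report its true label.
            self.labels_seen += 1
            if true_label == "spam":
                # Missed spam: move the weights toward the spam class.
                for t, v in x.items():
                    self.weights[t] += self.learning_rate * v
        # If predicted == "spam", the message is hidden from the user,
        # so no label arrives and no update is possible.
        return predicted
```

In this naive sketch, a legitimate message that is wrongly filtered as spam is never corrected, because no feedback arrives for it. That limitation is the difficulty the one-sided feedback setting poses; the sketch does not implement the solutions the thesis proposes.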

Indexing (document details)
Advisor: Brodley, Carla E.
Committee: Blumer, Anselm; Cormack, Gordon V.; Crane, Gregory; Khardon, Roni
School: Tufts University
Department: Computer Science
School Location: United States -- Massachusetts
Source: DAI-B 69/08, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Class label noise, Online learning, Spam filtering
Publication Number: 3320103
ISBN: 9780549710165