Many techniques are available to combat unwanted email and online spam. One popular technique is content-based Bayesian filtering, but spammers have found ways to defeat these filters. A structure-based anti-spam technique takes a different approach to the spam problem: it examines the structure of a message instead of its content. The structure of an email is extracted from the DOM (Document Object Model) of the email's HTML (HyperText Markup Language). We implemented a tree-based comparison and a quadratic weighted level scoring system to measure similarity between emails. This method is used for email classification, so that similar emails can be grouped together. After classifying an email, we compared its domain with the whitelisted domains. If the domains do not match, we label the email as spam. The experimental results showed a high success rate for spam detection and email classification.
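The pipeline described in the abstract (extract structure from the HTML, score structural similarity level by level, then check the sender domain against a whitelist) can be illustrated with a minimal sketch. The abstract does not define the scoring formula, so everything below is an assumption made for illustration: the per-level tag lists stand in for the DOM tree, the weight `(n - d) ** 2` is one possible reading of "quadratic weighted level scoring", and the names `structure_levels`, `structural_similarity`, `classify`, the `templates` mapping, and the `0.8` threshold are all hypothetical. It also assumes reasonably well-formed HTML, since unclosed tags would skew the depth count.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they do not open a new level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LevelCollector(HTMLParser):
    """Records the tag names seen at each nesting depth, in document order."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.levels = {}  # depth -> list of tag names

    def handle_starttag(self, tag, attrs):
        self.levels.setdefault(self.depth, []).append(tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def structure_levels(html):
    """Return the tag structure of an HTML body, ignoring text and attributes."""
    collector = LevelCollector()
    collector.feed(html)
    collector.close()
    return collector.levels


def structural_similarity(html_a, html_b):
    """Score two messages in [0, 1] by comparing their tags level by level.

    Assumed weighting: level d (0 = root) gets weight (n - d) ** 2, so
    differences near the root count more than differences deep in the tree.
    """
    a, b = structure_levels(html_a), structure_levels(html_b)
    n = max(max(a, default=-1), max(b, default=-1)) + 1
    if n == 0:
        return 0.0  # neither message has any HTML structure to compare
    weighted = total = 0.0
    for d in range(n):
        weight = (n - d) ** 2
        ratio = SequenceMatcher(None, a.get(d, []), b.get(d, [])).ratio()
        weighted += weight * ratio
        total += weight
    return weighted / total


def classify(html, sender_domain, templates, threshold=0.8):
    """Match a message to the closest known template, then check its domain.

    templates maps a group name to (reference_html, set_of_whitelisted_domains).
    Returns (group_name, label), where label is "legitimate", "spam", or
    "unclassified" when no template is similar enough.
    """
    best_name, best_score = None, 0.0
    for name, (reference_html, _domains) in templates.items():
        score = structural_similarity(html, reference_html)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None, "unclassified"
    whitelisted = templates[best_name][1]
    label = "legitimate" if sender_domain.lower() in whitelisted else "spam"
    return best_name, label
```

In this sketch, a message whose layout matches a known sender's template but whose domain is not on that sender's whitelist is the case labeled as spam, which is the phishing scenario the keywords point to. The text content never enters the score, so rewording a message does not change its classification.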
|Committee:||Englert, Burkhard, Murgolo, Frank|
|School:||California State University, Long Beach|
|Department:||Computer Engineering and Computer Science|
|School Location:||United States -- California|
|Source:||MAI 55/03M(E), Masters Abstracts International|
|Keywords:||Bayesian filter, Document object model, Email classification, Phishing, Spam detection, Structure-based technique|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved