Dissertation/Thesis Abstract

Classifying Websites Using Word Vectors and Other Techniques: An Application of Zipf’s Law
by Robles, Alejandro, M.S., California State University, Long Beach, 2019, 50; 22589849
Abstract (Summary)

In this paper, we explore the effectiveness of using Zipf’s Law as a pre-processing step for classifying websites into four categories: Sports, Games & Toys, Travel, and Food & Drink. The classifiers used were multinomial logistic regression and a convolutional neural network (CNN) with Global Vectors for Word Representation (GloVe) word embeddings. The CNN with GloVe embeddings as input achieves 92% accuracy, which increases to 93% when Zipf’s Law is applied. The worst-performing model was logistic regression with GloVe embeddings, with an accuracy of 90%. After we transformed the multi-class classification problem into a binary one, accuracy improved: all models reached 94% except the base model (TF-IDF & LOGIT), which reached 93%.

All the code can be found on
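
The abstract does not say how Zipf’s Law was turned into a pre-processing step, so the following is only an illustrative sketch and not the thesis code. It assumes one common reading: rank tokens by corpus frequency, check that log-frequency falls roughly linearly with log-rank, then drop the very top-ranked tokens and the rare tail before building features. The function names (`rank_frequency`, `zipf_exponent`, `zipf_filter`) and the cutoffs `drop_top` and `min_count` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def rank_frequency(docs):
    """Return (token, count) pairs sorted by descending count, i.e. by Zipf rank."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    # Break ties alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def zipf_exponent(ranked):
    """Least-squares slope of log(count) on log(rank).

    Zipf's Law predicts a slope near -1 on large natural-language corpora.
    Requires at least two distinct tokens.
    """
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def zipf_filter(docs, drop_top=1, min_count=2):
    """Drop the `drop_top` highest-ranked tokens and any token seen fewer
    than `min_count` times (both cutoffs are assumed, not from the thesis)."""
    ranked = rank_frequency(docs)
    keep = {
        tok
        for rank, (tok, count) in enumerate(ranked, start=1)
        if rank > drop_top and count >= min_count
    }
    return [[tok for tok in tokenize(doc) if tok in keep] for doc in docs]


docs = [
    "the team won the game",
    "the hotel and the beach",
    "the game and the team",
]
ranked = rank_frequency(docs)
slope = zipf_exponent(ranked)
filtered = zipf_filter(docs, drop_top=1, min_count=2)
```

On this toy corpus, "the" is the top-ranked token and is removed, along with words that occur only once. The filtered token lists could then be passed to a TF-IDF vectorizer or an embedding lookup; real cutoffs would need tuning on the actual data.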

Indexing (document details)
Advisor: Suaray, Kagba
Committee: Zhang, Wenlu; VonBrecht, James
School: California State University, Long Beach
Department: Mathematics and Statistics
School Location: United States -- California
Source: MAI 81/4(E), Masters Abstracts International
Subjects: Statistics
Keywords: Adtech, Deep Learning, Machine Learning, NLP, Zipf's Law
Publication Number: 22589849
ISBN: 9781392806029
Copyright © 2021 ProQuest LLC. All rights reserved.