Dissertation/Thesis Abstract

Prediction of Final Pitch Outcome in MLB Games Using Statistical Learning Methods
by Barbee, Jasmine, M.S., California State University, Long Beach, 2020, 83; 28151661
Abstract (Summary)

Statistics has a long tradition in sports analytics, and over the past decade it has become central to team operations on both the business side and the sports side. In this thesis we classify a successful pitching outcome (defined as a strikeout) in MLB using binary logistic regression, random forests, neural networks, and Bayesian logistic regression. We use standard model-fit tests to determine the best predictive model. Following the modeling scheme of Deshpande and Wyner, we first build our classifiers with the 2015/16 season data, use the 2016/17 data to tune the models, and predict the 2017/18 data to validate them. The data come from the MLB Pitch Data sets collected from 2015 to 2018, with a total of 52 variables and 736,325 observations. Only the last pitch thrown to a specific batter ID number, together with the event of that last pitch (e.g., a strikeout or groundball), is used, and all missing observations are removed. Some of the variables used in the analysis are the pitch type (fastball, changeup, cutter, and so on), the pitcher's throwing hand (left or right), the starting speed of the pitch, the ending speed of the pitch, the pitch count (specifically when the ball count is 3 and the strike count is 2), and the vertical and horizontal locations of the pitch when it was 50 feet from home plate (how far off it was from the middle of the plate). We developed two models that beat the accuracy of MLB umpires, as reported by Boston University master lecturer Mark Williams: a random forest fit to our original data and a neural network fit to a balanced data set.
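
The data preparation described in the abstract (keep only the final pitch of each at-bat, label strikeouts as the positive class, and split by season into build, tune, and validate sets) can be sketched as follows. This is a minimal illustration, not the thesis's code: the field names (`season`, `ab_id`, `pitch_num`, `event`), the toy records, and the helper functions are all hypothetical, and the season labels are copied as written in the abstract without assuming how they map to calendar years.

```python
# Hypothetical toy records; the field names are assumptions, not the thesis's schema.
pitches = [
    {"season": "2015/16", "ab_id": 1, "pitch_num": 1, "event": None},
    {"season": "2015/16", "ab_id": 1, "pitch_num": 2, "event": None},
    {"season": "2015/16", "ab_id": 1, "pitch_num": 3, "event": "Strikeout"},
    {"season": "2016/17", "ab_id": 2, "pitch_num": 1, "event": None},
    {"season": "2016/17", "ab_id": 2, "pitch_num": 2, "event": "Groundout"},
    {"season": "2017/18", "ab_id": 3, "pitch_num": 1, "event": "Strikeout"},
]


def last_pitch_per_at_bat(rows):
    """Keep only the final pitch of each at-bat (highest pitch_num per ab_id)."""
    last = {}
    for row in rows:
        key = row["ab_id"]
        if key not in last or row["pitch_num"] > last[key]["pitch_num"]:
            last[key] = row
    return list(last.values())


def label(row):
    """Binary target: 1 for a strikeout, 0 for any other final-pitch event."""
    return 1 if row["event"] == "Strikeout" else 0


def split_by_season(rows, build="2015/16", tune="2016/17", validate="2017/18"):
    """Assign each row to the build, tune, or validate set by its season label."""
    names = {build: "build", tune: "tune", validate: "validate"}
    splits = {"build": [], "tune": [], "validate": []}
    for row in rows:
        name = names.get(row["season"])
        if name is not None:
            splits[name].append(row)
    return splits


final_pitches = last_pitch_per_at_bat(pitches)
splits = split_by_season(final_pitches)
for name, rows in splits.items():
    print(name, [label(r) for r in rows])
```

Splitting by season rather than at random keeps later seasons out of model fitting, so the validation set behaves like genuinely unseen future data.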

Indexing (document details)
Advisor: Suaray, Kagba
Committee: Korosteleva, Olga; Safer, Alan
School: California State University, Long Beach
Department: Mathematics and Statistics
School Location: United States -- California
Source: MAI 82/8(E), Masters Abstracts International
Subjects: Statistics, Artificial intelligence
Keywords: Baseball, Bayesian regression, Binary regression, Neural network, Random forest, Statistics
Publication Number: 28151661
ISBN: 9798582504405
Copyright © 2021 ProQuest LLC. All rights reserved.