There is a longstanding tradition of using statistics in sports analytics. Over the past decade, statistics has become key to team operations on both the business side and the sports side. In this thesis we classify a successful pitching outcome (defined as a strikeout) in Major League Baseball (MLB) using binary logistic regression, random forests, neural networks, and Bayesian logistic regression, and we use standard model-fit tests to determine the best-predicting model. Following the modeling scheme of Deshpande and Wyner, we build our classifiers on the 2015/16 season data, use the 2016/17 data to tune the models, and predict the 2017/18 data to validate them. The data come from the MLB Pitch Data sets collected from 2015 to 2018, with a total of 52 variables and 736,325 observations. Only the last pitch thrown to each batter ID and the event that pitch produced (e.g., a strikeout or a ground ball) are used, and all observations with missing values are removed. The variables used in the analysis include the pitch type (fastball, changeup, cutter, and so on), the pitcher's throwing hand (left or right), the starting and ending speeds of the pitch, the pitch count (specifically, a full count of 3 balls and 2 strikes), and the vertical and horizontal location of the pitch when it was 50 feet from home plate (its offset from the middle of the plate). We developed two models whose accuracy exceeds that of MLB umpires, as reported by Boston University master lecturer Mark Williams: a random forest fit to the original data and a neural network fit to a balanced data set.
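The data preparation and season-based split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis code: the record layout (`ab_id`, `pitch_num`, `start_speed`, `end_speed`, `event`), the season labels, and the `"Strikeout"` event string are hypothetical, and pitches are grouped by an assumed plate-appearance key `ab_id` on the reading that "the last pitch thrown to each batter ID" means the last pitch of each plate appearance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical record layout: field names and season labels are assumptions
# made for this sketch, not the exact columns of the MLB Pitch Data files.
@dataclass
class Pitch:
    ab_id: int                    # plate-appearance key (assumed)
    season: str                   # e.g. "2015/16"
    pitch_num: int                # order of the pitch within the plate appearance
    start_speed: Optional[float]  # None marks a missing value
    end_speed: Optional[float]
    event: Optional[str]          # outcome of the plate appearance, e.g. "Strikeout"


def last_pitches(pitches: List[Pitch]) -> List[Pitch]:
    """Keep the final pitch of each plate appearance, then drop rows with missing values."""
    last: Dict[int, Pitch] = {}
    for p in pitches:
        current = last.get(p.ab_id)
        if current is None or p.pitch_num > current.pitch_num:
            last[p.ab_id] = p
    return [
        p for p in last.values()
        if None not in (p.start_speed, p.end_speed, p.event)
    ]


def season_split(rows: List[Pitch]) -> Dict[str, List[Tuple[Pitch, int]]]:
    """Label strikeouts as 1 (else 0) and route each row to train/tune/validate by season."""
    roles = {"2015/16": "train", "2016/17": "tune", "2017/18": "validate"}
    splits: Dict[str, List[Tuple[Pitch, int]]] = {"train": [], "tune": [], "validate": []}
    for p in rows:
        role = roles.get(p.season)
        if role is not None:
            label = 1 if p.event == "Strikeout" else 0
            splits[role].append((p, label))
    return splits


# Toy usage with made-up pitches (not real data).
pitches = [
    Pitch(1, "2015/16", 1, 94.1, 86.0, "Strikeout"),
    Pitch(1, "2015/16", 2, 84.3, 77.5, "Strikeout"),
    Pitch(2, "2016/17", 1, 91.0, 83.2, "Groundout"),
    Pitch(3, "2017/18", 1, 88.7, None, "Strikeout"),  # missing end speed, so dropped
]
splits = season_split(last_pitches(pitches))
for role, labelled in splits.items():
    print(role, [label for _, label in labelled])
```

With the toy data this prints `train [1]`, `tune [0]`, and `validate []`. The sketch selects the last pitch before removing missing values, so a plate appearance whose final pitch is incomplete is dropped rather than replaced by an earlier pitch; the abstract does not say which order was used, so this is a design choice of the example.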
|Committee:||Korosteleva, Olga, Safer, Alan|
|School:||California State University, Long Beach|
|Department:||Mathematics and Statistics|
|School Location:||United States -- California|
|Source:||MAI 82/8(E), Masters Abstracts International|
|Subjects:||Statistics, Artificial intelligence|
|Keywords:||Baseball, Bayesian regression, Binary regression, Neural network, Random forest, Statistics|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved