This Communication describes the MLB prediction model MLBp v0.1.
status: alpha
The data mining process at statshacker is described here.
The steps of this process that are relevant to this model are described below. Note that (for obvious reasons) only general, high-level details are given; specifics are not made public.
“Business” Understanding
The objective of this model is to provide a (probabilistic) classification of the outcome of an MLB game.
Data Preparation
The (final) dataset consists of variables selected from the following categories of statistics:
- Batting
- Pitching
- Fielding
- Team fielding
- League information
- Team performance
Prior to modeling, the data is pre-processed.
Modeling
This model approximates an optimal classifier, using sophisticated machine-learning and statistical methods.
Evaluation
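Evaluation is performed by testing on data for the years 2010–2017 (eight seasons).
Training and testing are done by walk-forward analysis [1] (the appropriate evaluation/optimization method for this problem, since every test season lies strictly after the data the model was trained on).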
Evaluation metrics are described in the remainder of this section.
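Walk-forward analysis, as described by Pardo [1], repeatedly fits on an in-sample window, tests on the period that immediately follows it, and then rolls the window forward. The sketch below shows that split logic at season granularity. It is a minimal illustration under assumed settings: the function name `walk_forward_splits`, the five-season rolling window, and the 2005 start are hypothetical, since the model's actual windows are not disclosed.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    seasons: List[int], train_window: int
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training seasons, test season) pairs in chronological order.

    Each test season is preceded by a rolling window of `train_window`
    earlier seasons, so the model never sees data from the season it is
    evaluated on, or from any later season.
    """
    ordered = sorted(seasons)
    for i in range(train_window, len(ordered)):
        yield ordered[i - train_window:i], ordered[i]


# Hypothetical setup: seasons 2005-2017 available, 5-season training window,
# which yields the eight test seasons 2010-2017.
for train, test in walk_forward_splits(list(range(2005, 2018)), train_window=5):
    print(f"train {train[0]}-{train[-1]} -> test {test}")
```

An anchored (expanding-window) variant would instead start every training slice at the first available season; the key property in either case is that training data always precedes the test season.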
Discriminatory Ability
Representative receiver operating characteristic (ROC) curves for the two most recent years tested (2016 and 2017) are shown in the following figure (left and right, respectively):
Note that (coincidentally) these two years lie approximately one standard deviation from the average AUC (see below): 2017 on the low side and 2016 on the high side.
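For readers unfamiliar with ROC curves, the sketch below shows how the curve's points can be computed from per-game win probabilities: the decision threshold is lowered past each distinct predicted probability, and the false and true positive rates are recorded at each step. The function name `roc_points` and the 0/1 outcome encoding are illustrative assumptions, not the model's actual code.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs of an ROC curve.

    labels: 1 if the game was won, 0 if lost; scores: predicted win probability.
    The threshold is lowered past each distinct score in turn; tied scores
    move the curve diagonally in a single step.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one win and one loss")
    # Highest predicted probability first.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every game whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


print(roc_points([1, 0, 1, 0], [0.8, 0.6, 0.55, 0.3]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```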
Discriminatory Accuracy
The area under the ROC curve (AUC) for each year tested is reported in the following table:
Year | AUC |
---|---|
2017 | 0.5603 |
2016 | 0.6227 |
2015 | 0.5489 |
2014 | 0.5632 |
2013 | 0.6101 |
2012 | 0.5814 |
2011 | 0.6226 |
2010 | 0.5978 |
The average AUC is 0.59(3), i.e., 0.59 with a standard deviation of 0.03 over the eight seasons.
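The AUC equals the probability that a randomly chosen game that was won received a higher predicted win probability than a randomly chosen game that was lost (ties counting half); this is the Mann–Whitney form of the statistic. The sketch below computes AUC that way and reproduces the 0.59(3) summary from the table. The helper names are hypothetical, and the use of the sample standard deviation is an assumption: the source does not state which estimator was used, although with these values the sample and population standard deviations round to the same digit.

```python
from statistics import mean, stdev
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic: the fraction of (win, loss) pairs
    in which the win received the higher score, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one win and one loss")
    credit = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return credit / (len(pos) * len(neg))


def mean_with_uncertainty(values: Sequence[float], decimals: int) -> str:
    """Format mean(sd) in concise notation, with the sample standard
    deviation expressed in units of the last quoted digit."""
    m, s = mean(values), stdev(values)
    return f"{m:.{decimals}f}({round(s * 10 ** decimals)})"


print(auc([1, 0, 1, 0], [0.8, 0.6, 0.55, 0.3]))  # 0.75

yearly_auc = {2017: 0.5603, 2016: 0.6227, 2015: 0.5489, 2014: 0.5632,
              2013: 0.6101, 2012: 0.5814, 2011: 0.6226, 2010: 0.5978}
print(mean_with_uncertainty(list(yearly_auc.values()), 2))  # 0.59(3)
```

The pairwise loop is quadratic in the number of games, which is still manageable for a single MLB season; a rank-based formula would scale better.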
Prediction Accuracy
Brier scores for each year tested are reported in the following table:
Year | Brier Score |
---|---|
2017 | 0.2469 |
2016 | 0.2400 |
2015 | 0.2499 |
2014 | 0.2467 |
2013 | 0.2410 |
2012 | 0.2443 |
2011 | 0.2430 |
2010 | 0.2427 |
The average Brier score (lower is better) is 0.244(3), i.e., 0.244 with a standard deviation of 0.003.
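The Brier score is the mean squared difference between the forecast win probability and the observed outcome (1 for a win, 0 for a loss). As a reference point, a forecast of 0.5 for every game scores exactly 0.25 regardless of the outcomes. The sketch below computes the score and reproduces the 0.244(3) summary from the table; the function name is hypothetical, and the sample standard deviation is assumed for the parenthetical digit.

```python
from statistics import mean, stdev
from typing import Sequence


def brier_score(outcomes: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between each forecast win probability and
    the observed outcome (1 = win, 0 = loss); lower is better."""
    if len(outcomes) != len(probs) or not outcomes:
        raise ValueError("need equally long, non-empty sequences")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


# An uninformative forecast of 0.5 for every game scores exactly 0.25.
print(brier_score([1, 0, 1, 1], [0.5] * 4))  # 0.25

yearly_brier = [0.2469, 0.2400, 0.2499, 0.2467, 0.2410, 0.2443, 0.2430, 0.2427]
# Mean(sample standard deviation) in units of the last quoted digit.
print(f"{mean(yearly_brier):.3f}({round(stdev(yearly_brier) * 1000)})")  # 0.244(3)
```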
References
[1] R. Pardo. Design, Testing, and Optimization of Trading Systems (John Wiley & Sons, 1992). Expanded and updated edition.