Model: MLBp v0.1


This Communication describes the MLB prediction model MLBp v0.1.

status: alpha

The data mining process at statshacker is described here.

Described below are the relevant steps of this process, as applied to this model. Note that (for obvious reasons) only general, high-level details are described; precise ones are not made public.

“Business” Understanding

The objective of this model is to provide a (probabilistic) classification of the outcome of an MLB game.

Data Preparation

The (final) dataset consists of parameters selected from the following categories of statistics:

  • Batting
  • Pitching
  • Fielding
  • Team fielding
  • League information
  • Team performance

Prior to modeling, the data is pre-processed.

Modeling

This model approximates an optimal classifier, using sophisticated machine-learning and statistical methods.

Evaluation

Evaluation is performed by testing on data for years 2010–2017.

Training and testing are done by walkforward analysis [1] (the appropriate evaluation/optimization method for this problem).
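As a rough illustration of the walkforward idea, the sketch below splits a list of seasons so that each test season is preceded only by earlier training seasons. The function name `walk_forward_splits`, the rolling window of three seasons, and the 2007 start year are assumptions made for this example; they are not details of MLBp.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    seasons: Sequence[int], train_size: int
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training seasons, test season) pairs in chronological order.

    Each test season is preceded by a rolling window of `train_size`
    seasons, so the model is never fitted on data from the future.
    """
    ordered = sorted(seasons)
    for i in range(train_size, len(ordered)):
        yield ordered[i - train_size:i], ordered[i]


# Hypothetical setup: three training seasons before each test season.
splits = list(walk_forward_splits(range(2007, 2018), train_size=3))
for train, test in splits:
    print(f"train on {train[0]}-{train[-1]}, test on {test}")
```

With these assumed inputs the test seasons run from 2010 through 2017, one season at a time. An expanding window (all earlier seasons) is an equally plausible variant.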

Evaluation metrics are described in the remainder of this section.

Discriminatory Ability

Representative receiver operating characteristic (ROC) curves for the prior two years (2016 and 2017) are shown in the following figure (left and right, respectively):

Note that (coincidentally) these two plots represent points roughly one standard deviation from the average AUC: 2016 on the high side and 2017 on the low side (see below).
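For readers unfamiliar with ROC curves, here is a minimal sketch of how one is traced from predicted probabilities and observed outcomes, together with the area under it by the trapezoidal rule. The six games and their probabilities are made up for illustration, and the helper names `roc_points` and `auc` are hypothetical; this is not the code used to produce the figure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Return the (false positive rate, true positive rate) points of a ROC curve."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")
    # Lower the decision threshold step by step, from the highest score down.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once a group of tied scores is complete.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up example: 1 = the predicted side won, 0 = it lost.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.70, 0.55, 0.60, 0.45, 0.50, 0.40]
points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(area)
```

A curve that hugs the diagonal has an area near 0.5 (no discrimination); a perfect ranking gives 1.0.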

Discriminatory Accuracy

The area under the ROC curve (AUC) for each year tested is reported in the following table:

Year    AUC
2017    0.5603
2016    0.6227
2015    0.5489
2014    0.5632
2013    0.6101
2012    0.5814
2011    0.6226
2010    0.5978

The average AUC is 0.59(3).
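The notation 0.59(3) can be read as a mean of 0.59 with an uncertainty of 0.03 in the last digit. The sketch below recomputes it from the yearly values in the table, assuming the parenthesized digit is the standard deviation across the eight seasons; the use of the sample standard deviation is also an assumption, although the population form rounds to the same digit.

```python
from statistics import mean, stdev

# AUC values copied from the table above.
auc_by_year = {
    2017: 0.5603,
    2016: 0.6227,
    2015: 0.5489,
    2014: 0.5632,
    2013: 0.6101,
    2012: 0.5814,
    2011: 0.6226,
    2010: 0.5978,
}

values = list(auc_by_year.values())
average = mean(values)
spread = stdev(values)  # sample standard deviation (assumed convention)

print(f"mean = {average:.2f}, standard deviation = {spread:.2f}")
```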

Prediction Accuracy

Brier scores are reported in the following table:

Year    Brier Score
2017    0.2469
2016    0.2400
2015    0.2499
2014    0.2467
2013    0.2410
2012    0.2443
2011    0.2430
2010    0.2427

The average Brier score is 0.244(3).
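For context, the Brier score of a binary forecast is the mean squared difference between the predicted probability and the outcome (1 or 0), so lower is better, and a forecaster who always says 0.5 scores exactly 0.25. The sketch below shows the calculation on made-up games; the function name `brier_score` and the sample data are assumptions for illustration only.

```python
from typing import Sequence


def brier_score(outcomes: Sequence[int], probabilities: Sequence[float]) -> float:
    """Mean squared difference between forecast probability and binary outcome."""
    if len(outcomes) == 0 or len(outcomes) != len(probabilities):
        raise ValueError("need equally long, non-empty sequences")
    squared_errors = [(p - o) ** 2 for o, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(squared_errors)


# Made-up example: 1 = the predicted side won, 0 = it lost.
outcomes = [1, 0, 1, 1, 0, 0]
coin_flip = brier_score(outcomes, [0.5] * len(outcomes))
forecast = brier_score(outcomes, [0.70, 0.55, 0.60, 0.45, 0.50, 0.40])

print(f"always 0.5: {coin_flip:.4f}")
print(f"forecast:   {forecast:.4f}")
```

Against the 0.25 coin-flip baseline, the yearly scores in the table (0.2400 to 0.2499) all fall slightly below it.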

References

[1] R. Pardo. Design, Testing, and Optimization of Trading Systems (John Wiley & Sons, 1992). Expanded and updated edition.


About Author

statshacker is an Assistant Professor of Physics and Astronomy at a well-known state university. His research interests involve the development and application of concepts and techniques from the emerging field of data science to study large data sets. Outside of academic research, he is particularly interested in such data sets that arise in sports and finance. Contact: statshacker@statshacker.com
