The relationship between runs and wins is one of the most (if not the most) important in baseball research. It provides the quantitative connection needed to answer fundamental questions. In addition, examining its development reveals that the few basic variables that go into it provide significant insight into the game of baseball. Despite these points, this relationship is often misunderstood, and it is often misused.

In this Article, the relationship between runs and wins and its utility are considered in detail.

Note that the historical development of this relationship (including alternative formulations, etc.) is considered in a separate article.

## The Relationship Between Runs and Wins

Mathematically, the relationship between runs and wins can be specified by the following expression:

$$\frac{W}{G} = f(R_S, R_A, G) \tag{1}$$

where $W$ is the number of wins, $G$ is the number of games, and $R_S$ and $R_A$ are the runs scored and allowed, respectively, by a given team against its (collective) opponent.

There is only one constraint that the relationship *must* satisfy:

- $W/G$ is bounded within the range $[0, 1]$.

There are, however, several properties that it *should* have.

## The Run–Win Relationship

### Linear Models

Basic insight into the run–win relationship can be obtained by noting that, for a single game, a win or loss is specified by the following conditions:

- win: $R_S > R_A$
- loss: $R_S < R_A$

Over the course of many games, a natural and plausible assumption is therefore that the (total) run differential $\Delta R$,

$$\Delta R = R_S - R_A \tag{2}$$

will be strongly correlated with the number of wins $W$. For a single game, the sign of $\Delta R$ determines the outcome exactly, given the conditions above. The validity of this assumption can be verified by calculation.

Normalizing per game, the above assumption implies a strong linear correlation between $W/G$ and $\Delta R/G$. This means that a **linear model** (e.g., Ref. []) should be quite accurate,

$$\frac{W}{G} = m\,\frac{\Delta R}{G} + b + \epsilon \tag{3}$$

where $m$ is the slope of the line (positive, given the correlation), $b$ is set to ensure the condition $W/G = 1/2$ for $\Delta R = 0$ (that is, $b = 1/2$), and $\epsilon$ is the error term.
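As a sketch of how the slope $m$ in Eq. (3) could be estimated, the intercept can be fixed at $b = 1/2$ and $m$ chosen by least squares. With $x = \Delta R/G$ and $y = W/G$, this gives $m = \sum_i x_i (y_i - 1/2) / \sum_i x_i^2$. The function names are hypothetical. This is one reasonable fitting choice, not the method of any cited reference.

```python
def fit_linear_slope(dr_per_game, win_frac):
    """Least-squares slope m of Eq. (3) with the intercept fixed at b = 1/2.

    Minimizes sum((y - 1/2 - m*x)**2), whose solution is
    m = sum(x * (y - 1/2)) / sum(x**2).
    """
    num = sum(x * (y - 0.5) for x, y in zip(dr_per_game, win_frac))
    den = sum(x * x for x in dr_per_game)
    return num / den


def linear_win_frac(dr_per_game, m):
    """Eq. (3) without the error term: W/G = m * (ΔR/G) + 1/2."""
    return m * dr_per_game + 0.5
```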

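The fundamental problem with linear models is that they aren’t bounded. Ignoring the error term, Eq. (3) with slope $m > 0$ and $b = 1/2$ gives $W/G > 1$ for $\Delta R/G > 1/(2m)$ and $W/G < 0$ for $\Delta R/G < -1/(2m)$, which violates the constraint.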
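Setting Eq. (3) equal to 1 or 0, with $b = 1/2$ and no error term, shows where the linear model breaks down: at $|\Delta R/G| = 1/(2m)$. Below is a minimal sketch. The slope value in the usage lines is purely illustrative and is not a fitted value.

```python
def linear_validity_limit(m):
    """Largest |ΔR/G| for which W/G = m * ΔR/G + 1/2 stays inside [0, 1]."""
    return 1.0 / (2.0 * m)


def linear_win_frac(dr_per_game, m):
    """Eq. (3) without the error term; nothing keeps the result inside [0, 1]."""
    return m * dr_per_game + 0.5


m = 0.1  # illustrative slope only, not a fitted value
limit = linear_validity_limit(m)               # |ΔR/G| of 5 runs per game
print(limit, linear_win_frac(1.2 * limit, m))  # second value exceeds 1
```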

### Pythagorean Expectation

An improved run–win relationship can be derived under two assumptions:

- The “quality” of a baseball team is measured by the ratio of $R_S$ to $R_A$,

  $$Q = \frac{R_S}{R_A} \tag{4}$$

- Baseball teams win in proportion to their quality.

Note that the first assumption is an extension of the one associated above with Eq. (2) and is considered further below (The Run Environment). The second is also considered below (The Importance of Chance).

Consider Team A playing against a (collective) Team B. Under the above assumptions, the probability that Team A wins is

$$P_A = \frac{Q_A}{Q_A + Q_B} \tag{5}$$

where $Q_A$ and $Q_B$ are defined by Eq. (4). Note that the latter (from the perspective of Team A) must necessarily be

$$Q_B = \frac{R_A}{R_S} = \frac{1}{Q_A}$$

Note that care must be taken (as above) to interpret Team B only as the collective opponent []. Only in this case are the runs scored and allowed by Team B equal to the runs allowed and scored by Team A, respectively. This gives $Q_B = 1/Q_A$, so the above probabilistic interpretation holds.

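Inserting $Q_A = R_S/R_A$ and $Q_B = R_A/R_S$ into Eq. (5), and identifying $P_A$ with the expected winning fraction $W/G$ over many games, gives

$$\frac{W}{G} = \frac{R_S^2}{R_S^2 + R_A^2} \tag{6}$$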
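The chain from Eq. (4) through Eq. (6) can be checked numerically. The sketch below computes the win probability from the two quality ratios, with $Q_B = R_A/R_S$ for the collective opponent, and compares it with the closed form. The function names are hypothetical and the run totals are made-up inputs.

```python
def quality(rs, ra):
    """Eq. (4): team quality as the ratio of runs scored to runs allowed."""
    return rs / ra


def win_probability(q_a, q_b):
    """Eq. (5): Team A wins in proportion to its share of the total quality."""
    return q_a / (q_a + q_b)


def pythagorean_expectation(rs, ra):
    """Eq. (6): expected winning fraction with exponent 2."""
    return rs**2 / (rs**2 + ra**2)


rs, ra = 800, 700            # hypothetical season totals for Team A
q_a = quality(rs, ra)
q_b = quality(ra, rs)        # collective opponent: Q_B = R_A / R_S = 1 / Q_A
print(win_probability(q_a, q_b), pythagorean_expectation(rs, ra))  # agree up to rounding
```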

This equation is known as the “Pythagorean theorem” developed by Bill James [], or more commonly as the **Pythagorean expectation**.

An important improvement relative to the linear model is that the function in Eq. (6) is bounded within the range $[0, 1]$, satisfying the constraint.

Note that a multivariate Taylor series expansion of Eq. (6) (to first order) about $R_S = R_A$ results in Eq. (3) [].
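The expansion can be written out explicitly. The following is a short worked sketch, not reproduced from the cited reference. It takes the partial derivatives of Eq. (6) at $R_S = R_A = r$ and lets $R = 2r/G$ denote the total runs per game of both teams at that point:

$$
\begin{aligned}
\frac{\partial}{\partial R_S}\,\frac{R_S^2}{R_S^2 + R_A^2}\bigg|_{R_S = R_A = r} &= \frac{2 R_S R_A^2}{(R_S^2 + R_A^2)^2}\bigg|_{R_S = R_A = r} = \frac{1}{2r}, \\
\frac{\partial}{\partial R_A}\,\frac{R_S^2}{R_S^2 + R_A^2}\bigg|_{R_S = R_A = r} &= -\frac{2 R_S^2 R_A}{(R_S^2 + R_A^2)^2}\bigg|_{R_S = R_A = r} = -\frac{1}{2r}, \\
\frac{W}{G} &\approx \frac{1}{2} + \frac{R_S - R_A}{2r} = \frac{1}{2} + \frac{1}{R}\,\frac{\Delta R}{G}.
\end{aligned}
$$

This has exactly the form of Eq. (3), with $b = 1/2$ and slope $m = 1/R$. In this approximation, the slope of the linear model is therefore tied to the run environment.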

### The Importance of Chance

While the second assumption underlying the Pythagorean expectation is plausible, it is not natural, because the extent to which it is valid depends on the importance of chance.

There are several ways to correct Eq. (6) for this.

The most common approach is to consider a fixed, but (possibly) different, exponent $x$,

$$\frac{W}{G} = \frac{R_S^x}{R_S^x + R_A^x} \tag{7}$$

known as the **fixed-exponent model**. Note that for $x = 2$, this expression reduces to Eq. (6). A commonly used value [] is $x = 1.83$.
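A minimal sketch of Eq. (7), with a hypothetical function name and made-up run totals, comparing $x = 2$ with $x = 1.83$:

```python
def fixed_exponent_win_frac(rs, ra, x=2.0):
    """Eq. (7): W/G = RS**x / (RS**x + RA**x).

    x = 2 recovers the Pythagorean expectation, Eq. (6).
    """
    rs_x, ra_x = rs**x, ra**x
    return rs_x / (rs_x + ra_x)


rs, ra = 800, 700  # hypothetical season totals
for x in (2.0, 1.83):
    print(x, fixed_exponent_win_frac(rs, ra, x))
```

Since $W/G = 1/\left(1 + (R_A/R_S)^x\right)$, a smaller exponent pulls the expected winning fraction of a team that outscores its opponents toward 1/2.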

Note that the above approach is *ad hoc*. However, Eq. (7) has been shown [] to be theoretically justified when the runs scored and allowed are modeled as independent random variables drawn from a suitable distribution (which inherently captures the importance of chance).
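The derivation cited above models runs with Weibull distributions. The sketch below simplifies that model: runs scored and allowed are treated as continuous, independent Weibull variables with a common shape parameter and no location shift, so it is not the full model of that reference. In this simplified setting, raising both variables to the shape power makes them exponential. As a result, $P(\text{runs scored} > \text{runs allowed})$ equals Eq. (7) with $x$ equal to the shape parameter, and a simulation should reproduce it to within sampling error. The function names and inputs are hypothetical.

```python
import math
import random


def simulate_win_frac(mean_rs, mean_ra, shape, n_games=200_000, seed=1):
    """Monte Carlo estimate of P(runs scored > runs allowed).

    Simplifying assumption: both are independent continuous Weibull variables
    with the same shape parameter and no location shift.
    """
    rng = random.Random(seed)
    # A Weibull mean is scale * Gamma(1 + 1/shape); solve for each scale.
    g = math.gamma(1.0 + 1.0 / shape)
    scale_s, scale_a = mean_rs / g, mean_ra / g
    wins = sum(
        rng.weibullvariate(scale_s, shape) > rng.weibullvariate(scale_a, shape)
        for _ in range(n_games)
    )
    return wins / n_games


def fixed_exponent_win_frac(rs, ra, x):
    """Eq. (7); per-game means can be used since only the ratio matters."""
    return rs**x / (rs**x + ra**x)


print(simulate_win_frac(4.9, 4.3, shape=1.83), fixed_exponent_win_frac(4.9, 4.3, 1.83))
```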

### The Run Environment

The Pythagorean expectation does not account for the run environment. This shortcoming originates in the assumption of Eq. (4), which is insensitive to scaling: multiplying both $R_S$ and $R_A$ by the same factor leaves $Q$, and hence Eq. (6), unchanged.

The run environment, however, is directly related to the importance of chance (discussed above). The larger the margin of victory (or defeat) per game, the less likely it is that the result was due to chance. Indeed, accounting for these margins in a win expectancy model [] confirms this and shows why different sports yield different results.

Within the exponent-correction approach to chance, it therefore makes sense to let the exponent depend on the run environment.

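Over a wide range of run environments, an exponent of the form

$$x = R^{z} \tag{8}$$

is found to give the best results. Here $R$ is the runs per game (of both teams),

$$R = \frac{R_S + R_A}{G} \tag{9}$$

and $z$ is an empirically fitted constant; commonly quoted values lie around 0.28 to 0.29. This form includes the mandatory value $x = 1$ at $R = 1$. With one run per game, every game ends 1–0, so each run scored is a win and $W/G = R_S/(R_S + R_A)$ exactly. This formula is known as the **Pythagenpat formula**.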
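A minimal sketch of the Pythagenpat formula combines Eqs. (7) to (9). The value $z = 0.287$ is an assumed, commonly quoted choice used only for illustration. The function names are hypothetical.

```python
def pythagenpat_exponent(rs, ra, games, z=0.287):
    """Eqs. (8)-(9): x = R**z, where R is total runs per game of both teams.

    z = 0.287 is an assumed illustrative constant, not taken from this article.
    """
    runs_per_game = (rs + ra) / games
    return runs_per_game**z


def pythagenpat_win_frac(rs, ra, games, z=0.287):
    """Eq. (7) with the run-environment-dependent exponent of Eq. (8)."""
    x = pythagenpat_exponent(rs, ra, games, z)
    return rs**x / (rs**x + ra**x)


# At R = 1 every game ends 1-0, so x = 1 and W/G = RS / (RS + RA).
print(pythagenpat_exponent(100, 62, 162))  # 1.0
print(pythagenpat_win_frac(100, 62, 162))  # same as 100 / 162
```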

## The Utility of the Run–Win Relationship

The run–win relationship is often used [5] to predict a team's expected numbers of wins and losses. This leads to the question of whether a team was “lucky” (or “unlucky”). While there is *some* utility in this, it distracts from the fundamental utility of the relationship [].

The goal of a baseball team (over a single game or a course of games) is to win games. Questions in baseball research are therefore fundamentally concerned with the importance of a particular quantity in terms of wins.

It is impossible to quantify the importance of a particular quantity directly in terms of wins. It is possible, however, to quantify it directly in terms of runs. The run–win relationship therefore provides the connection needed to answer these fundamental questions.
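As an illustration of that connection, linearizing Eq. (7) about $R_S = R_A$ gives $W/G \approx 1/2 + x\,(\Delta R/G)/(2R)$. Near .500, each additional win therefore corresponds to about $2R/x$ runs of differential. With the Pythagenpat exponent $x = R^z$, this becomes $2R^{1-z}$. The sketch below uses hypothetical helper names, again assumes $z = 0.287$, and converts a contribution measured in runs into approximate wins.

```python
def runs_per_win(runs_per_game, z=0.287):
    """Approximate runs of differential per win for a near-.500 team.

    From linearizing Eq. (7) about RS = RA: dW/d(ΔR) = x / (2R), with the
    Pythagenpat exponent x = R**z (z is an assumed illustrative value).
    """
    x = runs_per_game**z
    return 2.0 * runs_per_game / x


def runs_to_wins(runs, runs_per_game, z=0.287):
    """Convert a contribution measured in runs into approximate wins."""
    return runs / runs_per_win(runs_per_game, z)
```

Because this is a linearization about $R_S = R_A$, it is most trustworthy for teams and contributions that do not move a team far from .500.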

## Conclusions

The relationship between runs and wins [Eq. (1)] is simply that, a relationship. Considering its development in detail (as in this Article) shows that it nonetheless provides significant insight into the game of baseball.

This relationship provides insight into:

- the quality of a baseball team [Eq. (4)]
- how baseball teams win games [Eq. (5)]
- the importance of chance
- the run environment

The utility of this relationship, then, is that it provides the quantitative connection needed to answer fundamental questions in baseball research. The most accurate of the forms considered here is Eq. (7) with the exponent given by Eqs. (8) and (9).

Because the run–win relationship rests on several assumptions, however, there is likely room for improvement. Indeed, as the recent references cited in this Article indicate (e.g., Refs. [, , , ]), it remains an active area of current and future research.

## References

[] One of the earliest and perhaps most famous linear models: A. Soolman, *Unpublished*.

[] J. Heumann, “An improvement to the baseball statistic ‘Pythagorean Wins’,” *Journal of Sports Analytics* **2**, 49–59 (2016).

[] B. James, *Baseball Abstract* (Ballantine Books, 1983).

[] K. D. Dayaratna and S. J. Miller, “First-Order Approximations of the Pythagorean Formula,” *By The Numbers: The Newsletter for the SABR Statistical Analysis Research Committee* **22**, 15–19 (2012).

[] Baseball-Reference; Accessed: 2018-03-18.

[] S. J. Miller, “A Derivation of the Pythagorean Won-Loss Formula in Baseball,” *CHANCE* **20**, 40–48 (2007).

[] E. H. Kaplan and C. Rich, “Decomposing Pythagoras,” *J. Quant. Anal. Sports* **13**, 141–149 (2017).

[] FanGraphs; Accessed: 2018-03-18.
