Football Equity

The League Equity Index

In passionate discussions between football fans, you often hear particular leagues criticised for being predictable from one year to the next. The complaint usually refers to the same few teams occupying the top places, or even winning the title, season after season.

These critiques are essentially rooted in an evaluation of the fairness of a football league, and in a belief that a league should have some inherent fairness embedded in it. This raises the question: how can we define fairness in this context, and is it possible to measure it objectively?

This is the motivation behind the League Equity Index (LEI): an attempt to attach a measurement to a football league’s fairness.

Desirable properties

However we go about defining such a measure, we want it to have certain properties. Ideally, the measure should be bounded, so that we can see how close a given league is to maximum or minimum fairness. It should also be comparable, so that we can compare the ratings of two distinct leagues. Because those leagues may differ in characteristics such as the number of teams, the measure must account for those factors too.

Defining fairness

In order to measure deviations from fairness, we first need to define it. There are many ways to do this, but we define absolute fairness as a state of the world in which every team in a league has an equal probability of winning the title in any given year. That is, in a league of 20 teams, each team has an equal 5% chance of winning. This might seem like an extreme definition of fairness; however, it is quite a natural one. More importantly, it is mathematically tractable.

Defining the measure

Now that we have defined fairness, we can begin to define our measure. Let’s start by defining the distribution of league title winners for a given league. Let’s call the observed distribution \(P(x)\), where \(x\) ranges over the set of all teams that have competed throughout the league’s history. By definition, the distribution of league titles is discrete and categorical. It’s worth pointing out that, because of this, it is not possible to compare \(P(x)\) across leagues, as the distributions are defined on different sets of teams \(x\). This is precisely the purpose of the LEI: to define a value that is comparable across leagues.

Having defined our real-world distribution, we must now define the theoretical distribution \(Q(x)\). This is the distribution we would expect under the assumption of fairness. Recall that fairness was defined as each team having an equal probability of winning in any given season. By construction, \(P(x)\) and \(Q(x)\) are defined over the same set of teams \(x\), but they (possibly) assign different probability mass to each team.
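The two distributions can be written in symbols. The notation here is our own, introduced for illustration. Let the period contain \(S\) seasons, let \(T_{s}\) be the set of teams in season \(s\) with \(n_{s}=|T_{s}|\), and let \(w(x)\) be the number of titles team \(x\) won in the period. Then

\[\begin{align} P(x)&=\frac{w(x)}{S}, & Q(x)&=\frac{1}{S}\sum_{s\,:\,x\in T_{s}}\frac{1}{n_{s}} \end{align}\]

\(Q\) is a proper probability distribution, since each season contributes a total of one unit of fair probability:

\[\begin{align} \sum_{x}Q(x)&=\frac{1}{S}\sum_{s=1}^{S}\sum_{x\in T_{s}}\frac{1}{n_{s}}=\frac{1}{S}\sum_{s=1}^{S}1=1 \end{align}\]

A team that never won a title in the period has \(P(x)=0\) but \(Q(x)>0\).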

As a result, our attempt to measure fairness reduces to an exercise in measuring the difference between our two discrete distributions. There is an abundance of literature on methods to compare discrete distributions. Noting the desired properties listed above, a natural methodology is to use divergence measures.

Divergence measures

A divergence measure is fundamentally a measure of statistical distance. In our context, it is analogous to the distance between the real-world and the theoretical fair-world. The LEI is based on a divergence measure that is appropriately adjusted to give a range between 0 and 100.

As such, we denote the LEI as

\[\begin{align} \text{LEI}_{t}&=f\big(P_{t}(x), Q_{t}(x)\big) \end{align}\]

where \(P_{t}(x)\) and \(Q_{t}(x)\) are the distributions defined over a specific time period \(t\), and \(f(\cdot)\) is the functional form of the chosen divergence measure, rescaled to the 0 to 100 range. This allows us to define the measure on arbitrary periods as well as on the whole league history. By construction, we have \[0\leq \text{LEI}_{t}\leq 100,\] which gives us the bounded measure we wanted.
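The text does not say which divergence sits behind \(f\), nor how it is rescaled. The sketch below is purely illustrative. It assumes the Jensen-Shannon divergence with base-2 logarithms, which always lies between 0 and 1, and maps it to \(\text{LEI}=100\,(1-\text{JSD})\). Both that choice and the function names are hypothetical; this is not the LEI's actual definition.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions over teams.

    p and q map team names to probability mass; a team missing from one
    dict is treated as having zero mass there. The result lies in [0, 1].
    """
    total = 0.0
    for team in set(p) | set(q):
        pi, qi = p.get(team, 0.0), q.get(team, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution M = (P + Q) / 2
        # By convention 0 * log(0 / m) = 0, so zero-mass terms are skipped.
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total


def lei(p, q):
    """Hypothetical LEI: 100 * (1 - JSD), clamped to [0, 100].

    Identical distributions score 100; the clamp only guards against
    floating-point rounding pushing the divergence outside [0, 1].
    """
    jsd = min(max(jensen_shannon(p, q), 0.0), 1.0)
    return 100.0 * (1.0 - jsd)
```

Under this mapping, identical distributions score 100 and distributions with no teams in common score 0. Every title winner also has positive fair probability, so a real league can never quite reach 0 with this particular choice. An actual implementation would need its own normalization to make 0 attainable.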

By convention, a score of 0 indicates a maximally inequitable scenario (bad) whereas a score of 100 represents maximum equity (good).

In practice

If the above was not illuminating enough, let’s now run through a practical example. We’ll walk through the calculation for the English Premier League over the last 20 years.

Defining P

We start by defining our probability distributions. For the real-world case, \(P(x)\), this is straightforward: each team’s probability mass is simply its share of the league titles won in that time period. For example, over the 20-year period up to the end of the 2021/22 season, Manchester United won 6 titles. As a result, their probability mass is \[P(\text{Man United})=\frac{6}{20}=0.3\]

This is repeated for all teams.
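A minimal Python sketch of this step follows. It assumes the champions are available as a list with one entry per season. The function name `title_distribution` and this input format are our own hypothetical choices. Teams that competed but never won are included with zero mass, so \(P\) covers the same teams as \(Q\).

```python
from collections import Counter


def title_distribution(winners, all_teams):
    """Observed distribution P(x): each team's share of the titles in the period.

    winners: list of champions, one entry per season in the period
             (hypothetical input format).
    all_teams: every team that competed in the period; non-winners get 0.
    """
    counts = Counter(winners)  # titles won per team
    n_seasons = len(winners)
    return {team: counts[team] / n_seasons for team in all_teams}


# Placeholder data: only "6 titles in 20 seasons" comes from the text;
# "Other" stands in for the remaining champions.
winners = ["Man United"] * 6 + ["Other"] * 14
p = title_distribution(winners, {"Man United", "Other", "Never Won"})
print(p["Man United"])  # 0.3
```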

Defining Q

To define the theoretical fair probability for any team, we sum the team's fair probability \(p_{i}\) over the seasons \(i\) in which it competed. Here \(p_{i}\) is one divided by the number of teams in that season. We then divide by the number of seasons in the period. For example, Leicester competed in 9 of those 20 seasons, and each season had 20 teams, so \(p_{i}=1/20\). This means that Leicester’s fair probability mass is \[\begin{align}Q(\text{Leicester})&=\frac{1}{20}\bigg(\sum_{i=1}^{9}p_{i}\bigg)\\&=\frac{1}{20}\bigg(\sum_{i=1}^{9}\frac{1}{20}\bigg)=0.0225\end{align}\]
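The same calculation can be sketched in Python. The sketch assumes each season's participants are given as a list of team names; `fair_distribution` and this input format are hypothetical. Each season contributes \(1/n\) to each of its \(n\) teams, so the sketch also handles seasons with different numbers of teams.

```python
from collections import defaultdict


def fair_distribution(rosters):
    """Fair distribution Q(x) over a period.

    rosters: one list of team names per season in the period
             (hypothetical input format).
    A team's fair chance in a season is 1 / (teams that season); these are
    summed over the seasons it competed in, then divided by the number of
    seasons, so the masses sum to 1.
    """
    n_seasons = len(rosters)
    q = defaultdict(float)
    for season in rosters:
        for team in season:
            q[team] += 1.0 / len(season)
    return {team: mass / n_seasons for team, mass in q.items()}


# Placeholder rosters: 20 seasons of 20 teams, with Leicester in the first 9.
rosters = []
for i in range(20):
    season = [f"Team {k}" for k in range(19)]
    season.append("Leicester" if i < 9 else "Team 19")
    rosters.append(season)

q = fair_distribution(rosters)
print(round(q["Leicester"], 4))  # 0.0225
```

This reproduces the hand calculation for Leicester. A team present in all 20 seasons gets 0.05.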

The figure below shows the observed \(P(x)\) and the fair \(Q(x)\) for this time period. In general, \(P(x)\neq Q(x)\). In fact, in this example, \(P(x)= Q(x)\) only for Arsenal and Liverpool, meaning that each won exactly its fair share of titles over that period.
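As a quick check on this claim, apply the same fair-probability calculation used for Leicester. Arsenal and Liverpool were both in the Premier League in every season of the window, so with 20 teams per season each has the fair mass

\[\begin{align}Q(\text{Arsenal})=Q(\text{Liverpool})&=\frac{1}{20}\bigg(\sum_{i=1}^{20}\frac{1}{20}\bigg)=0.05=\frac{1}{20}\end{align}\]

This matches \(P(x)\) exactly when the club won one of the 20 titles.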

Once we have our two distributions, we simply calculate our divergence index which tells us how far the observed distribution of titles has diverged from a fair allocation.

Figure: EPL, 2003-2022. Observed title distribution \(P(x)\) and fair distribution \(Q(x)\) by team.

Chosen measures

Throughout this site you will see two versions of the LEI implemented. The first of these is denoted \(\text{LEI}_{\infty}\) and represents the index defined on the full league history up until that point. This is an overall measure of the equity of the league's distribution of titles over its full history.

We also implement a second measure, denoted \(\text{LEI}_{10}\), based only on the most recent 10 seasons. This measure signals how equitable the recent distribution of league titles has been. By construction, it is more volatile than \(\text{LEI}_{\infty}\), but it gives a faster signal of deteriorating equity in a league.
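The two periods might be selected as in the sketch below. Given seasons in chronological order, \(\text{LEI}_{\infty}\) at a given season uses every season up to and including it, while \(\text{LEI}_{10}\) keeps only the last 10. The function name `period` is hypothetical. So is the behaviour when a league has fewer than 10 seasons of history, in which case all available seasons are used.

```python
def period(seasons, up_to, window=None):
    """Seasons that feed LEI_t when evaluated at index `up_to` (inclusive).

    seasons: season labels in chronological order.
    window=None -> full history up to that point (LEI_infinity).
    window=10   -> the most recent 10 seasons (LEI_10); if fewer than 10
                   exist, all available seasons are returned (assumption).
    """
    history = seasons[: up_to + 1]
    return history if window is None else history[-window:]


seasons = list(range(2000, 2030))            # placeholder season labels
full = period(seasons, up_to=29)             # all 30 seasons
recent = period(seasons, up_to=29, window=10)
print(recent[0], recent[-1])  # 2020 2029
```

The selected seasons would then feed the \(P\) and \(Q\) calculations described above.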

Interpreting the measure

Interpretation of the LEI is fairly straightforward. Heuristically, it is a measure of how far a given league's title distribution is from being completely fair. The closer to 100 the measure gets, the closer the league gets to that fair distribution.

Conversely, the closer to 0 the measure gets, the further away we are from the fair distribution. An LEI of 0 would indicate we have reached the maximum (statistical) distance away from that fair distribution.

If we look at football leagues that implement specific controls to enforce equality of finances and player acquisition, we can gauge how high we might expect the LEI to be under such interventions. Generally, in leagues with salary caps and player drafts, we see LEI values in the 80-90 range.

Issues

If there are any material issues with the methodology, feel free to raise an issue at the GitHub repository.