How Do Newly Promoted Clubs Survive In The EPL?

Part 2: The Four Survival KPIs

The first part of this two-part consideration of the prospects of newly promoted clubs surviving in the English Premier League (EPL) concluded that the lower survival rate in recent seasons was due to poorer defensive records rather than any systematic reduction in wage expenditure relative to other EPL clubs. It was also suggested that there might be a Moneyball-type inefficiency with newly promoted teams possibly allocating too large a proportion of their wage budget to over-valued strikers when more priority should be given to improving defensive effectiveness. In this post, the focus is on identifying four key performance indicators (KPIs) for newly promoted clubs that I will call the “survival KPIs”. These survival KPIs are then combined using a logistic regression model to determine the current survival probabilities of Burnley, Leeds United and Sunderland in the EPL this season.

The Four Survival KPIs

The four survival KPIs are based on four requirements for a newly promoted club:

  • Squad quality measured as wage expenditure relative to the EPL median
  • Impetus created by a strong start to the season measured by points per game in the first half of the season
  • Attacking effectiveness measured by goals scored per game
  • Defensive effectiveness measured by goals conceded per game

Using data on the 89 newly promoted clubs in the EPL from seasons 1995/96 – 2024/25, these clubs have been allocated to four quartiles for each survival KPI. Table 1 sets out the range of values for each quartile, with Q1 as the quartile most likely to survive through to Q4 as the quartile most likely to be relegated. Table 2 reports the relegation probabilities for each quartile for each KPI. So, for example, as regards squad quality, Table 1 shows that the top quartile (Q1) of newly promoted clubs had wage costs at least 79.5% of the EPL median that season. Table 2 shows that only 22.7% of these clubs were relegated. In contrast, the clubs in the lowest quartile (Q4) had wage costs less than 55% of the EPL median that season and 77.3% of these clubs were relegated.
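The four KPI definitions above are simple ratios, and a minimal sketch may make them concrete. The `SeasonRecord` class, its field names and the example inputs are hypothetical. Only the two Squad Quality thresholds quoted in the text (79.5 and 55) come from the post, so the sketch does not try to separate Q2 from Q3.

```python
from dataclasses import dataclass


@dataclass
class SeasonRecord:
    """Inputs for one newly promoted club (field names are illustrative)."""
    wage_bill: float            # club wage expenditure
    league_median_wage: float   # EPL median wage expenditure that season
    first_half_points: int      # points won in the first half of the season
    goals_for: int
    goals_against: int
    games_played: int = 38
    first_half_games: int = 19


def survival_kpis(rec: SeasonRecord) -> dict:
    """Return the four survival KPIs as described in the list above."""
    return {
        "squad_quality": 100 * rec.wage_bill / rec.league_median_wage,
        "impetus": rec.first_half_points / rec.first_half_games,
        "attack": rec.goals_for / rec.games_played,
        "defence": rec.goals_against / rec.games_played,
    }


def squad_quality_band(squad_quality: float) -> str:
    """Band a club using only the Q1 and Q4 thresholds quoted in the text."""
    if squad_quality >= 79.5:
        return "Q1"
    if squad_quality < 55:
        return "Q4"
    return "Q2/Q3"  # the boundary between Q2 and Q3 is not given in the text


# Hypothetical club: wages at 83% of the median, 23 points at halfway
example = SeasonRecord(83.0, 100.0, 23, 40, 60)
kpis = survival_kpis(example)
band = squad_quality_band(kpis["squad_quality"])
```

Expressing every KPI per game or relative to the median is what allows clubs from different seasons to be pooled into the same quartiles.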

Table 1: Survival KPIs, Newly Promoted Clubs in the EPL, 1995/96 – 2024/25

Table 2: Relegation Probabilities, Newly Promoted Clubs in the EPL, 1995/96 – 2024/25

The standout result is the low relegation probability for newly promoted clubs in Q1 for the Impetus KPI. Only 8% of newly promoted clubs with an average of 1.21 points per game or better in the first half of the season have been relegated. This equates to 23+ points after 19 games. Only 17 newly promoted clubs have achieved 23+ points by mid-season in the 30 seasons since 1995 and only two have done so in the last five seasons – Fulham in 2022/23 with 31 points and the Bielsa-led Leeds United with 26 points in 2020/21.

It should be noted that there is little difference in the relegation probabilities between Q2 and Q3, the mid-range values, for both Squad Quality and Attacking Effectiveness, suggesting that marginal improvements in both of these KPIs have little impact for most clubs. As regards defensive effectiveness, both Q1 and Q2 have low relegation probabilities, suggesting that the crucial benchmark is limiting goals conceded to under 1.61 goals per game (or fewer than 62 goals conceded over the entire season). Of the 43 newly promoted clubs that have done so since 1995, only seven have been relegated, a relegation probability of 16.3%. This reinforces the main conclusion from the previous post that poorer defensive records are the main reason for the poor performance of newly promoted clubs in recent seasons: only four clubs have conceded fewer than 62 goals in the last five seasons – Fulham (53 goals conceded, 2020/21), Leeds United (54 goals conceded, 2020/21), Brentford (56 goals conceded, 2021/22) and Fulham (53 goals conceded, 2022/23). Of these four clubs, only Fulham in 2020/21 were relegated (primarily due to their poor attacking effectiveness).
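The per-game thresholds quoted above translate into season totals by simple arithmetic. The short check below is a sketch using only the figures in the text, with a 38-game season and a 19-game first half assumed.

```python
import math

FIRST_HALF_GAMES = 19
SEASON_GAMES = 38

# Q1 Impetus threshold: 1.21 points per game over the first half of the season
points_at_halfway = math.ceil(1.21 * FIRST_HALF_GAMES)

# Defensive benchmark: under 1.61 goals conceded per game over a full season
max_goals_conceded = math.floor(1.61 * SEASON_GAMES)

# Relegation rate among the 43 clubs that met the defensive benchmark
relegation_rate = round(100 * 7 / 43, 1)

print(points_at_halfway, max_goals_conceded, relegation_rate)
```

Under these assumptions the thresholds correspond to 23 points at halfway, at most 61 goals conceded (that is, fewer than 62) and a relegation rate of 16.3%.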

Where Did The Newly Promoted Clubs Go Wrong Last Season?

Just as in 2023/24, all three newly promoted clubs last season – Ipswich Town, Leicester City and Southampton – were relegated. Table 3 reports the survival KPIs for these clubs. In the case of Ipswich Town, their Squad Quality was low with relative expenditure under 50% of the EPL median. In contrast, Leicester City spent close to the EPL median and Southampton were just marginally under the Q1 threshold. The Achilles heel for all three clubs was their very poor defensive effectiveness, conceding goals at a rate of over two goals per game. Only 11 newly promoted clubs have conceded 80+ goals since 1995; all have been relegated.

Table 3: Survival KPIs, Newly Promoted Clubs in the EPL, 2024/25

*Calculated using estimated squad salary costs sourced from Capology (www.capology.com)

What About This Season?

As I write, seven rounds of games have been completed in the EPL. Of the three newly promoted clubs, the most impressive start has been by Sunderland, who are currently 9th in the EPL with 11 points. That puts them in Q1 for Impetus, and they also rank in Q1 for Squad Quality, with wage expenditure estimated at 83% of the EPL median, and for defensive effectiveness, with only six goals conceded in their first seven games. Leeds United have also made a solid if somewhat less spectacular start with 8 points, ranking in Q2 for all four survival KPIs. Both Sunderland and Leeds United are better placed at this stage of the season than all three newly promoted clubs last season, when Leicester City had 6 points, Ipswich Town 4 points and Southampton 1 point. Burnley have made the poorest start of the newly promoted clubs this season with only 4 points, matching Ipswich Town’s start last season but, unlike Ipswich Town, Burnley rank Q2 in both Squad Quality and Attack. Worryingly, Burnley’s defensive effectiveness, which was so crucial to their promotion from the Championship, has been poor so far this season and, at over two goals conceded per game, is on a par with Ipswich Town, Leicester City and Southampton last season.

Table 4: Survival KPIs and Survival Probabilities, Newly Promoted Clubs in the EPL, 2025/26, After Round 7

*Calculated using estimated squad salary costs sourced from Capology (www.capology.com)

Using the survival KPIs for all 86 newly promoted clubs 1995 – 2024, a logistic regression model has been estimated for the survival probabilities of newly promoted clubs in the EPL. This model combines the four survival KPIs and weights their relative importance based on their ability to jointly identify correctly those newly promoted clubs that will survive. The model has a success rate of 82.6% in predicting which newly promoted clubs will survive and which will be relegated. Based on the first seven games, Sunderland have a survival probability of 99.9%, Leeds United 72.9% and Burnley 1.6%. These figures are extreme and merely highlight that Sunderland have made an exceptional start, Leeds United a good start and Burnley have struggled defensively. It is still early days and, crucially, the survival probabilities do not control for the quality of the opposition. Sunderland have yet to play a team in the top five whereas Leeds United and Burnley have both played three teams in the top five. I will update these survival probabilities regularly as the season progresses. They are likely to be quite volatile in the coming weeks but should become more stable and robust by late December.
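To show the mechanics of how a logistic model turns four KPIs into a single probability, here is a minimal sketch. The coefficients and intercept are placeholders chosen purely for illustration; they are not the estimates from the fitted model, which are not reported in this post. The 0.5 classification cutoff in `success_rate` is likewise an assumption.

```python
import math

# Placeholder values only: NOT the estimated coefficients of the actual model
HYPOTHETICAL_COEFS = {
    "squad_quality": 0.03,
    "impetus": 2.0,
    "attack": 1.0,
    "defence": -2.5,   # more goals conceded per game lowers the probability
}
HYPOTHETICAL_INTERCEPT = -1.0


def survival_probability(kpis, coefs=HYPOTHETICAL_COEFS,
                         intercept=HYPOTHETICAL_INTERCEPT):
    """Logistic transform of a weighted sum of the four survival KPIs."""
    z = intercept + sum(coefs[name] * kpis[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def success_rate(predicted_probs, survived, cutoff=0.5):
    """Share of clubs whose predicted outcome matches the actual outcome."""
    correct = sum((p >= cutoff) == s for p, s in zip(predicted_probs, survived))
    return correct / len(survived)


# Two hypothetical clubs that differ only in goals conceded per game
tight = {"squad_quality": 70.0, "impetus": 1.0, "attack": 1.1, "defence": 1.2}
leaky = {"squad_quality": 70.0, "impetus": 1.0, "attack": 1.1, "defence": 2.2}
p_tight = survival_probability(tight)
p_leaky = survival_probability(leaky)
```

Because the logistic curve is steep in the middle and flat at the ends, a small sample of games with unusually good or bad KPIs can push the probability close to 0 or 1. That is consistent with the extreme early-season figures quoted above.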

How Do Newly Promoted Clubs Survive In The EPL? Part One: What Do The Numbers Say?

The English Premier League (EPL) started its 34th season last weekend with most of the pundits focusing on the top of the table and whether Arne Slot’s Liverpool can retain the title in the face of a rejuvenated challenge by Pep Guardiola’s Manchester City. Relatively little attention has been given to the chances of the newly promoted clubs – Leeds United, Burnley and Sunderland – avoiding relegation with most pundits tipping all three to follow their predecessors in the last two seasons in being immediately relegated back to the Championship. The opening weekend of the EPL season went somewhat against the doom merchants with two of the three newly promoted clubs, Sunderland and Leeds United, winning. This is the first time that two newly promoted clubs have won their first game since Brentford and Watford in 2021/22 with the only other instance of this rare feat being Bolton Wanderers and Crystal Palace in 1997/98 although it should be noted that only Brentford then went on to avoid relegation. I must of course in the interests of objectivity declare my allegiances – I have lived and worked in Leeds for over 40 years and, as a Scot growing up in the 1960s, my “English” team was always Leeds United, then packed with Scottish internationals with Billy Bremner and Eddie Gray my particular favourites. So with Leeds United returning to the EPL after two seasons in the Championship, what are the chances that Leeds United and the other two promoted clubs can defy conventional wisdom and avoid relegation? What do the numbers say?

The Dataset

The dataset used in the analysis covers 30 years of the EPL from season 1995/96 to season 2024/25. The analysis begins in 1995/96, which was the first season in which the EPL adopted its current structure of 20 clubs with three clubs relegated. Note that there were only two teams promoted from the Championship in 1995/96. League performance has been measured by Wins, Draws, Losses, Goals For, Goals Against and League Points. In order to focus on sporting performance, League Points are calculated solely on the basis of games won and drawn, and exclude any points deductions for regulatory breaches. There is no case of any club being relegated solely because of regulatory breaches. Survival Rate is defined as the percentage of newly promoted clubs that were not relegated in their first season in the EPL. Relative Wages has been calculated as the total wage expenditure of clubs as reported in their company accounts relative to the median wage expenditure of all EPL clubs that season (indexed such that 100 = median wage expenditure). This allows comparisons to be drawn across seasons despite the underlying upward trend in wage expenditure. Company accounts are not yet available for 2024/25 so there is no analysis of wage expenditure and sporting efficiency in the most recent EPL season. Total wage expenditure includes all wage expenditure, not just player wages. Estimates of individual player wages and total squad costs are available but their accuracy is unknown and they are limited to recent seasons only. A comparison of one such set of estimated squad wage costs and the wage expenditures reported in company accounts for the period 2014 – 2024 yielded a correlation coefficient of 0.933, which suggests that the “official” wage expenditures provide a very good proxy for player wage costs. Sporting Efficiency is defined as League Points divided by Relative Wages (and multiplied by 100).
Sporting Efficiency is a standardised measure of league points per unit of wage expenditure across seasons. It attempts to capture the ability of clubs to transform financial expenditure into sporting performance which, when all is said and done, is the fundamental transformation in professional team sports and at the heart of the Moneyball story as to how teams can attempt to offset limited financial resources by greater sporting efficiency.
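The two derived measures defined above can be sketched in a few lines. The function names and the example wage figures are hypothetical; only the formulas follow the definitions in the text.

```python
from statistics import median


def relative_wages(club_wages, all_club_wages):
    """Club wage expenditure indexed so that the league median equals 100."""
    return 100 * club_wages / median(all_club_wages)


def sporting_efficiency(league_points, rel_wages):
    """League points per unit of relative wage expenditure, times 100."""
    return 100 * league_points / rel_wages


# Hypothetical three-club league, wage bills in arbitrary money units
league_wage_bills = [50.0, 100.0, 150.0]
rw = relative_wages(50.0, league_wage_bills)
se = sporting_efficiency(30, rw)

# Rearranging the definition recovers league points
points_back = rw * se / 100
```

Indexing on the median rather than the mean keeps the measure from being dragged upward by a handful of very large wage bills at the top of the league.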

League Performance of Newly Promoted Clubs

Table 1 summarises the average league performance of newly promoted clubs over the last 30 seasons of the EPL, broken down into 5-year sub-periods in order to detect any long-term trends over time. In addition, the proposition that the average league performance has deteriorated in the last five seasons compared to the previous 25 seasons has been formally tested using a t-test, with instances of strong evidence (i.e. statistical significance) of this deterioration indicated by asterisks (or a question mark when the evidence is marginally weaker). The key points to emerge are:

  1. There is no clear trend in wins, draws and losses by newly promoted clubs between 1995/96 and 2019/20 but thereafter there is strong evidence that newly promoted clubs are winning and drawing fewer games and, by implication, losing more games.
  2. Newly promoted clubs have averaged around four more losses since 2020, with an average of 22.5 losses in the last five seasons as opposed to an average of 18.7 losses in the previous 25 seasons.
  3. The poorer league performance in recent seasons represents a reduction in average league points from 39.0 (1995/96 – 2019/20) to 30.5 points (2020/21 – 2024/25).
  4. Given that the acknowledged benchmark to avoid relegation is 40 points, not surprisingly the survival rate of newly promoted clubs has declined in the last five seasons to only a one-in-three chance of survival (33.3%) compared to a slightly better than one-in-two chance (56.8%) in the previous 25 seasons.
  5. The data suggests strongly that the primary reason for the decline in league performance and survival rates of newly promoted clubs in the last five seasons has been weaker defensive play, not weaker attacking play. Newly promoted clubs averaged 61.1 goals against in seasons 1995/96 – 2019/20 but this rose to 73.8 goals against in the last five seasons which represents very strong evidence of a systematic change in the defensive effectiveness of newly promoted clubs. In stark contrast, the change in goals for has been negligible with a decline from 40.5 (1995/96 – 2019/20) to 38.8 (2020/21 – 2024/25) which is more likely to be accounted for by season-to-season fluctuation rather than any underlying systematic decline in attacking effectiveness.
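The comparison running through the list above, the last five seasons against the previous 25, can be sketched as a two-sample t-test. The post does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the sample values are made up purely to show the calculation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(recent, earlier):
    """Welch's t statistic and approximate degrees of freedom."""
    v_r = variance(recent) / len(recent)      # squared standard error, recent
    v_e = variance(earlier) / len(earlier)    # squared standard error, earlier
    t = (mean(recent) - mean(earlier)) / sqrt(v_r + v_e)
    df = (v_r + v_e) ** 2 / (
        v_r ** 2 / (len(recent) - 1) + v_e ** 2 / (len(earlier) - 1)
    )
    return t, df


# Hypothetical goals-against totals, for illustration only
recent_goals_against = [70, 78, 75, 68, 81]
earlier_goals_against = [60, 58, 66, 63, 55, 62, 59, 65]

t_stat, dof = welch_t(recent_goals_against, earlier_goals_against)
```

A large positive `t_stat` for goals against, set against a value near zero for goals for, would be the statistical counterpart of the contrast drawn in point 5.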

Wage Costs and Sporting Efficiency of Newly Promoted Clubs

It has been frequently argued that the recent decline in the league performance and survival rates of newly promoted clubs is due to an increasing gap in financial resources between established EPL clubs and the newly promoted clubs. Table 2 addresses this issue. There is absolutely no support for the claim that newly promoted clubs are now more financially disadvantaged than their predecessors. There has been virtually no change in the relative wage expenditure of newly promoted clubs in the last five seasons, which has averaged 67.1 compared to 66.3 in the previous 25 seasons. The lower survival rate in recent seasons is NOT due to newly promoted clubs spending proportionately less on playing talent.

There is a very simple equation that holds by definition:

League Points = (Relative Wages × Sporting Efficiency) / 100

Since their league performance has declined but the relative wage expenditure of newly promoted clubs has stayed more or less constant, then their sporting efficiency MUST have declined. Table 2 suggests that there may have been a downward trend in the sporting efficiency in newly promoted clubs in the last 15 seasons. In addition, there is strong evidence that there has been a systematic downward shift in the sporting efficiency in the last five seasons to 51.4 compared to the previous average of 63.2 (1995/96 – 2019/20). On its own, this is merely a statement of the obvious dressed up in mathematical and statistical formalism. Newly promoted clubs are performing worse on the pitch as a result of spending less effectively. The crucial question is why league performance and sporting efficiency have declined. The answer may lie in reflecting on the fact that, as we discovered in Table 1, the reason for the poorer league performance is primarily due to poorer defensive effectiveness not poorer attacking effectiveness. Newly promoted clubs seem to be buying the same number of goals scored with the same relative wage budget as in previous seasons but at the cost of buying less defensive effectiveness and conceding more goals. This is consistent with a Moneyball-type distortion in the EPL player market with a premium paid for strikers that may not be fully warranted by current tactical developments in the game. The numbers would support newly promoted clubs giving a higher priority to defensive effectiveness in their recruitment and retention policy and avoiding spending excessively on expensive strikers, particularly those with little experience of playing and scoring in the top leagues.

Competitive Balance Part 3: North American Major Leagues

As discussed in the two previous posts on competitive balance, there is no agreed single definition of competitive balance beyond a general statement that a competitively balanced league is characterised by all teams having a relatively equal chance of winning individual games and the league championship. The lack of agreement on a specific definition of competitive balance combined with the wide variety of league structures and the statistical problems of inferring ex ante (i.e. pre-event) success probabilities from ex post (i.e. actual) league outcomes has led to a multiplicity of competitive balance metrics. Morten Kringstad and I have argued in several published journal articles and book chapters that it is useful to categorise competitive balance metrics as either measures of win dispersion or performance persistence. Win dispersion measures the dispersion in league performance across teams in a particular season. Performance persistence measures the degree to which the league performance of individual teams is replicated across seasons – do teams tend to finish in the same league position in consecutive seasons? These are two quite different aspects of competitive balance and multiple metrics have been proposed for both. However, when it comes to discussions as to what leagues should do, if anything, to maintain or improve competitive balance, there is a general (often implicit) presumption that all competitive balance metrics tend to move in the same direction. Morten and I have sought to discover if this is indeed the case. And, as reported in my previous post on the subject, the evidence from European football is quite mixed and, at the very least, casts doubt on the general presumption that there is a strong positive relationship between win dispersion and persistence. Indeed, we found that in the period 2008 – 2017 win dispersion and performance persistence tended to move in opposite directions in the English Premier League.

            In this post, I am going to discuss the evidence from a study on win dispersion and performance persistence in the four North American Major Leagues (NAMLs) that Morten and I published recently in Sport, Business, Management: An International Journal (vol 13 no. 5, 2023). Our dataset covered the four NAMLs – MLB (baseball), NFL (American football), NBA (basketball) and NHL (ice hockey) – seven different competitive balance metrics, and 60 seasons, 1960 – 2019 (thereby avoiding the impact of the Covid pandemic). In this post I am only focusing on the ASD* measure of win dispersion, the SRCC measure of performance persistence, and the correlation between these measures to test whether or not win dispersion and performance persistence move together in the same direction. I have reported these three measures as 10-year averages in order to identify possible trends over time. It is agreed that the ASD* metric provides better comparability of win dispersion between leagues with very different lengths of game schedules in the regular season. At one extreme the MLB has a 162-game schedule whereas for most of the period the NFL had a 16-game regular season schedule (recently increased to 17 games). The ASD* uses the actual standard deviation of team win percentages relative to the theoretical standard deviation of a perfectly dominated league with the same number of teams and games in which every team loses against the teams ranked above it so the top team wins every game, the second best team only loses against the top team, the 3rd-placed team only loses against the top two and so on. (Formally, this is called a “cascade” distribution.) The SRCC measure of performance persistence is just the Spearman rank correlation coefficient of league standings in two consecutive seasons.
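The two metrics described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used in the published study: the cascade benchmark is built for a balanced schedule in which every team plays every other team equally often, and the Spearman formula assumes there are no tied ranks. The function names are hypothetical.

```python
from statistics import pstdev


def cascade_win_pcts(n_teams):
    """Win percentages in a perfectly dominated league (balanced schedule)."""
    # The team ranked `rank` (0 = top) beats every team ranked below it
    return [(n_teams - 1 - rank) / (n_teams - 1) for rank in range(n_teams)]


def asd_star(win_pcts):
    """Actual SD of win percentages relative to the cascade SD."""
    return pstdev(win_pcts) / pstdev(cascade_win_pcts(len(win_pcts)))


def srcc(ranks_prev, ranks_next):
    """Spearman rank correlation of standings in two consecutive seasons."""
    n = len(ranks_prev)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_prev, ranks_next))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical four-team league
dispersion = asd_star([0.70, 0.55, 0.45, 0.30])
persistence = srcc([1, 2, 3, 4], [2, 1, 3, 4])
```

Under these assumptions ASD* runs from 0 (every team wins half its games) to 1 (the cascade), while SRCC runs from -1 (standings fully reversed) to 1 (standings fully repeated).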

            One important contextual change in most leagues since the 1960s has been the move away from a very restricted player labour market in which a player’s current team had priority in retaining a player. Instead, player labour markets have become a very competitive auction-type market in which players have the right to move to another team at the end of their current contract (what is known as “free agency”). The NAMLs led the way in pro team sports in introducing some form of free agency in the 1970s/80s. European leagues lagged behind until the Bosman ruling in 1995 which effectively created free agency by abolishing transfer fees for out-of-contract players. So in some ways it should be expected that the general trend in the NAMLs has been towards greater competitive imbalance as the big-market teams have taken advantage of free agency to acquire the best players. However, there has been another general tendency with leagues becoming much more interventionist by introducing regulatory mechanisms, especially salary caps, which, in part, has been motivated by an attempt to offset the potential negative effect on competitive balance of free agency. Which effect has been stronger? Let’s look at the numbers.

            Table 1 below reports the 10-year averages for win dispersion for the four NAMLs. Broadly speaking, the pattern in win dispersion in the NAMLs over the last 60 years has been for win dispersion to decrease from the 1960s through to the 1990s (i.e. improved competitive balance) but for win dispersion to increase since the 1990s (i.e. reduced competitive balance). Both the MLB and NFL follow this pattern, suggesting that the league intervention effect may have initially dominated the free agency effect but in recent years the resource-richer teams may have adapted to the more regulated environment and found other ways to exert their financial advantage (while remaining compliant with league regulations) such as higher expenditures on technology and data analytics. I used to argue that the Oakland A’s and the Moneyball phenomenon is an example of data analytics being used as a “David” strategy for resource-poorer teams to compete more effectively. And it is true that in the early days of sports analytics it was often the resource-poorer teams that led the way in operationalising data analytics as a source of competitive advantage. But these days most teams recognise the potential gains from analytics and some very resource-rich teams are investing heavily in data analytics.

            The trends in win dispersion are much less clear in both the NBA and NHL. There has been some underlying trend from the 1960s onwards for competitive balance to worsen in the NBA as win dispersion has increased. In contrast, the NHL has tended to experience an improvement in competitive balance with lower win dispersion since the turn of the century.

            When win dispersion across the four NAMLs is compared, there is a rather surprising result that the NFL has the highest degree of win dispersion over the whole period (i.e. low competitive balance) whereas the MLB has the lowest win dispersion (i.e. high competitive balance) with the NBA and NHL in the mid-range. I say surprising since conventional wisdom is that the NFL has been one of the most proactive leagues in trying to maintain a high level of competitive balance whereas traditionally the MLB has been much less interventionist. The problem in making comparisons across leagues, especially in different sports, is the “apples-and-oranges” problem – trying to compare like with like. As highlighted earlier, there are massive differences between the NAMLs in the length of regular-season game schedules. I am more inclined to the view that the difference in win dispersion between the NAMLs is more a reflection of the difficulties in constructing a metric that properly controls for the length of game schedules, that is, it is more a measurement problem than a “true” reflection of differences in competitive balance.

            The argument that win dispersion metrics can pick up trends within leagues but are less reliable for comparisons across leagues is reinforced by the results for performance persistence reported below in Table 2. Performance persistence measures the degree to which the final standings of teams are replicated in consecutive seasons. The length of game schedule has a much more indirect effect on performance persistence so that comparisons across leagues should be more reliable. And, indeed, we find that from the 1980s onwards the NFL has had the lowest degree of performance persistence which fits with the conventional view that the NFL has been the most proactive league in maintaining a high degree of competitive balance. Winning NFL teams face a number of “penalties” in the next season – tougher game schedules, lower-ranked draft picks and the constraints imposed by the salary cap in retaining free agents who have increased in value by virtue of their on-the-field success. It is more and more difficult for NFL teams to become “dynasty” teams which makes the Belichick-Brady era at the New England Patriots and, most recently, the success of the Kansas City Chiefs so remarkable.

            As well as the NFL, the other NAML that has managed to reduce the degree of performance persistence is the NHL which had the highest degree of performance persistence in the 1960s and 1970s but now ranks second best behind the NFL. The MLB experienced reduced performance persistence in the 1980s and 1990s (and had, on average, lower performance persistence than the NFL in the 1990s) but that downward trend has been reversed in the last two decades. The one major league that has had no discernible trend in performance persistence over the last 60 years and has the highest degree of performance persistence is the NBA, despite instituting a salary cap, albeit a rather “soft” cap with a number of exemptions. The high performance persistence of basketball teams is inherent in the very structure of the game. With only five players on court for a team at any point in time, basketball is much more susceptible to the “Michael Jordan” (i.e. “super-superstar”) effect and the soft salary cap makes it easier to retain these super-superstars.

            The final set of results reported in Table 3 shows how the relationship between win dispersion and performance persistence has varied over time and between leagues. One of the main motivations for this research is to determine whether or not the general presumption of a strong positive dispersion-persistence relationship is empirically valid. The evidence is mixed. There are only eight instances of a strong positive dispersion-persistence relationship (r > 0.5) out of a possible 24, which is hardly overwhelming evidence in favour of the general presumption. If medium-sized effects are included (0.3 < r < 0.5) then only half of the reported results provide support for the general presumption of a positive relationship, with three strong/medium negative results and nine showing only small/negligible effects. There is one instance of a strong negative dispersion-persistence relationship in the NHL in 2010-19, indicating that reductions in performance persistence were associated with increases in win dispersion.
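The effect-size bands used above can be written as a small classifier. The positive thresholds (0.5 and 0.3) are those quoted in the text. Mirroring them for negative correlations, and treating a value exactly on a boundary as belonging to the weaker band, are assumptions made for this sketch.

```python
def classify_correlation(r):
    """Band a dispersion-persistence correlation by sign and size."""
    if r > 0.5:
        return "strong positive"
    if r > 0.3:
        return "medium positive"
    if r < -0.5:
        return "strong negative"
    if r < -0.3:
        return "medium negative"
    return "small/negligible"


# Tally quoted in the text: 4 leagues x 6 decades = 24 correlations
n_results = 4 * 6
n_positive = n_results // 2     # half are strong or medium positive
n_negative = 3                  # strong or medium negative
n_small = 9                     # small or negligible
```

The counts reported in the paragraph add up to the 24 league-decade combinations, with eight of the twelve positive results in the strong band.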

Competitive balance in the NAMLs has been much researched over the last 30 years. The results of our study are broadly in line with previous results but highlight that any conclusions are likely to be time-dependent and metric-dependent. The most definitive results are those on performance persistence which show a general tendency in both the NFL and NHL for improved competitive balance despite the advent of free agency. There is also clear evidence of continuing high levels of performance persistence in the NBA, likely to be due to the super-superstar effect inherent in the game structure of basketball. As for the general presumption that win dispersion and performance persistence tend to move together in the same direction, there is no overwhelming support that they do so in most cases. The practical implication is that leagues need to be clearer on which aspect of competitive balance is most important in driving uncertainty of outcome and spectator/viewer interest. Leagues must also recognise that the structures of their sports may limit the extent to which competitive balance can be regulated. Basketball is always likely to be more susceptible to super-superstar effects that can lead to high levels of performance persistence. And leagues with short game schedules may always tend to have higher levels of win dispersion since there is more limited opportunity for winning or losing streaks to even themselves out – what statisticians call the “regression-to-the-mean” effect.

Other Related Posts

Competitive Balance Part 1: What Are The Issues?

Competitive Balance Part 2: European Football

Note: The results reported in this post are published in B. Gerrard and M. Kringstad, ‘Dispersion and persistence in the competitive balance of North American Major Leagues 1960 – 2019‘, Sport, Business, Management: An International Journal, vol. 13 no. 5 (2023), pp. 640-662.

Football, Finance and Fans in the European Big Five

Executive Summary

  • Divergent revenue growth paths in the Big Five European football leagues since 1996 have more than doubled the inequality in the financial strength of these leagues.
  • The financial dominance of the EPL is based on growing gate attendances, increasing value of media rights and high marketing efficiency.
  • The financial dominance of the EPL puts it at a massive advantage in attracting the best sporting talent.
  • The pandemic highlighted the precarious financial position of the French and Italian leagues due to high wage-revenue ratios and consequent operating losses.
  • The financial regulation of the Bundesliga clubs put them in a much stronger position to cope with loss of revenues during the pandemic.

The top tiers of the domestic football leagues in England, France, Germany, Italy and Spain constitute the so-called “Big Five” of European football in financial terms as measured by the total revenues of their member clubs. Figure 1 shows the growth in revenues in the Big Five since 1996. The most striking feature of this timeplot is the divergent growth paths of the Big Five. From a starting point of relative parity in 1996 the divergent growth paths of the Big Five call into question whether it is even appropriate to still talk in terms of the Big Five. Using the coefficient of variation (CoV) as a measure of relative dispersion (effectively CoV is just a standardised standard deviation with the scale effect removed), the degree of dispersion between the revenues of the Big Five has more than doubled from 0.244 in 1996 to 0.509 in 2022. The English Premier League (EPL) is quite literally in a league of its own in financial terms with total revenues of €6.4bn in 2022. The rest of the Big Five lag a long way behind with the Spanish La Liga and German Bundesliga grossing revenues of €3.3bn and €3.1bn, respectively in 2022 and the Italian Serie A and French Ligue 1 lagging another €1bn or so behind with revenues of €2.4bn and €2.0bn, respectively. And with the expected uplift in the EPL’s next media rights deal and the continued growth in gate attendances, the gap between the EPL and the rest of the Big Five looks set to increase further.
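The coefficient of variation is straightforward to reproduce approximately from the rounded 2022 revenue figures quoted above. Using the sample standard deviation is an assumption, since the text does not say which form was used, and the rounded inputs mean the result should only be expected to land near the reported 0.509.

```python
from statistics import mean, stdev

# 2022 league revenues in EUR bn, as quoted (rounded) in the text
revenues_2022 = {
    "EPL": 6.4,
    "La Liga": 3.3,
    "Bundesliga": 3.1,
    "Serie A": 2.4,
    "Ligue 1": 2.0,
}


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample SD assumed)."""
    values = list(values)
    return stdev(values) / mean(values)


cov_2022 = coefficient_of_variation(revenues_2022.values())
print(round(cov_2022, 3))
```

With these rounded inputs the result comes out at roughly 0.50, close to the reported figure. Dividing by the mean is what removes the scale effect, so the 1996 and 2022 values can be compared even though revenues have grown several-fold.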

Figure 1: Revenues (€m), European Big Five, 1996 – 2022

Another key feature of Figure 1 is the impact of the Covid pandemic on league revenues. The biggest losers in 2020 were the EPL clubs with the postponement of the last part of the 2019/20 season leading to an overall loss of revenue of around €0.7bn. But although the whole of the 2020/21 season was played behind closed doors, wiping out matchday revenues, media revenues increased with all games shown live. By 2022, with the return of spectators to football grounds and continued growth in media revenues, the EPL was back on its pre-pandemic trend with revenues over 10% higher than in 2019 prior to the pandemic. In contrast, of the other Big Five, only the French Ligue 1 had increased revenues in 2022 above the pre-pandemic level.

In assessing the revenue performance of football leagues/clubs, apart from revenue growth rates, there are two very useful revenue KPIs (Key Performance Indicators):

Media% = media revenues as a % of total revenues; and

Local Spend = non-media revenues per capita (using average league gate attendances as the size measure to standardise club/league revenues)
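The two KPIs defined above can be sketched in a few lines. The function names and the example league are hypothetical, and the fanbase size is left as an input because the exact way average gates are aggregated into a league-level size measure is not spelled out here.

```python
def media_pct(media_revenue, total_revenue):
    """Media% = media revenues as a percentage of total revenues."""
    return 100.0 * media_revenue / total_revenue


def local_spend(media_revenue, total_revenue, fanbase_size):
    """Local Spend = non-media revenues per capita.

    fanbase_size is the size measure based on average league gate
    attendances; how it is aggregated is an assumption left to the caller.
    """
    return (total_revenue - media_revenue) / fanbase_size


# Hypothetical league (NOT real data): EUR 1,000m total revenue,
# EUR 550m media revenue and a fanbase size of 400,000
total, media, fanbase = 1_000e6, 550e6, 400_000

example_media_pct = media_pct(media, total)
example_local_spend = local_spend(media, total, fanbase)
print(f"Media% = {example_media_pct:.2f}%")
print(f"Local Spend = EUR {example_local_spend:,.0f}")
```

For this made-up league, Media% is 55% and Local Spend is €1,125 per head, the same units as those reported in Table 1.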

Media% shows the dependency of the league and its clubs on the value of their media rights. Local Spend is a measure of the marketing efficiency of clubs in generating matchday and commercial revenues relative to the size of their active fanbase as measured by average league gate attendance. As can be seen in Table 1, which reports these two revenue KPIs for 2019, 2021 and 2022, all the Big Five became much more dependent on media revenues during the Covid years as seen in the increased Media% in 2021. As would be expected, Local Spend fell sharply in the Covid years with the loss of matchday revenues. What is more concerning in the longer term for the rest of the Big Five is that the financial strength of the EPL is based not only on the much higher value of its media rights but also on the stronger capability of EPL clubs to generate matchday and commercial revenues. Prior to the pandemic only the Spanish La Liga got close to the EPL in terms of Local Spend but by 2022 the EPL had a substantial lead over all of the other Big Five in Local Spend. Given the underlying upward trends in gate attendances and the value of media rights in the EPL noted earlier, together with the marketing efficiency advantage as measured by Local Spend, the financial dominance of the EPL seems likely to grow unabated in the coming years.

Table 1: Revenue KPIs, European Big Five, Selected Years

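League | Media% 2019 | Media% 2021 | Media% 2022 | Local Spend 2019 (€) | Local Spend 2021 (€) | Local Spend 2022 (€)
England | 59.12% | 68.66% | 54.14% | 3,131 | 2,189 | 3,732
France | 47.37% | 51.80% | 35.98% | 2,192 | 1,727 | 2,879
Germany | 44.33% | 55.21% | 43.82% | 2,143 | 1,646 | 2,164
Italy | 58.52% | 69.92% | 56.94% | 2,049 | 1,383 | 1,842
Spain | 54.25% | 67.74% | 58.53% | 2,871 | 1,647 | 2,354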

The financial strength of the EPL allows its clubs to offer lucrative salaries and pay high transfer fees to attract the best players in the global football players’ labour market. As can be seen in Figure 2, the divergent revenue growth paths of the Big Five in Figure 1 are replicated in similar divergent wage growth paths. Effectively, the €3bn revenue advantage of the EPL in 2022 allowed EPL clubs to spend €2bn more on wage costs than the German Bundesliga, the next biggest spenders in the Big Five. And it is not just the best players that can be attracted to the EPL; it is also the best coaching and support staff. The danger of financial dominance in pro team sports is that it can lead to sporting dominance and this, in turn, can undermine the sustainability of the league as teams with less financial power seek to remain competitive by overspending on wages, leading to operating losses and increasing levels of debt.

Figure 2: Wage Costs (m), European Big Five, 1996 – 2022

 

The danger of overspending on wage costs relative to revenues can be seen very clearly in the wage-revenue ratio, possibly the most important financial performance ratio in pro team sports. By far the most dominant cost in any people business such as sport and entertainment is wages. If wage costs are too high relative to revenues, teams will make operating losses and will need to be either deficit-financed by their owners or debt-financed, with all of the attendant risks. As can be seen in Figure 3, the wage-revenue ratios have tended to be highest in the French and Italian leagues, the smallest financially of the Big Five leagues. Indeed, in the early 2000s the Italian Serie A got close to spending all of its revenue on wages, with the French Ligue 1 nearly emulating this during the Covid years.

Figure 3: Wage-Revenue Ratios, European Big Five, 1996 – 2022

Table 2 shows the danger of the financially smaller leagues having higher wage-revenue ratios. They can be put in a very precarious position if there is a sudden loss of revenues as happened during the pandemic (but could also happen if there is a loss in the value of a league’s media rights). Wage costs are largely fixed at any point in time through contractual commitments so any reduction in revenues is likely to lead to higher wage-revenue ratios and operating losses. As a benchmark, financial prudence would normally dictate a wage-revenue ratio under 65% in order to make operating profits. The French and Italian leagues operated with wage-revenue ratios above 70% prior to the pandemic and both remained above 80% in 2022. The Spanish La Liga was on a par with the EPL in 2019 at just over 60%. Both leagues saw their wage-revenue ratio rise above 70% in 2021 but, whereas the EPL fell back below 67% in 2022, La Liga remained above 70%.

Table 2: Wage-Revenue Ratio, European Big Five, Selected Years

League | Wage-Revenue Ratio 2019 | Wage-Revenue Ratio 2021 | Wage-Revenue Ratio 2022
England | 61.17% | 71.05% | 66.84%
France | 73.03% | 98.27% | 86.87%
Germany | 53.75% | 64.96% | 59.13%
Italy | 70.42% | 82.98% | 82.98%
Spain | 62.04% | 74.19% | 72.66%
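The exposure created by fixed wage costs can be illustrated with a simple sensitivity check. The sketch below takes the 2019 ratios from Table 2 and applies a purely hypothetical 10% fall in revenue with wages held constant; the size of the shock and the function name are my assumptions, not estimates of what actually happened.

```python
# 2019 wage-revenue ratios from Table 2 (as proportions)
ratios_2019 = {
    "England": 0.6117,
    "France": 0.7303,
    "Germany": 0.5375,
    "Italy": 0.7042,
    "Spain": 0.6204,
}

PRUDENCE_BENCHMARK = 0.65  # the 65% benchmark mentioned in the text


def ratio_after_revenue_shock(ratio, revenue_change):
    """Wage-revenue ratio if wages stay fixed and revenue changes.

    revenue_change is a proportion, e.g. -0.10 for a 10% fall.
    """
    return ratio / (1.0 + revenue_change)


# Hypothetical shock: revenue falls by 10%, wage costs unchanged
shocked = {
    league: ratio_after_revenue_shock(ratio, -0.10)
    for league, ratio in ratios_2019.items()
}

for league, ratio in shocked.items():
    flag = "above" if ratio > PRUDENCE_BENCHMARK else "below"
    print(f"{league}: {ratio:.1%} ({flag} the 65% benchmark)")
```

Under this illustrative shock a league starting just under the benchmark is pushed over it, while a league starting in the low 50s still has some headroom.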

In footballing terms, the bastion of financial prudence has been the German Bundesliga with its longstanding financial management regime requiring clubs to submit budgets for approval as a condition of their league membership. As seen in both Figure 3 and Table 2, the Bundesliga has historically operated with wage-revenue ratios between 45% and 55%. Even with the loss of revenue during the Covid years, the wage-revenue ratio only hit 65% and fell back below 60% in 2022. The effectiveness of the German approach can be seen in Table 3 which reports the marginal wage-revenue ratio (MWRR) over the last 27 years. This ratio shows the proportion, on average, of every additional €1m of revenue that has been spent on wages as each league has grown financially. The EPL has had a MWRR of 65.0% with the Spanish La Liga operating in a very similar way with a MWRR of 67.7%. The Bundesliga has had a MWRR of 56.6%. Given that the Spanish and German leagues are of a similar size in revenue terms, it suggests that long term the German financial management regime has lowered their wage-revenue ratio by around 11 percentage points compared to what it would have been with a lighter touch. The very high MWRRs of the French and Italian leagues coupled with their lower revenue growth rates further reinforce the concerns over their financial future.

Table 3: Marginal Wage-Revenue Ratio, European Big Five, 1996 – 2022

League | Marginal Wage-Revenue Ratio 1996 – 2022
England | 65.03%
France | 83.21%
Germany | 56.60%
Italy | 79.31%
Spain | 67.73%
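One natural way to estimate a marginal wage-revenue ratio is as the slope of a line of best fit of wage costs on revenues across the seasons. The sketch below takes that approach as an assumption, since the estimation method is not spelled out above, and uses made-up figures rather than the Deloitte data.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of best fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical league figures in EUR m for six seasons (NOT real data)
revenue = [500, 800, 1200, 1700, 2300, 3000]
wages = [260, 440, 700, 1000, 1390, 1810]

# Assumed definition: MWRR = extra wage spend per extra EUR 1 of revenue
mwrr = ols_slope(revenue, wages)
print(f"Marginal wage-revenue ratio: {mwrr:.1%}")
```

The slope measures how much of each additional unit of revenue goes on wages, which is a different question from the average wage-revenue ratio in any single year.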

Notes:

  1. The raw financial data for the analysis has been sourced from various editions of Deloitte’s Annual Review of Football Finance (Annual Review of Football Finance 2023 | Deloitte Global)
  2. Throughout, the years refer to the financial year-end. Hence, for example, the figures reported for 1996 refer to season 1995/96.
  3. The base year of 1996 has been used since 1995/96 was the first season when the EPL adopted its current 20-club, 380-game format.
  4. Average league gates for season 2019/20 have been used to calculate Local Spend during the Covid years when games were played behind closed doors with no spectators in the stadia.

Financial Determinism and the Shooting-Star Phenomenon in the English Premier League

Executive Summary

  • Financial determinism in professional team sports refers to those leagues in which sporting performance is largely determined by expenditure on playing talent
  • Financial determinism creates the “shooting-star” phenomenon – a small group of “stars”, big-market teams with high wage costs and high sporting performance, and a large “tail” of smaller-market teams with lower wage costs and lower sporting performance
  • There is a very high degree of financial determinism in the English Premier League
  • Achieving high sporting efficiency is critical for small-market teams with limited wage budgets seeking to avoid relegation

Financial determinism in professional team sports refers to those leagues in which sporting performance is largely determined by expenditure on playing talent. It is the sporting “law of gravity”. Financial determinism implies a strong win-wage relationship with league outcomes highly correlated with wage costs, so that the teams with the biggest markets and the greatest economic power (i.e. the biggest “wallets”), and hence the ability to afford the best players, tend to win. Financial determinism creates what can be called the “shooting-star” phenomenon shown in Figure 1. The “stars” are the sporting elite in any league, the big-market teams with high wage costs and high sporting performance. The rest of the league constitutes the “tail”, the smaller-market teams with lower wage costs and lower sporting performance. Some small-market teams can temporarily defy the law of gravity by achieving high sporting efficiency. The classic example of this is the Moneyball story in Major League Baseball where the Oakland Athletics used data analytics to identify undervalued playing talent. And, of course, there are the bigger-market teams who spend big but do so inefficiently and perform well below expectation.

Figure 1: The Shooting-Star Phenomenon

A fundamental proposition in sports economics is that uncertainty of outcome is a necessary condition for viable professional sports leagues. This is the notion that the essential characteristic of sport is the excitement of unscripted drama where the outcome is determined by the contest and is not scripted in advance. Uncertainty of outcome requires that teams in any league are relatively equally matched in their economic power with similar revenues and similar access to financial capital. Unequal distribution of economic power across teams leads to financial determinism. The most common causes of disparities in economic power between teams are location (i.e. teams based in large metropolitan areas often have much bigger fanbases and, consequently, can generate much higher revenues) and ownership wealth (i.e. teams with rich owners who are driven by sporting glory rather than profit and will spend whatever it takes to win). To prevent financial determinism, leagues have used a number of regulatory mechanisms to maintain competitive balance including revenue sharing, salary caps and player drafts.

Is the English Premier League subject to financial determinism and the shooting-star phenomenon? To answer this question I have tracked wage costs reported in club accounts from 1995/96 onwards when the English Premier League adopted its current structure of 20 teams and 380 games with three teams relegated. Clubs are still in the process of reporting their 2023 accounts so the analysis concludes with season 2021/22. Since the analysis covers 27 seasons, wage costs need to be standardised to allow for wage inflation. I have used average wage costs each season to deflate wage costs to 1995/96 levels. Very roughly, £10m wage costs in 1995/96 equates to £200m wage costs in 2021/22. Sporting performance has been measured by league points based on match outcomes; any point deductions for breach of league regulations have been excluded. (Middlesbrough were deducted 3 points in 1996/97 for failing to fulfil a scheduled fixture and Portsmouth were deducted 9 points in 2009/10 for going into administration.) Figure 2 shows the scatterplot of league points and standardised wage costs. The two groupings, the big-spending stars and the lower-spending tail, are very obvious. The tail is very dense and contains most of the observations (73.9% of the clubs had standardised wage costs under £10m). The stars are fewer in number and more dispersed with 10 instances of clubs having standardised wage costs in excess of £20m (which equates to over £400m in 2021/22). The correlation between standardised wage costs and league points is 0.793 which implies that over the 27 seasons, 62.8% of the variation in league performance can be explained by the variation in wage costs. In other words, there is a very high degree of financial determinism in the English Premier League.
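The two calculations in this paragraph, deflating wage costs to base-season levels and converting a correlation into a share of variation explained, can be sketched as follows. The league-average wage figures and the small club dataset are hypothetical; only the factor of roughly 20 between 1995/96 and 2021/22 and the correlation of 0.793 come from the text.

```python
from math import sqrt


def standardise_wages(wage, season_average, base_average):
    """Deflate a wage bill to base-season levels using league averages."""
    return wage * base_average / season_average


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical league averages (GBP m) chosen to give a deflator of 20
base_avg_1996, season_avg_2022 = 7.5, 150.0
standardised = standardise_wages(200.0, season_avg_2022, base_avg_1996)
print(f"GBP 200m in 2021/22 is about GBP {standardised:.0f}m at 1995/96 levels")

# Hypothetical standardised wages (GBP m) and league points for four clubs
wages = [5.0, 8.0, 12.0, 20.0]
points = [38, 45, 60, 85]
r = pearson_r(wages, points)
print(f"r = {r:.3f}, share of variation explained = {r ** 2:.1%}")

# Share of variation implied by the correlation quoted in the text
quoted_r_squared = 0.793 ** 2
print(f"0.793 squared = {quoted_r_squared:.3f}")
```

Squaring the rounded correlation of 0.793 gives about 0.629; the 62.8% quoted above presumably reflects the unrounded correlation.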

Figure 2: The Shooting-Star Phenomenon in the English Premier League

Season 2021/22 is very typical as regards the degree of financial determinism in the English Premier League as shown in Figure 3. The correlation between wage costs and league points is 0.782 which implies that 61.2% of the variation in league performance can be explained by the variation in wage costs. The linear trendline acts as a performance benchmark – the average efficient outcome for any given level of wage costs – and thus identifies above-average efficient (“above the line”) outcomes and below-average efficient (“below the line”) outcomes. At the top end, Manchester City, the champions with 93 points, a single point ahead of Liverpool, were outspent by both Manchester United and Liverpool. Manchester United were highly inefficient, gaining only 58 points with wage costs of £408m. By comparison, West Ham United gained 56 points with wage costs of £136m.

Figure 3: Win-Wage Relationship in English Premier League, 2021/22

As regards relegation, all three relegated teams – Norwich City, Watford and Burnley – lie below the average-efficiency line. In the cases of both Burnley and Watford their final league positions matched their wage rank – their sporting efficiency was not good enough to offset their resource disadvantage. In contrast, Norwich City allocated enough resource to avoid relegation – their wage costs of £117m ranked 15th – but they were highly inefficient. Of the lower spending teams, the two most efficient teams were Brentford and Brighton and Hove Albion who both finished safely in mid-table but ranked 20th and 16th, respectively, in wage costs. In a future post, I will analyse the determinants of sporting efficiency in more detail.

Measuring Trend Growth

Executive Summary

  • The most useful summary statistic for a trended variable is the average growth rate
  • But there are several different methods for calculating average growth rates that can often generate very different results depending on whether all the data is used or just the start and end points, and whether simple or compound growth is assumed
  • Be careful of calculating average growth rates using only the start and end points of trended variables since this implicitly assumes that these two points are representative of the dynamic path of the trended variable and may give a very biased estimate of the underlying growth rate
  • Best practice is to use all of the available data to estimate a loglinear trendline which allows for compound growth and avoids having to calculate an appropriate midpoint of a linear trendline to convert the estimated slope into a growth rate

When providing summary statistics for trended time-series data, the mean makes no sense as a measure of the point of central tendency. By definition, there is no point of central tendency in trended data. Trended data are either increasing or decreasing in which case the most useful summary statistic is the average rate of growth/decline. But how do you calculate the average growth rate? In this post I want to discuss the pros and cons of the different ways of calculating the average growth rate, using total league attendances in English football (the subject of my previous post) as an illustration.

There are at least five different methods of calculating the average growth rate:

  1. “Averaged” growth rate: use $g_t = (y_t - y_{t-1})/y_{t-1}$ to calculate the growth rate for each period then average these growth rates
  2. Simple growth rate: use the start and end values of the trended variable to calculate the simple growth rate with the trended variable modelled as $y_{t+n} = y_t(1 + ng)$
  3. Compound growth rate: use the start and end values of the trended variable to calculate the compound growth rate with the trended variable modelled as $y_{t+n} = y_t(1 + g)^n$
  4. Linear trendline: estimate the line of best fit for $y_t = a + g\,t$ (i.e. simple growth)
  5. Loglinear trendline: estimate the line of best fit for $\ln y_t = a + g\,t$ (i.e. compound growth)

where $y$ = the trended variable; $g$ = growth rate; $t$ = time period; $n$ = number of time periods; $a$ = intercept in line of best fit
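The five methods can be sketched in code as follows. This is an illustration under stated assumptions rather than the exact calculations behind this post: the function names are mine, and for Method 4 I assume the slope is converted into a growth rate by dividing by the fitted value at the midpoint of the sample, which for a least-squares line equals the mean of the series. The endpoint check at the bottom uses the 1985/86 and 2022/23 attendance figures of 16.5 million and 34.8 million quoted in the discussion of Figure 2.

```python
from math import log


def averaged_growth(y):
    """Method 1: average of the period-on-period growth rates."""
    rates = [(y[i] - y[i - 1]) / y[i - 1] for i in range(1, len(y))]
    return sum(rates) / len(rates)


def simple_growth(y_start, y_end, n):
    """Method 2: g such that y_end = y_start * (1 + n * g)."""
    return (y_end / y_start - 1) / n


def compound_growth(y_start, y_end, n):
    """Method 3: g such that y_end = y_start * (1 + g) ** n."""
    return (y_end / y_start) ** (1 / n) - 1


def _ols(t, y):
    """Intercept and slope of the least-squares line of y on t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    stt = sum((ti - t_bar) ** 2 for ti in t)
    slope = sty / stt
    return y_bar - slope * t_bar, slope


def linear_trend_growth(t, y):
    """Method 4: slope of y on t, scaled by the mean of y.

    Assumption: the 'midpoint' used for the conversion is the fitted
    value at the mean of t, which equals the mean of y.
    """
    _, slope = _ols(t, y)
    return slope / (sum(y) / len(y))


def loglinear_trend_growth(t, y):
    """Method 5: slope of ln(y) on t (a continuous compound rate)."""
    _, slope = _ols(t, [log(v) for v in y])
    return slope


# Endpoint methods applied to the Current Revival (37 seasons apart)
revival_simple = simple_growth(16.5, 34.8, 37)
revival_compound = compound_growth(16.5, 34.8, 37)
print(f"Method 2: {revival_simple:.2%}")
print(f"Method 3: {revival_compound:.2%}")
```

Methods 1, 4 and 5 need the full series of seasonal attendances, which is not reproduced here, so only the two endpoint methods are evaluated. The Method 5 slope is a continuous rate; the equivalent per-period rate is exp(g) - 1, which is almost identical for growth rates of this size.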

These methods differ in two ways. First, they differ as to whether the trend is modelled as simple growth (Methods 2, 4) or compound growth (Methods 3, 5). Method 1 is effectively neutral in this respect. Second, the methods differ in terms of whether they use only the start and end points of the trended variable (Methods 2, 3) or use all of the available data (Methods 1, 4, 5). The problem with only using the start and end points is that there is an implicit assumption that these are representative of the underlying trend with relatively little “noise”. But this is not always the case and there is a real possibility of these methods biasing the average growth rate upwards or downwards as illustrated by the following analysis of the trends in football league attendances in England since the end of the Second World War.

Figure 1: Total League Attendances (Regular Season), England, 1946/47-2022/23

This U-shaped timeplot of total league attendances in England since the end of the Second World War splits into two distinct sub-periods of decline/growth:

  • Postwar decline: 1948/49 – 1985/86
  • Current revival: 1985/86 – 2022/23

Applying the five methods to calculate the average annual growth rate of these two sub-periods yields the following results:

Method | Postwar Decline 1948/49 – 1985/86 | Current Revival 1985/86 – 2022/23*
Method 1: “averaged” growth rate | -2.36% | 2.28%
Method 2: simple growth rate | -1.62% | 3.00%
Method 3: compound growth rate | -2.45% | 2.04%
Method 4: linear trendline | -1.89% | 1.75%
Method 5: loglinear trendline | -1.95% | 1.85%

*The Covid-affected seasons 2019/20 and 2020/21 have been excluded from the calculations of the average growth rate.

What the results show very clearly is the wide variability in the estimates of average annual growth rates depending on the method of calculation. The average annual rate of decline in league attendances between 1949 and 1986 varies between -1.62% (Method 2 – simple growth rate) and -2.45% (Method 3 – compound growth rate). Similarly, the average annual rate of growth from 1986 onwards ranges from 1.75% (Method 4 – linear trendline) to 3.00% (Method 2 – simple growth rate). To investigate exactly why the two alternative methods for calculating the simple growth rate during the Current Revival give such different results, the linear trendline for 1985/86 – 2022/23 is shown graphically in Figure 2.

Figure 2: Linear Trendline, Total League Attendances, England, 1985/86 – 2022/23

As can be seen, the linear trendline has a high goodness of fit (R² = 93.1%) and the fitted endpoint is very close to the actual gate attendance of 34.8 million in 2022/23. However, there is a relatively large divergence at the start of the period with the fitted trendline having a value of 18.2 million whereas the actual gate attendance in 1985/86 was 16.5 million. It is this divergence that accounts in part for the very different estimates of average annual growth rate generated by the two methods despite both assuming a simple growth rate model. (The rest of the divergence is due to the use of the midpoint to convert the slope of the trendline into a growth rate.)

So which method should be used? My advice is to be very wary of calculating average growth rates using only the start and end points of trended variables. Doing so implicitly assumes that these two points are representative of the dynamic path of the trended variable and may give a very biased estimate of the underlying growth rate. My preference is always to use all of the available data to estimate a loglinear trendline which allows for compound growth and avoids having to calculate an appropriate midpoint of a linear trendline to convert the estimated slope into a growth rate.

League Gate Attendances in English Football: A Historical Perspective

Executive Summary

  • The historical trends in league gate attendances in English football can be powerfully summarised visually using timeplots
  • Total league attendances peaked in 1948/49 and thereafter declined until the mid-1980s
  • League attendances across the Premier League and Football League have recovered dramatically since the mid-1980s and are now at levels last experienced in the 1950s
  • Using average gates to allow for changes in the number of clubs and matches, Premiership matches in 2022/23 averaged 40,229 spectators per match, the highest average gate in the top division since the formation of the Football League in 1888

How popular are the top four tiers of English league football as a spectator sport from a historical perspective? That’s the question that I want to address in this post using timeplots to visualise the historical trends in gate attendances. I have compiled a dataset with total league attendances for every season since the Football League began in 1888. To ensure as much comparability as possible, I have included only regular-season matches and excluded post-season play-off matches. (A historical footnote – post-season playoffs to decide promotion/relegation are not a modern innovation. There were playoffs called “test matches” in the early years of the Football League after the creation of the Second Division in 1892 but these were abandoned in 1898 and replaced by automatic promotion and relegation following a scandal when Stoke City and Burnley played out a convenient goalless draw that ensured both would play in the First Division the following season.)

Total league attendances for the top four divisions are plotted in Figure 1 with three breaks: 1915/16 – 1918/19 due to the First World War, 1939/40 – 1945/46 due to the Second World War and 2020/21 due to the Covid pandemic when all matches were played behind closed doors. In addition, total attendances dropped sharply in 2019/20 due to the final part of the season being postponed and the matches eventually played behind closed doors in the case of the Premier League and Championship, and cancelled entirely in League One and League Two.

Figure 1: Total League Attendances (Regular Season), England, 1888-2023

The Football League started in 1888 with a single division of 12 clubs. Preston North End were the original “Invincibles”, completing the League and FA Cup “Double” unbeaten in the inaugural season. A second division was formed in 1892 and membership of the Football League gradually expanded so that by the outbreak of the First World War in 1914 there were 40 member clubs split equally into two divisions with automatic promotion and relegation between the two divisions. Gate attendances peaked at 12.5 million in the 1913/14 season. The Football League expanded rapidly in the years immediately after the First World War with the incorporation of the Southern League as Division 3 in 1920 and the creation of a Division 3 (North) and Division 3 (South) the following year, which increased the membership to 88 clubs by 1923. Total gate attendances reached 27.9 million in season 1937/38.

Gate attendances sharply increased after the Second World War, reaching a record 41.3 million in season 1948/49 which equated to around one million fans attending Football League matches on Saturday afternoons. Although the Football League expanded its membership to its current level of 92 clubs in 1950 and reorganised the two regionalised divisions into Division 3 and Division 4 in 1958, a long-term decline in attendances had set in with attendances falling steadily from the 1950s until the mid-1980s with the exception of a brief reversal of fortune in the late 1960s attributed to a renewed love of the beautiful game after England’s 1966 World Cup victory. The decline bottomed out in 1985/86 when Football League attendances fell to only 16.5 million which represented a 60.0% decrease from the peak in 1948/49. Thereafter the story has been one of continued growth, accelerated in part by the declaration of independence of the top division in 1992 with the formation of the FA Premier League. By last season (2022/23), league attendances in the top four tiers of English football had reached 34.8 million, a level last attained in season 1954/55 – quite an incredible turnaround.

The U-shaped pattern in total league attendances since the end of the Second World War is also evident but less clearly so if we focus only on the top division (see Figure 2). In particular, the post-1966 World Cup effect is much more noticeable with attendances rising from 12.5 million in 1965/66 to 15.3 million in 1967/68 and remaining above 14 million until 1973/74, and thereafter declining to a low of 7.8 million in 1988/89. Interestingly, given that league attendances in the top division account for 40% – 50% of total attendances for the top four divisions, it is somewhat anomalous that the recovery in attendances in the top division seems to have lagged around three years behind the rest of the Football League. However, part of the explanation is the changes in the number of clubs in the top division during that period. There were 22 clubs in the top division from 1919/20 to 1986/87 but this was reduced to 21 clubs in 1987/88 and 20 clubs in 1988/89 before returning to 22 clubs in 1991/92 with the current divisional structure of a 20-club Premier League and three 24-club divisions in the Football League dating from 1995.

Figure 2: League Attendances, Top Division, England, 1946-2023

Given the variations in the number of matches with spectators in the top division across time due to the changes in the number of clubs as well as the effects of the pandemic on total attendances in the 2019/20 season, it is more useful to compare average league gates (see Figure 3). The average gate at top division matches peaked at 38,776 in 1948/49 and declined to a low of 18,856 in 1983/84 (which leads the nadir of total Football League attendances by two years). The rapid growth in Premier League attendances occurred between 1993 and 2003 with the average gate of 21,125 in 1992/93, the first season of the Premier League, increasing by 67.8% over the next 10 years to an average gate of 35,445 in 2002/03. Growth has continued thereafter so that the average gate in the Premier League reached 40,229 in 2022/23, an historical high since the formation of the Football League and 3.7% above the previous record average gate set in 1948/49.
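The arithmetic behind average gates and the percentage changes quoted above is straightforward to reproduce. In the sketch below the function names are mine; the gate figures are the ones quoted in this post, and the match count assumes the standard format in which every club plays every other club home and away.

```python
def matches_per_season(clubs):
    """Each club plays every other club home and away."""
    return clubs * (clubs - 1)


def average_gate(total_attendance, clubs):
    """Total attendance divided by the number of matches played."""
    return total_attendance / matches_per_season(clubs)


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


print(matches_per_season(20))  # 380 matches in a 20-club division
print(matches_per_season(22))  # 462 matches in a 22-club division

# Growth in the Premier League average gate, 1992/93 to 2002/03
growth_first_decade = pct_change(21_125, 35_445)
print(f"{growth_first_decade:.1f}%")  # 67.8%

# 2022/23 average gate relative to the 1948/49 record
vs_1949_record = pct_change(38_776, 40_229)
print(f"{vs_1949_record:.1f}%")  # 3.7%
```

The difference between 380 and 462 matches is the reason total attendances are a poor guide when the number of clubs in the division changes.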

So to answer the question I posed at the start of the post – the top tier of English league football has never been more popular as measured by gate attendances on a per match basis, and the rest of the Football League has a level of popularity not experienced since the 1950s. England has rediscovered its love of the beautiful game since the mid-1980s and not just Premiership football. And that is before considering the explosive growth in TV coverage of English league football both domestically and internationally. But that, as they say, is another ball game entirely.

Figure 3: Average Gate, Top Division, England, 1946-2023

The Reep Fallacy

Executive Summary

  • Charles Reep was the pioneer of soccer analytics, using statistical analysis to support the effectiveness of the long-ball game
  • Reep’s principal finding was that most goals are scored from passing sequences with fewer than five passes
  • Hughes and Franks have shown that Reep’s interpretation of the relationship between the length of passing sequences and goals scored is flawed – the “Reep fallacy” of analysing only successful outcomes
  • Reep’s legacy for soccer analytics is mixed; partly negative because of its association with a formulaic approach to tactics but also positive in developing a notational system, demonstrating the possibilities for statistical analysis in football and having a significant impact on practitioners

There have been long-standing “artisan-vs-artist” debates over how “the beautiful game” (i.e. football/soccer) should be played. In his history of tactics in football, Wilson (Inverting the Pyramid, 2008) characterised tactical debates as involving two interlinked tensions – aesthetics vs results and technique vs physique. Tactical debates in football have often focused on the relative merits of direct play and possession play. And the early developments in soccer analytics pioneered by Charles Reep were closely aligned with support for direct play (i.e. “the long-ball game”).

Charles Reep (1904 – 2002) trained as an accountant and joined the RAF, reaching the rank of Wing Commander. He said that his interest in football tactics began after attending a talk in 1933 by Arsenal’s captain, Charlie Jones. Reep developed his own notational system for football in the early 1950s. His first direct involvement with a football club was as part-time advisor to Brentford in spring 1951, helping them to avoid relegation from Division 2. (And, of course, these days Brentford are still pioneering the use of data analytics to thrive in the English Premier League on a relatively small budget.) Reep’s key finding was that most goals are scored from passing sequences of three passes or fewer. His work subsequently attracted the interest of Stan Cullis, manager in the 1950s of a very successful Wolves team. Reep published a paper (jointly authored with Benjamin) on the statistical analysis of passing and goals scored in 1968. He analysed nearly 2,500 games during his lifetime.

In their 1968 paper, Reep and Benjamin analysed 578 matches, mainly in Football League Division 1 and World Cup Finals between 1953 and 1967. They reported five key findings:

  • 91.5% of passing sequences have 3 completed passes or less
  • 50% of goals come from moves starting in the shooting area
  • 50% of shooting-area origin attacks come from regained possessions
  • 50% of goals conceded come from own-half breakdowns
  • On average, one goal is scored for every 10 shots at goal

Reep published another paper in 1971 on the relationship between shots, goals and passing sequences that excluded shots and goals that were not generated from a passing sequence. These results confirmed his earlier analysis, with passing sequences of 1 – 4 passes accounting for 87.6% of shots and 87.0% of goals scored. The tactical implications of Reep’s analysis seemed very clear – direct play with few passes is the most efficient way of scoring goals. Reep’s analysis was very influential. It was taken up by Charles Hughes, FA Director of Coaching and Education, who later conducted similar data analysis to that of Reep with similar results (but never acknowledged his intellectual debt to Reep). On the basis of his analysis, Hughes advocated sustained direct play to create an increased number of shooting opportunities.

Reep’s analysis was re-examined by two leading professors of performance analysis, Mike Hughes and Ian Franks, in a paper published in 2005. Hughes and Franks analysed 116 matches from the 1990 and 1994 World Cup Finals. They accepted Reep’s findings that around 80% of goals scored result from passing sequences of three passes or less. However, they disagreed with Reep’s interpretation of this empirical regularity as support for the efficacy of a direct style of play. They argued that it is important to take account of the frequency of different lengths of passing sequences as well as the frequency of goals scored from different lengths of passing sequences. Quite simply, since most passing sequences have fewer than five passes, it is no surprise that most goals are scored from passing sequences with fewer than five passes. I call this the “Reep fallacy” of only considering successful outcomes and ignoring unsuccessful outcomes. It is surprising how often in different walks of life people commit a similar fallacy by drawing conclusions from evidence of successful outcomes while ignoring the evidence of unsuccessful outcomes. Common sense should tell us that there is a real possibility of biased conclusions when you consider only biased evidence. Indeed Hughes and Franks found a tendency for scoring rates to increase as passing sequences get longer with the highest scoring rate (measured as goals per 1,000 possessions) occurring in passing sequences with six passes. Hughes and Franks also found that longer passing sequences (i.e. possession play) tend to produce more shots at goal but conversion rates (shots-goals ratio) are better for shorter passing sequences (i.e. direct play). However, the more successful teams are better able to retain possession with more longer passing sequences and better-than-average conversion rates.
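The Reep fallacy is easy to demonstrate with a small numerical sketch. The counts below are entirely hypothetical (they are not Reep’s data or Hughes and Franks’ data) and are chosen only to show how short passing sequences can account for most goals while still having the lower scoring rate per possession.

```python
# Hypothetical counts for illustration only (NOT taken from any study)
possessions = {"0-4 passes": 9000, "5+ passes": 1000}
goals = {"0-4 passes": 80, "5+ passes": 20}


def goal_shares(goals):
    """Share of all goals that come from each sequence-length band."""
    total = sum(goals.values())
    return {band: g / total for band, g in goals.items()}


def goals_per_1000(goals, possessions):
    """Scoring rate: goals per 1,000 possessions in each band."""
    return {band: 1000 * goals[band] / possessions[band] for band in goals}


shares = goal_shares(goals)
rates = goals_per_1000(goals, possessions)

for band in possessions:
    print(
        f"{band}: {shares[band]:.0%} of goals, "
        f"{rates[band]:.1f} goals per 1,000 possessions"
    )
```

In this made-up example short sequences produce 80% of the goals simply because they make up 90% of the possessions, yet the scoring rate for longer sequences is more than twice as high. Looking only at where goals come from, without the denominator of possessions, points to the wrong conclusion.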

Reep remains a controversial figure in tactical analysis because of his advocacy of long-ball tactics. His interpretation of the relationship between the length of passing sequences and goals scored has been shown to be flawed, what I call the Reep fallacy of analysing only successful outcomes. Reep’s legacy to sports analytics is partly negative because of its association with a very formulaic approach to tactics. But Reep’s legacy is also positive. He was the first to develop a notational system for football and to demonstrate the possibilities for statistical analysis in football. And, crucially, Reep showed how analytics could be successfully employed by teams to improve sporting performance.