The Reep Fallacy

Executive Summary

  • Charles Reep was the pioneer of soccer analytics, using statistical analysis to support the effectiveness of the long-ball game
  • Reep’s principal finding was that most goals are scored from passing sequences with fewer than five passes
  • Hughes and Franks have shown that Reep’s interpretation of the relationship between the length of passing sequences and goals scored is flawed – the “Reep fallacy” of analysing only successful outcomes
  • Reep’s legacy for soccer analytics is mixed: partly negative because of its association with a formulaic approach to tactics, but also positive in developing a notational system, demonstrating the possibilities of statistical analysis in football and having a significant impact on practitioners

There have been long-standing “artisan-vs-artist” debates over how “the beautiful game” (i.e. football/soccer) should be played. In his history of tactics in football, Wilson (Inverting the Pyramid, 2008) characterised tactical debates as involving two interlinked tensions – aesthetics vs results and technique vs physique. Tactical debates in football have often focused on the relative merits of direct play and possession play. And the early developments in soccer analytics pioneered by Charles Reep were closely aligned with support for direct play (i.e. “the long-ball game”).

Charles Reep (1904 – 2002) trained as an accountant and joined the RAF, reaching the rank of Wing Commander. He said that his interest in football tactics began after attending a talk in 1933 by Arsenal’s captain, Charlie Jones. Reep developed his own notational system for football in the early 1950s. His first direct involvement with a football club was as part-time advisor to Brentford in spring 1951, helping them to avoid relegation from Division 1. (And, of course, these days Brentford are still pioneering the use of data analytics to thrive in the English Premier League on a relatively small budget.) Reep’s key finding was that most goals are scored from passing sequences of three passes or fewer. His work subsequently attracted the interest of Stan Cullis, manager in the 1950s of a very successful Wolves team. Reep published a paper (jointly authored with Benjamin) on the statistical analysis of passing and goals scored in 1968. He analysed nearly 2,500 games during his lifetime.

In their 1968 paper, Reep and Benjamin analysed 578 matches, mainly in Football League Division 1 and World Cup Finals between 1953 and 1967. They reported five key findings:

  • 91.5% of passing sequences have 3 completed passes or less
  • 50% of goals come from moves starting in the shooting area
  • 50% of shooting-area origin attacks come from regained possessions
  • 50% of goals conceded come from own-half breakdowns
  • On average, one goal is scored for every 10 shots at goal

Reep published another paper in 1971 on the relationship between shots, goals and passing sequences, this time excluding shots and goals that were not generated from a passing sequence. These results confirmed his earlier analysis, with passing sequences of 1 – 4 passes accounting for 87.6% of shots and 87.0% of goals scored. The tactical implications of Reep’s analysis seemed very clear – direct play with few passes is the most efficient way of scoring goals. Reep’s analysis was very influential. It was taken up by Charles Hughes, FA Director of Coaching and Education, who later conducted similar data analysis to that of Reep with similar results (but never acknowledged his intellectual debt to Reep). On the basis of his analysis, Hughes advocated sustained direct play to create an increased number of shooting opportunities.

Reep’s analysis was re-examined by two leading professors of performance analysis, Mike Hughes and Ian Franks, in a paper published in 2005. Hughes and Franks analysed 116 matches from the 1990 and 1994 World Cup Finals. They accepted Reep’s finding that around 80% of goals scored result from passing sequences of three passes or less. However, they disagreed with Reep’s interpretation of this empirical regularity as support for the efficacy of a direct style of play. They argued that it is important to take account of the frequency of different lengths of passing sequences as well as the frequency of goals scored from different lengths of passing sequences. Quite simply, since the vast majority of passing sequences are short, it is no surprise that most goals are scored from short passing sequences. I call this the “Reep fallacy” of only considering successful outcomes and ignoring unsuccessful outcomes. It is surprising how often in different walks of life people commit a similar fallacy by drawing conclusions from evidence of successful outcomes while ignoring the evidence of unsuccessful outcomes. Common sense should tell us that there is a real possibility of biased conclusions when you consider only a biased selection of the evidence.

Indeed, Hughes and Franks found a tendency for scoring rates to increase as passing sequences get longer, with the highest scoring rate (measured as goals per 1,000 possessions) occurring in passing sequences with six passes. Hughes and Franks also found that longer passing sequences (i.e. possession play) tend to produce more shots at goal but conversion rates (shots-goals ratio) are better for shorter passing sequences (i.e. direct play). However, the more successful teams are better able to retain possession, producing more long passing sequences while maintaining better-than-average conversion rates.
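
To make the base-rate point concrete, here is a minimal sketch in Python using entirely made-up numbers (not Reep’s or Hughes and Franks’ actual data). Counting goals alone makes short sequences look dominant simply because short sequences dominate the sample; normalising by the number of possessions of each length can reverse the conclusion.

```python
# Illustrative sketch of the "Reep fallacy" using made-up numbers
# (not Reep's or Hughes & Franks' actual data). Counting only goals
# makes short sequences look most productive; normalising by the number
# of possessions of each length tells a different story.

possessions = {  # passes in sequence -> number of possessions (hypothetical)
    0: 40_000, 1: 25_000, 2: 15_000, 3: 9_000,
    4: 5_000, 5: 3_000, 6: 1_800, 7: 1_200,
}
goals = {  # goals scored from sequences of that length (hypothetical)
    0: 90, 1: 70, 2: 50, 3: 35, 4: 22, 5: 15, 6: 11, 7: 6,
}

total_goals = sum(goals.values())
short_share = sum(g for k, g in goals.items() if k <= 3) / total_goals
print(f"Goals from sequences of three passes or fewer: {short_share:.0%}")

print("passes  goals  possessions  goals per 1,000 possessions")
for k in sorted(possessions):
    rate = 1000 * goals[k] / possessions[k]
    print(f"{k:6d}  {goals[k]:5d}  {possessions[k]:11d}  {rate:27.1f}")
```

With these illustrative numbers, sequences of three passes or fewer account for about 82% of the goals, yet the scoring rate per 1,000 possessions rises steadily with sequence length and peaks at six passes, exactly the kind of reversal Hughes and Franks reported.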

Reep remains a controversial figure in tactical analysis because of his advocacy of long-ball tactics. His interpretation of the relationship between the length of passing sequences and goals scored has been shown to be flawed – what I call the Reep fallacy of analysing only successful outcomes. Reep’s legacy to sports analytics is partly negative because of its association with a very formulaic approach to tactics. But Reep’s legacy is also positive. He was the first to develop a notational system for football and to demonstrate the possibilities for statistical analysis in football. And, crucially, Reep showed how analytics could be successfully employed by teams to improve sporting performance.

What Can Football and Rugby Coaches Learn From Chess Grandmasters?

Executive Summary

  • Set plays in invasion-territorial team sports can be developed and practised in advance as part of the team’s playbook, putting the onus on the coach to decide the best play in any given match context
  • Continuous open play with multiple transitions between attack and defence puts the onus on the players to make instant ball play and positioning decisions
  • The 10-year/10,000-hours rule to become an expert has been very influential in planning the long-term development of players and derives ultimately from research into the perceptual skills of chess grandmasters
  • Chess grandmasters acquire their expertise in practical problem-solving by spending thousands of hours studying actual match positions and evaluating the moves made
  • Improved decision-making should be a key learning outcome in all training sessions involving open play under match conditions

Player development in football, rugby and the other invasion-territorial team sports is a complex process. Expertise in these sports is highly multi-dimensional, so coaches are increasingly moving away from a concentration on just technical skills and fitness to embrace a more holistic approach. The English FA advocates the Four-Corner Model (Technical, Physical, Psychological and Social) as a general framework for guiding the development pathway of all players regardless of age or ability. I prefer to think in terms of the four A’s – Ability, Athleticism, Attitude and Awareness – in order to highlight the importance of decision-making, i.e. awareness of the “right” thing to do in any given match situation. My basic question is whether or not coaches in football and rugby put enough emphasis on the development of the decision-making skills of players.

Players have to make a myriad of instant decisions in a match, particularly in those invasion-territorial team sports characterised by continuous open play. At one extreme is American football, which is effectively a sequence of one-phase set plays that can be choreographed in advance and mostly puts the onus for in-game decision-making on the coaches, not the players. The coach writes a detailed script and players have to learn their lines exactly, with little room for improvisation. By contrast, (association) football is at the opposite end of the spectrum, with few set plays and mostly open play with continuous transition between attack and defence; in other words, continuous improvisation. Rugby union has more scope for choreographed set plays at lineouts and scrums but thereafter the game transitions into multi-phase open play. Continuous open play puts the onus firmly on players rather than coaches for in-game decision-making. Players must continuously decide on their optimal positioning as well as making instant decisions on what to do with the ball when they are in possession. This demands ultra-fast expert problem-solving abilities to make the right choice based on an acute sense of spatial awareness.

How can football and rugby coaches facilitate the development of ultra-fast expert problem-solving abilities? One possible source of guidance is chess, an area of complex problem-solving that has been researched extensively and has thrown up important and sometimes surprising insights into the nature of expertise. The traditional view has been that grandmasters in chess are extraordinarily gifted calculators with almost computer-like abilities to very quickly consider the possible outcomes of alternative moves, able to project the likely consequences many moves ahead. But, starting with the pioneering research in the 1950s/60s of, amongst others, De Groot and Herbert Simon, a psychologist who won the Nobel Prize for Economics, we now have a very different view of what makes a grandmaster such an effective problem solver. Four key points have emerged from the research on perception in chess:

  1. Chess grandmasters do not undertake more calculations than novices and intermediate-ability players. If anything, grandmasters make fewer calculations, yet they are much better able to select the right move intuitively.
  2. The source of expertise of chess grandmasters and masters lies in their ability to recognise patterns in games and to associate a specific pattern with an optimal move. Both De Groot and Simon tested the abilities of chess players of different standards to recall board positions after a very brief viewing. In the case of mid-game positions from actual games with 24 – 26 pieces on the board, masters were able to correctly recall around 16 pieces on their first attempt whereas intermediate-ability players averaged only eight pieces and novices just four pieces. Yet when confronted with 24 – 26 pieces randomly located on the board, there was virtually no difference in recall between players of different standards, with all players averaging only around four pieces correctly remembered. There is a logic to the positioning of pieces in actual games which expert players can appreciate and exploit in retrieving similar patterns from games stored in their long-term memory and identifying the best move. This competitive advantage disappears when pieces are located randomly since, by definition, random positions can have no relevant precedents to draw on.
  3. Further investigation shows that expert chess players store board positions in their memories as “chunks” consisting of around three mutually related pieces with pieces related by defensive dependency, attacking threats, proximity, colour or type. Since there is a logic to how pieces are grouped in memory chunks, grandmasters tend to need fewer chunks to remember a board position compared to lesser players.
  4. Simon estimated that a grandmaster needs a repertoire of at least 50,000 chunks of patterns from actual games – and probably many more – and that acquiring this would require at least 10 years (or 10,000 hours) of constant practice. (A toy numerical sketch of this chunking logic follows below.)
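
To see how the chunking explanation accounts for the recall results, here is a crude toy model in Python; the capacity and pieces-per-chunk numbers are purely illustrative assumptions, not figures from De Groot’s or Simon’s experiments.

```python
# Toy model of chunk-based recall (all numbers are illustrative assumptions,
# not figures from De Groot's or Simon's experiments). Working memory is
# assumed to hold about six chunks; stronger players pack more mutually
# related pieces into each chunk when a position comes from a real game,
# but a random position offers no familiar patterns, so every piece
# effectively costs most of a chunk for everyone.

WORKING_MEMORY_CHUNKS = 6   # assumed capacity in chunks
PIECES_ON_BOARD = 25        # mid-game position, as in the recall studies

pieces_per_chunk = {        # assumed average pieces per chunk, by strength
    "master":       {"game": 3.0, "random": 0.7},
    "intermediate": {"game": 1.4, "random": 0.7},
    "novice":       {"game": 0.7, "random": 0.7},
}

for player, rates in pieces_per_chunk.items():
    for condition in ("game", "random"):
        recalled = min(PIECES_ON_BOARD,
                       round(WORKING_MEMORY_CHUNKS * rates[condition]))
        print(f"{player:>12} | {condition:>6} position | ~{recalled} pieces recalled")
```

Even this simple model reproduces the pattern described above: large differences between playing standards for real-game positions, and near-identical (and low) recall for random positions.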

The 10-year/10,000-hours rule to become an expert is now very widely known amongst coaches and indeed has been very influential in planning the long-term development of athletes. Much of the recent popularisation of the 10-year/10,000-hours rule is associated with Ericsson’s work on musical expertise. What is often forgotten is that Ericsson was originally inspired by Simon’s work in chess and indeed Ericsson went on to study under Simon. So our understanding of problem-solving in chess is already having an impact on player development in team sports albeit largely unacknowledged.

Chess grandmasters acquire their expertise in practical problem-solving by spending thousands of hours studying actual match positions and evaluating the moves made. Football and rugby coaches responsible for player development need to ask themselves if their coaching programmes are allocating enough time to developing game-intelligence in open play under match conditions. Not only do players need to analyse the videos of their own decision-making in games but they also need to build up their general knowledge of match positions and the decision-making of top players by continually studying match videos. And this analysis of decision-making should not be limited to the classroom. Improved decision-making should be a key learning outcome in all training sessions involving open play under match conditions.

Note

This post was originally written in June 2016 but never published. It may seem a little dated now but I think the essential insights remain valid. I am a qualified football coach (UEFA B License) and coached for several years from Under 5s through to college level before concentrating on providing data analysis to coaches. I have always considered my coaching experience to have been a key factor in developing effective analyst-coach relationships at the various teams with which I have worked.

Moneyball: Twenty Years On – Part Three

Executive Summary

  • Moneyball is principally a baseball story of using data analytics to support player recruitment
  • But the message is much more general on how to use data analytics as an evidence-based approach to managing sporting performance as part of a David strategy to compete effectively against teams with much greater economic power
  • The last twenty years have seen the generalisation of Moneyball both in its transferability to other team sports and its applicability beyond player recruitment to all other aspects of the coaching function particularly tactical analysis
  • There are two key requirements for the effective use of data analytics to manage sporting performance: (1) there must be buy-in to the usefulness of data analytics at all levels; and (2) the analyst must be able to understand the coaching problem from the perspective of the coaches, translate that into an analytical problem, and then translate the results of the data analysis into actionable insights for the coaches

Moneyball is principally a baseball story of using data analytics to support player recruitment. But the message is much more general on how to use data analytics as an evidence-based approach to managing sporting performance as part of a David strategy to compete effectively against teams with much greater economic power. My interest has been in generalising Moneyball both in its transferability to other team sports and its applicability beyond player recruitment to all other aspects of the coaching function particularly tactical analysis.

The most obvious transferability of Moneyball is to other striking-and-fielding sports, particularly cricket. And indeed cricket is experiencing an analytics revolution akin to that in baseball, stimulated in part by the explosive growth of the T20 format over the last 20 years, especially the formation of the Indian Premier League (IPL). Intriguingly, Billy Beane himself is now involved with the Rajasthan Royals in the IPL. Cricket analytics is an area in which I am now taking an active interest and on which I intend to post regularly in the coming months after my visit to the Jio Institute in Mumbai.

My primary interest in the transferability and applicability of Moneyball has been with what I call the “invasion-territorial” team sports, which in one way or another seek to emulate the battlefield: the aim is to invade enemy territory and score by crossing a defended line or getting the ball into a defended net. The various codes of football – soccer, rugby, gridiron and Aussie Rules – as well as basketball and hockey are all invasion-territorial team sports. (Note: hereafter I will use “football” to refer to “soccer” and add the appropriate additional descriptor when discussing other codes of football.) Unlike the striking-and-fielding sports, where the essence of the sport is the one-on-one contest between the batter and pitcher/bowler, the invasion-territorial team sports involve the tactical coordination of players undertaking a multitude of different skills. So whereas the initial sabermetric revolution was at its core a search for better batting and pitching metrics, in the invasion-territorial team sports the starting point is to develop an appropriate analytical model to capture the complex structure of the tactical contest involving multiple players and multiple skills. The focus is on multivariate player and team performance rating systems. And that requires detailed data on on-the-field performance in these sports, data that only became available from the late 1990s onwards.

When I started to model the transfer values of football players in the mid-90s, the only generally available performance metrics were appearances, scoring and disciplinary records. These worked pretty well in capturing the performance drivers of player valuations, and the statistical models achieved a goodness of fit of around 80%. I was only able to start developing a player and team performance rating system for football in the early 2000s, after Opta published yearbooks covering the English Premier League (EPL) with season totals for over 30 metrics for every player who had appeared in the EPL in the four seasons 1998/99 – 2001/02. It was this work that I was presenting at the University of Michigan in September 2003 when I first read Moneyball.
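
For illustration, a hedonic regression along these general lines could be sketched as follows; the column names and specification are hypothetical assumptions, not the actual model I used.

```python
# Hedged sketch of a hedonic player-valuation regression (illustrative only;
# the column names and specification are assumptions, not the actual model).
# Expects a pandas DataFrame `transfers` with one row per transfer and
# columns: fee, appearances, goals, intl_caps, age, division.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_valuation_model(transfers: pd.DataFrame):
    """Regress log transfer fee on basic career and performance metrics."""
    data = transfers.assign(log_fee=np.log(transfers["fee"]))
    model = smf.ols(
        "log_fee ~ appearances + goals + intl_caps + age + I(age ** 2) + C(division)",
        data=data,
    ).fit()
    print(f"Goodness of fit (R-squared): {model.rsquared:.2f}")
    return model
```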

My player valuation work had got me into the boardrooms and I had used the same basic approach to develop a wage benchmarking system for the Scottish Premier League. But getting into the inner sanctum of the football operation in clubs proved much more difficult. My first success was an invitation to an away day for the coaching and support staff at Bolton Wanderers in October 2004, where I gave a presentation on the implications of Moneyball for football. Bolton under their head coach Sam Allardyce had developed their own David strategy – a holistic approach to player management based on extensive use of sport science. I proposed an e-screening system of players as a first stage of the scouting process to allow a more targeted approach to the allocation of Bolton’s scarce scouting resources. Pleasingly, Bolton’s Performance Director thought it was a great concept; disappointingly, he wanted it to be done internally. It was a story repeated several times with both EPL teams and sport data providers – interest in the ideas but no real engagement. I was asked to provide tactical analysis for one club on the reasons behind the decline in their away performances, but I wasn’t invited to present and participate in the discussion of my findings. I was told by email later that my report had generated a useful discussion, but I needed more specific feedback to be able to develop the work. It was a similar story with another EPL club interested in developing their player rating system. Again intermediaries presented my findings; the feedback was positive on the concept but then set out the limitations that I had already listed in my report, all related to the need to use more detailed data than that with which I had been provided. Analytics can only be effective when there is meaningful engagement between the analyst and the decision-maker.

The breakthrough in football came from a totally unexpected source – Billy Beane himself. Billy had developed a passion for football (soccer) and the Oakland A’s ownership group had acquired the Earthquakes franchise in Major League Soccer (MLS). Billy had found out about my work in football via an Australian professor at Stanford, George Foster, a passionate follower of sport, particularly rugby league. Billy invited me to visit Oakland and we struck up a friendship that lasts to this day. As the owner of an MLS franchise, Oakland had access to performance data on every MLS game and, to cut a long story short, Billy wanted to see if the Moneyball concept could be transferred to football. Over the period 2007-10 I produced over 80 reports analysing player and team performance, investigating the critical success factors (CSFs) for football, and developing a Value-for-Money metric to identify undervalued players. We established proof of concept, but at that point MLS was too small financially to offer sufficient returns to sustain the investment needed to develop analytics in a team. I turned again to the EPL but with the same lack of interest as I had encountered earlier. The interest in my work now came from outside football entirely – rugby league and rugby union.

The first coach to take my work seriously enough to actually engage with me directly was Brian Smith, an Australian rugby league coach. I spent the summer of 2005 in Sydney as a visiting academic at UTS. I ran a one-day workshop for head coaches and CEOs from a number of leading teams, mainly in rugby league and Aussie Rules football. One of the topics covered was Moneyball. Brian Smith was head coach of the Parramatta Eels and had developed his own system for tracking player performance. Not surprisingly, he was also a Moneyball fan. Brian gave me access to his data and we had a very full debrief on the results when he and his coaching staff visited Leeds later that year. It was again rugby league that showed real interest in my work after I finished my collaboration with Billy Beane. I met with Phil Clarke and his brother, Andrew, who ran a sport data management company, The Sports Office. Phil was a retired international rugby league player who had played most of his career with his hometown team, Wigan. As well as The Sports Office, Phil’s other major involvement was with Sky Sports as one of the main presenters of their rugby league coverage. I worked with Phil in analysing a dataset he had compiled on every try scored in Super League in the 2009 season and we presented these results to an industry audience. Subsequently, I worked with Phil in developing the statistical analysis to support the Sky Sports coverage of rugby league, including an in-game performance gauge that used a traffic-lights system for three KPIs – metres gained, line breaks and tackle success – as well as predicting what the points margin should be based on the KPIs.
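
As a rough illustration of the kind of gauge described above, here is a sketch of a traffic-lights classifier and a simple linear expected-margin model; the thresholds and weights are invented for the example and are not the values used in the Sky Sports system.

```python
# Illustrative sketch of a traffic-lights KPI gauge and a simple linear
# expected points-margin model for rugby league. The thresholds and weights
# below are invented for the example; they are not the values used in the
# Sky Sports system described above.

KPI_THRESHOLDS = {  # kpi -> (amber floor, green floor), hypothetical
    "metres_gained":  (1200, 1400),   # metres gained in the match
    "line_breaks":    (3, 5),         # clean line breaks
    "tackle_success": (0.86, 0.90),   # proportion of tackles completed
}

MARGIN_WEIGHTS = {  # hypothetical linear weights for expected points margin
    "intercept": -60.0,
    "metres_gained": 0.02,
    "line_breaks": 1.5,
    "tackle_success": 40.0,
}

def traffic_light(kpi: str, value: float) -> str:
    """Classify a KPI value as red, amber or green against its thresholds."""
    amber, green = KPI_THRESHOLDS[kpi]
    return "green" if value >= green else "amber" if value >= amber else "red"

def expected_margin(kpis: dict) -> float:
    """Predict the points margin implied by the KPIs (toy linear model)."""
    return MARGIN_WEIGHTS["intercept"] + sum(
        MARGIN_WEIGHTS[k] * v for k, v in kpis.items()
    )

match_kpis = {"metres_gained": 1350, "line_breaks": 4, "tackle_success": 0.91}
for kpi, value in match_kpis.items():
    print(f"{kpi}: {value} -> {traffic_light(kpi, value)}")
print(f"Expected points margin: {expected_margin(match_kpis):+.1f}")
```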

But Phil’s most important contribution to my development of analytics with teams was the introduction in March 2010 to Brendan Venter at Saracens in rugby union. Brendan was a retired South African international who had appeared as a replacement in the famous Mandela World Cup Final in 1995. He had taken over as Director of Rugby at Saracens at the start of the 2009/10 season and instituted a far-reaching cultural change at the club, central to which was a more holistic approach to player welfare and a thorough-going evidence-based approach to coaching. Each of the coaches had developed a systematic performance review process for their own areas of responsibility and the metrics generated had become a key component of the match review process with the players. My initial role was to develop the review process so that team and player performance could be benchmarked against previous performances. A full set of KPIs was identified, with a traffic-lights system to indicate excellent, satisfactory and poor performance levels. This augmented match review process was introduced at the start of the 2010/11 season and coincided with Saracens winning the league title for the first time in their history. The following season I was asked by the coaches to extend the analytics approach to opposition analysis, and the sophistication of the systems continued to evolve over the five seasons that I spent at Saracens.

I finished at Saracens at the end of the 2014/15 season, although I have continued to collaborate with Brendan Venter on various projects in rugby union over the years. But just as my time with Saracens was ending, a new opportunity opened up to move back to football, again courtesy of Billy Beane. Billy had been contacted by Robert Eenhoorn, a former MLB player from the Netherlands, who is now the CEO of AZ Alkmaar in the Dutch Eredivisie. Billy had become an advisor to AZ Alkmaar and had suggested to Robert that he get me involved in the development of AZ’s use of data analytics. AZ Alkmaar are a relatively small-town team that seek to compete with the Big Three in Dutch football (Ajax Amsterdam, PSV Eindhoven and Feyenoord) in a sustainable, financially prudent way. Like Billy, Robert understands sport as a contest and sport as a business. AZ has a history of being innovative, particularly in youth development, with a high proportion of their first-team squad coming from their academy. I developed systems similar to those I had built at Saracens to support the first team with performance reviews and opposition analysis. It was a very successful collaboration which ended in the summer of 2019 with data analytics well integrated into AZ’s way of doing things.

Twenty years on, the impact of Moneyball has been truly revolutionary. Data analytics is now an accepted part of the coaching function in most elite team sports. But teams vary in the effectiveness with which they employ data analytics, particularly in how well it is integrated into the scouting and coaching functions. There are still misperceptions about Moneyball, especially in regard to the extent to which data analytics is seen as a substitute for traditional scouting methods rather than being complementary. Ultimately an evidence-based approach is about using all available evidence effectively, not just quantitative data but also the qualitative expert evaluations of coaches and scouts. Data analytics is a process of interrogating all of the data.

So what are the lessons from my own experience of the transferability and applicability of Moneyball? I think that there are two key lessons. First, it is crucial that there is buy-in to the usefulness of data analytics at all levels. It is not just leadership buy-in. Yes, the head coach and performance director must promote an evidence-based culture but the coaches must also buy-in to the analytics approach for any meaningful impact on the way things actually get done. And, of course, players must buy-in to the credibility of the analysis if it is to influence their behaviour. Second, the analyst must be able to understand the coaching problem from the perspective of the coaches, translate that into an analytical problem, and then translate the results of the data analysis into actionable insights for the coaches. There will be little buy-in from the coaches if the analyst does not speak their language and does not respect their expertise and experience.


Moneyball: Twenty Years On – Part Two

Executive Summary

  • Financial determinism in pro team sports is the basic proposition that the financial power to acquire top playing talent determines sporting performance (sport’s “law of gravity”)
  • The Oakland A’s under Billy Beane have consistently defied the law of gravity for over a quarter of a century by using a “David strategy” of continuous innovation based on data analytics and creativity

Financial determinism in pro team sports is the basic proposition that sporting performance is largely determined by the financial power of a team to acquire top playing talent. This gives rise to sport’s equivalent of the law of gravity – teams will tend to perform on the field in line with their expenditure on playing talent relative to other teams in the league. The biggest spenders will tend to finish towards the top of the league; the lowest spenders will tend to finish towards the bottom of the league. A team may occasionally defy the law of gravity – Leicester City winning the English Premier League in 2016 is the most famous recent example – but such extreme cases of beating the odds are rare.

Governing bodies tend to be very concerned about financial determinism since it can undermine the uncertainty of outcome – sport, after all, is unscripted drama where no one knows the outcome in advance. It is a fundamental tenet of sports economics that uncertainty of outcome is a necessary requirement for spectator interest and the financial stability of pro sports leagues. Hence governing bodies have actively intervened over the years to try to maintain competitive balance with revenue-sharing arrangements (e.g. shared gate receipts and collective selling of media rights) and player labour market regulations (e.g. salary caps and player drafts). And financial determinism creates the danger that teams without rich owners will incur unsustainable levels of debt in pursuit of the dream of sporting success and eventually collapse into bankruptcy (as Leeds United fans know only too well given their experience in the early 2000s).

Major League Baseball (MLB), like the other North American Major Leagues, has actively intervened in the player labour market – in MLB’s case via luxury taxes on excessive spending and a player draft system – to try to reduce the disparity between teams in the distribution of playing talent. But financial determinism is still strong in MLB, as can be seen in Figure 1, which shows the average win rank and average wage rank of the 30 MLB teams over the 26-year period 1998 – 2023 (1998 was Billy Beane’s first season as GM at the Oakland A’s). There is a very strong correlation between player wage expenditure and regular-season win percentage (r = 0.691). The three biggest spenders – New York Yankees, Boston Red Sox and LA Dodgers – have been amongst the five most successful teams over the period, with the New York Yankees topping both charts (with an average win rank of 5.8 and an average wage rank of 1.8).
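
For readers who want to reproduce this kind of summary on their own data, here is a minimal sketch of the calculations behind Figure 1; the DataFrame layout and column names are assumptions, not the actual dataset used here.

```python
# Minimal sketch of the calculations behind Figure 1 (the DataFrame layout
# and column names are assumptions, not the actual dataset). Expects one row
# per team per season with columns: team, season, payroll, win_pct.
import pandas as pd

def rank_summary(seasons: pd.DataFrame) -> pd.DataFrame:
    df = seasons.copy()
    # Rank within each season: 1 = biggest payroll / best win percentage.
    df["wage_rank"] = df.groupby("season")["payroll"].rank(ascending=False)
    df["win_rank"] = df.groupby("season")["win_pct"].rank(ascending=False)
    summary = (df.groupby("team")[["wage_rank", "win_rank"]]
                 .mean()
                 .rename(columns={"wage_rank": "avg_wage_rank",
                                  "win_rank": "avg_win_rank"}))
    # Positive gap = the team wins more than its spending would suggest
    # (e.g. the A's: 25.5 - 13.0 = 12.5).
    summary["rank_gap"] = summary["avg_wage_rank"] - summary["avg_win_rank"]
    return summary.sort_values("rank_gap", ascending=False)

def wage_win_correlation(seasons: pd.DataFrame) -> float:
    # Correlation between payroll and win percentage across team-seasons,
    # analogous to the r = 0.691 quoted in the text.
    return seasons["payroll"].corr(seasons["win_pct"])
```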

Figure 1: Financial Determinism in the MLB, 1998 – 2023    

The standout team in defying the law of gravity are the Oakland A’s. Over the 26-year period, their average wage rank has been 25.5 but their average win rank has been 13.0, which gives a rank gap of 12.5. Put another way, the A’s have had the 3rd lowest average wage rank over the last 26 years but are in the top ten in terms of their average win rank. Looking at Figure 1, the obvious benchmarks for the A’s in spending terms are the Tampa Bay Rays, Miami Marlins and Pittsburgh Pirates, but all of these teams have had much poorer sporting performance than the A’s. Indeed, in terms of sporting performance as measured by average win rank, the A’s peers are the LA Angels, their Bay Area rivals the San Francisco Giants, the Houston Astros and the Cleveland Guardians (formerly the Cleveland Indians), but all of these teams have had much higher levels of expenditure on player salaries.

Figure 2 details the year-to-year record of the A’s over the whole period of Billy Beane’s tenure as GM and then Executive Vice President for Baseball Operations. As can be seen, the A’s have consistently been amongst the lowest spenders in MLB and, indeed, there are only two years (2004 and 2007) when they were not in the bottom third. The regular-season win percentage has been rather cyclical, with peaks in 2001/2002, 2006, 2012/2013 and 2018/2019. The 2001 and 2002 seasons are the “Moneyball Years” covered by Michael Lewis in the book, when the A’s had the 2nd best win percentage in both seasons. As discussed in Part One of this post, the efficient market hypothesis (EMH) in economics suggests that any competitive advantage based on inefficient use of information by other traders will quickly evaporate once the informational inefficiencies become widely recognised. Hence, the EMH implies that the A’s initial success would be short-lived and other teams would soon “catch up” and start to use similar player metrics to the A’s. Which is exactly what happened. In fact, Moneyball led all other MLB teams to start using data analytics more extensively, some more than others. This is what makes the A’s experience so remarkable – other teams imitated the A’s in their use of data analytics and developed their own specific data-based strategies, but still the A’s kept punching well above their financial weight and making it to the post-season playoffs on several occasions. This suggests that the A’s have been highly innovative in developing analytics-based David strategies which have informed both their international recruitment and player development in their farm system. Just as in the land of the Red Queen in Through the Looking-Glass, so too in elite sport: when competing with analytics, you’ve got to keep running to stay still.

Success = Analytics + Creativity.

Figure 2: Oakland A’s Under Billy Beane, 1998 – 2023


Moneyball: Twenty Years On – Part One

Executive Summary

  • The lasting legacy of Moneyball is as an exemplar of the possibilities of competitive advantage to be gained from the smarter use of data analytics as part of an evidence-based approach to decision-making
  • The technical essence of Moneyball is using on-base percentage (OBP) as the primary hitter metric in baseball for player recruitment
  • Moneyball shows how Billy Beane and the Oakland A’s developed a David strategy to take advantage of the inefficiency of other MLB teams in valuing the win contributions of players.

Unbelievably, it is twenty years this month since Michael Lewis’s book, Moneyball: The Art of Winning an Unfair Game, was published. (The subtitle is really important, as I’ll discuss later.) It is a book, along with the spin-off Hollywood movie starring Brad Pitt, that has had a massive impact on elite team sports around the world and fundamentally changed the way that teams do things. And it has been hugely significant to me personally. Moneyball quite simply changed my professional life.

I’ve told the story so many times of how I came to read Moneyball for the first time. I was visiting the University of Michigan at the end of September 2003 to talk about the work I was doing in professional team sport, both academically and as a practitioner. I had developed a player valuation system to estimate transfer values of football players. I was being driven to Detroit airport on the Friday afternoon at the end of my visit when the prof who had invited me said, “You must read this new book, Moneyball. It’s you but baseball.” I purchased it in the airport at 6pm that evening and, partly due to a delay in my flight to Edmonton to visit a dear friend and fellow academic, the late Dr Trevor Slack, I completed my first read by 6am on the Saturday morning. I was blown away. I had been advocating a more data-based approach to player valuation and here was someone, Billy Beane, actually doing it at the elite level and creating a winning team on a very limited budget. A real-life case study of what I came to call a “David strategy” – a smart and financially sustainable way of competing against financial giants. Remember, those were the days when my local club, Leeds United, were on the brink of bankruptcy thanks to a financial strategy based more on a roll of the dice than rational calculation. Smart thinking wasn’t much in evidence in that particular boardroom.

It’s no surprise really that Moneyball is a baseball story, in the sense that the first analytics-based approach in a team sport was always most likely to occur in a striking-and-fielding sport such as baseball or cricket for one very simple reason – the ease of data collection. At the core of the striking-and-fielding sports is the one-on-one contest between pitcher/bowler and batter, easily recorded by paper-and-pencil methods. Hence, the essential performance data for baseball and cricket have been widely available from the earliest days. As a consequence, you do not need to be an “insider” working at the elite level of these sports to be able to analyse the data. Any fan with an interest in analysing baseball and cricket data has been able to do so. For example, Stephen Jay Gould, the evolutionary biologist who developed the theory of punctuated equilibrium (and, incidentally, was a visiting undergraduate student at the University of Leeds), devoted a whole section of his book Life’s Grandeur: The Spread of Excellence from Plato to Darwin (Jonathan Cape, London, 1996) to the evolution of performance in baseball, particularly focusing on why no one has posted a batting average over 0.400 in MLB since Ted Williams in 1941. Of course, the baseball fan par excellence with an interest in analysing the data is Bill James, and it was his analysis more than anything that inspired Billy Beane and the Oakland A’s.

The technical essence of Moneyball is the use of on-base percentage (OBP) as the primary hitter metric for player recruitment. James had shown that OBP is a much better predictor of game outcomes than the two traditional hitting metrics – the batting average and the slugging average – which both only allow for the batter’s ability to hit their way to base and take no account of their propensity to be walked to base. James actually proposed combining OBP and the slugging average, i.e. On-base Plus Slugging (OPS), as the preferred hitting metric. Effectively, conventional baseball wisdom treated walks more as a pitcher error or a risk-averse pitching tactic rather than allowing for the hitter’s skill in selecting which pitch to swing at and which to leave. It was this perception of walks that opened up the possibility of a “free lunch”. In economic terms, by using the batting average and slugging average to value hitters and ignoring OBP, the baseball players’ labour market was being inefficient. It would be possible to buy runs more cheaply by targeting hitters who had good batting/slugging averages but also a high propensity to be walked to base. If this latter skill was not valued by the market, it could be bought for free.
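
The standard definitions make the point clear. The short Python sketch below computes the metrics and compares two hypothetical hitters with identical batting averages but very different walk rates: indistinguishable by the traditional metric, clearly different by OBP.

```python
# Standard definitions of the hitting metrics discussed above. The batting
# average ignores walks entirely; OBP credits every route to base; OPS adds
# the slugging average to capture power as well, as James proposed.

def batting_average(hits: int, at_bats: int) -> float:
    return hits / at_bats

def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sacrifice_flies)

def slugging_average(singles: int, doubles: int, triples: int,
                     home_runs: int, at_bats: int) -> float:
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def ops(obp: float, slg: float) -> float:
    return obp + slg

# Two hypothetical hitters with identical batting averages but different
# walk rates (made-up numbers): same BA, very different OBP.
for name, walks in (("Hitter A", 30), ("Hitter B", 80)):
    ba = batting_average(hits=140, at_bats=500)
    obp = on_base_percentage(hits=140, walks=walks, hit_by_pitch=0,
                             at_bats=500, sacrifice_flies=0)
    print(f"{name}: BA {ba:.3f}, OBP {obp:.3f}")
```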

Moneyball soon found its way onto many business school reading lists as a real-world example of the efficient market hypothesis (EMH), which proposes that there is an inherent tendency for markets to eliminate informational inefficiencies where available information is being used incorrectly. As soon as one trader recognises the inefficiency, they will exploit it by buying under-priced assets and making a profit. In the case of Billy Beane, he acquired under-valued hitters, which meant that Oakland could punch way above their financial weight, buying more runs from their limited budget by being smarter than other teams in valuing the win contributions of players. And, in retrospect, it is no surprise that it was Michael Lewis who wrote Moneyball, since he started his professional life as a financial trader, well aware of how to use information to profit in markets. No wonder the story of Billy Beane and the Oakland A’s appealed to him. It is a story of enduring appeal not only for baseball but for all team sports and, indeed, for any organisation trying to find a David strategy to gain a competitive advantage by being smarter in their use of data. I will discuss this enduring appeal further in Part 2 next week.
