Pythagorean Expected Wins Revisited

Executive Summary
- The Pythagorean expected-wins model provides a very simple predictor of team win% in baseball using squared runs scored and allowed
- Predictive accuracy can be improved by slightly adjusting the power used in the model
- The Pythagorean expected-wins model should be a key component in strategic roster planning and in evaluating the sporting “bottom line” of any player trade
- The Pythagorean expected-wins model can be applied to other team sports but typically requires different powers depending on the average number of scores and the average winning margin
- The Pythagorean expected-wins model can be applied in soccer with appropriate adjustments to allow for tied games, but goal difference remains the best simple score-based predictor of league performance
- Ultimately what is transferable from baseball across all pro team sports is not the specifics of the Pythagorean expected-wins model but rather the discipline of projecting the expected performance gains from any significant player recruitment decision

Pythagorean Expected Wins
Of all of Bill James’s many contributions to sabermetrics, probably the best known, particularly outside baseball, is his notion of Pythagorean win expectation. It is a very simple idea – league performance over a season will be closely associated with total scores made and total scores conceded. James’s innovative insight from his extensive study of baseball data was that the relationship between team win%, runs scored (RS) and runs allowed (RA) followed a power relationship:
Win% = RS^2 / (RS^2 + RA^2)

As an example, Pythagorean win expectation applied to the 2014 MLB regular season yields the following results:

Table 1: MLB Regular Season 2014


Pythagorean win% predicts the actual win% pretty well, as I have tried to highlight using conditional formatting – the colour coding matches up closely, but with some important exceptions discussed below.

Using Expected Wins as a Strategic Tool
There are a couple of practical uses of Pythagorean expected wins. The first and by far the most important is that it provides a key relationship to be considered when planning changes to a team roster. Pythagorean expected wins can be used to project the likely impact on the team win% of a series of player trades. It is a great discipline for GMs and Personnel Directors to formalise exactly what they expect a new recruit to bring to the team. What is the sporting “bottom line” of any trade?
Suppose you were acting as an advisor to the Milwaukee Brewers at the end of the 2014 season. The 2014 Brewers were, statistically at least, the epitome of an average team. The MLB team average for total runs scored and allowed that season was 659; the Brewers scored 650 runs and allowed 657 runs, finishing with 82 wins and 80 losses to yield an actual win% of 0.506. The Pythagorean formula would have predicted 80 wins (i.e. Pythagorean win% = 0.495). If you had wanted to transform the 2014 Brewers into a team capable of competing for the World Series, you would have needed to target a regular-season win% of around 0.600, which equates to 97 wins. Achieving this level of performance would require an improvement in both hitting and pitching of approximately 10%, to 715 runs scored and 591 runs allowed.
Win% = 715^2 / (715^2 + 591^2) = 0.594 => 96 wins (= 0.594 × 162 games)
Of course an equal 10% improvement in hitting and pitching is just one scenario. The LA Angels, who had the best win% in the 2014 regular season at 0.605, achieved it principally through their hitting strength, scoring 773 runs (17.3% better than the MLB average) while allowing 630 runs (only 4.4% better than the MLB average).
Achieving that magnitude of performance improvement is a tall order for any organisation and would require a strategic plan over a number of seasons involving player trading, draft picks, player development and financial planning. But the key point is that James’s formula helps formalise the task more precisely and provides a means of evaluating how alternative courses of action could contribute to the strategic goal of the organisation.
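For concreteness, here is the projection step as a short function – a minimal sketch (not from the original post), using the Brewers numbers quoted above:

```python
def pythagorean_win_pct(rs, ra, power=2.0):
    """Expected win% from runs scored (rs) and runs allowed (ra)."""
    return rs ** power / (rs ** power + ra ** power)

# 2014 Brewers baseline (650 RS, 657 RA) vs the 10%-improvement scenario.
for label, rs, ra in [("2014 baseline", 650, 657),
                      ("10% better hitting and pitching", 715, 591)]:
    wp = pythagorean_win_pct(rs, ra)
    print(f"{label}: win% = {wp:.3f} -> {wp * 162:.0f} wins")  # 80 and 96 wins
```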

Benchmarking with Expected Wins: Altitude Effects in Denver, Loss of Form in Oakland
Another application of Pythagorean expected wins is as a useful benchmarking device to identify large anomalies between actual win% and predicted win%. Two such anomalies stand out in the 2014 data. The Colorado Rockies had an actual win% of only 0.407, the 2nd lowest, but their Pythagorean win% was significantly higher at 0.460. This deviation is largely due to the impact of the Rockies playing their home games at altitude in Denver. Games involving the Rockies that season averaged 19.1% more runs than the MLB average.
The other big anomaly that season was the Oakland A’s, which shows the effect of an extremely inconsistent season. The A’s had a league-best 59 wins and 36 losses at the All-Star break (win% = 0.621) and still led the Angels in the AL West in early August (9th Aug: 72 wins, 44 losses, win% = 0.621), but thereafter their season collapsed, with the A’s losing 30 of their last 46 games and only scraping into the post-season Wild Card game by winning their final regular-season game. But the A’s had been so good in the first two-thirds of the season that their season totals of runs scored and runs allowed still predicted that they should have had the best win%, rather than finishing 10 games behind the Angels, their divisional rivals.

Do Squares Yield The Best Predictor?
James’s Pythagorean expected-wins model has stood the test of time as a very useful and accurate predictor, but its accuracy can be improved upon by adjusting the Pythagorean parameter (i.e. the power used). In the case of the 2014 regular season, predictive accuracy can be improved by using a power of 1.810 (based on minimising total squared deviations). This is in line with various other studies such as Kaplan and Rich (2017)*, who found that the best fit in individual seasons 2007 – 2016 varied from a minimum of 1.63 to a maximum of 1.96. (Kaplan and Rich’s method gives a Pythagorean parameter of 1.79 for the 2014 season, which is very close to my own findings.) All of which confirms that James’s original insight back in the 1970s remains a very good approximation to MLB reality 40 years later.
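For readers who want to replicate the fitting step, here is a minimal sketch in Python (the array names are illustrative assumptions; the 1.810 and 1.79 figures come from the fits described above, not from this code):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_pythagorean_power(rs, ra, win_pct):
    """Power minimising the total squared deviation of predicted from actual win%."""
    rs, ra, win_pct = map(np.asarray, (rs, ra, win_pct))
    sse = lambda k: float(np.sum((rs**k / (rs**k + ra**k) - win_pct) ** 2))
    return minimize_scalar(sse, bounds=(1.0, 3.0), method="bounded").x

# e.g. with one row per team of season totals:
# fit_pythagorean_power(runs_scored, runs_allowed, wins / games)  # ~1.81 for 2014
```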

Does the Pythagorean Expected-Wins Model Apply to Other Team Sports?
There have been attempts to apply James’s model to other team sports. Kaplan and Rich in their study report results for 2007 – 2016 for the other major leagues. They found that the model also works reasonably well with squared scores in the NFL (American football) and NHL (ice hockey), although predictive accuracy in the NFL is better with a Pythagorean parameter of around 2.8. Predictive accuracy in the NBA (basketball) requires a parameter in the range 12 – 14. Kaplan and Rich show that the differences in the Pythagorean parameter across sports depend on the average score per game and the average winning margin.
And what about (association) football? The first complication is that football allows for tied games; it is not unusual for 20% – 25% of games to finish tied. This is further complicated by football’s 3-1-0 points system. I have found that the most useful way to apply the Pythagorean expected-wins model in European football is to treat tied games as “half-wins” for the purpose of calculating team win%. The alternative is to use the points percentage (i.e. total league points as a proportion of the maximum attainable). However, even with the adjustment for tied games, squared scores typically do not predict well at the extremes. Table 2 provides a good example of the problem using the FA Premier League for 2013/14. Using squared goals massively over-predicts the win% of the top three teams and under-predicts the win% of the bottom three.

Table 2: FA Premier League, England, 2013/14


Predictive accuracy in European football is much improved by using a Pythagorean parameter closer to unity. Using the method of minimising total squared deviations, I find that 1.232 works best for the FA Premier League that season. A Pythagorean parameter close to unity in European football fits with the common finding that goal difference is the best simple score-based predictor of league performance. So there really is no need to complicate things; in football just use goal difference to predict league performance.
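For completeness, here is the tie adjustment described above as a function – a minimal sketch, with the 2013/14 Manchester City record (27 wins, 5 draws over 38 games) as the worked example:

```python
def football_win_pct(wins, draws, games):
    """Win% with tied games counted as half-wins."""
    return (wins + 0.5 * draws) / games

print(football_win_pct(27, 5, 38))   # 0.776
# Points-percentage alternative: 86 points / (3 x 38) = 0.754
```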

The Bottom Line
Ultimately what is transferable from MLB across all team sports is not the specifics of the Pythagorean expected-wins model per se. Rather it is the discipline of projecting the expected performance gains from any significant player recruitment decision. Given the size of the financial commitments involved in the salary, acquisition and development costs of elite players, it is only rational to try to project the expected benefits. Some will argue that sport is different and that the expected impact of a new player cannot be quantified. But as soon as you have signed the contract, you have quantified the value of that player financially, irrespective of whether or not you believe sport is different. Costs are costs in sport as in business. Where sport differs is in the resistance of some to subjecting their expectations of performance gains to due diligence. You would expect the financial director to subject any other major investment by your organisation to proper due diligence using project appraisal techniques. So why not apply the same logic to sporting investments? That ultimately was the whole point of the Moneyball story, which popularised the strategic possibilities of the Pythagorean expected-wins model.

*Edward H. Kaplan and Candler Rich, ‘Decomposing Pythagoras’, Journal of Quantitative Analysis in Sports, vol. 13, no. 4 (2017), pp. 141-149.

Excelling in Analytics

Executive Summary
• Effective data analysts need to have great Excel skills; expect 80% of your analytics to be conducted in Excel
• Too often data analysts do not receive enough training on Excel especially at universities, partly because of “software snobbery” and partly because of a lack of appreciation of Excel’s extensive analytical functionality
• The six specific functions in Excel with which every analyst should be very familiar are:
o Pivot Tables
o Custom Sort and Filter
o Conditional Formatting
o Graphics
o Formulas Menu
o Add-Ins

I’ve been asked many times over the years for advice on how to be an effective data analyst. I always stress the importance of having really good relationships with the decision-makers for whom you are working. Always ensure before you start any data analysis that you are clear on how your analysis is going to be used – which decisions is it going to feed into? Analytics after all is analysis with real-world purpose.
The other practical advice I regularly give data analysts is to ensure that you have great Excel skills. I usually support this with a couple of anecdotes. One is that of an economics graduate with an excellent degree from a well-respected business school who was turned down for a post with a central bank because of a lack of Excel skills. The other is about a meeting I attended that was hosted by the local branch of the OR Society a couple of years ago. Four data analysts were invited to talk about their work experiences. During the Q&A they were asked what they wished they had done more of at university, and all four agreed that they wished they had been taught Excel. They estimated that 80% of their work was done in Excel and only 20% in more specialised software, yet all had only been taught how to use specialised statistical software at university. These analysts had gone to different universities and studied different courses, but their experiences are common to most data analysts.

In part the lack of Excel training at universities is “software snobbery” – it’s sexier to teach more advanced software such as R, SPSS and SAS. But I also think it reflects a misperception that Excel has limited analytical functionality. Nothing could be wider of the mark. In my own case I use Microsoft Access to store and combine datasets; I use Excel to do the bulk of my data manipulation and analysis; for multiple regression and other advanced statistical techniques I mostly use PC-Give (an econometrics package that I have used for years and which is particularly useful for investigating structural breaks in relationships).
There are many reasons why Excel is so widely used by data analysts. Microsoft Excel and Microsoft Office are so widely available that there are very few organisations that do not operate in a Microsoft environment. Excel is very convenient for basic data preparation: it is easy to import data into Excel, clean it up, transform it as required, and export it to other packages. But Excel also has extensive analytical functionality, as I will discuss below, and can be supplemented by a wide range of add-ins provided by Microsoft and other sources. Excel is particularly suited to analysts who prefer the transparency and flexibility of programming within a spreadsheet environment to meet their specific needs as they arise. If, like me, you prefer to check the output of your programming instantly with a real example, then Excel is for you. And, crucially, given that most of your colleagues will have Excel on their PCs, laptops, tablets and mobile devices, it becomes a convenient communication tool. You might find it desirable at times to send your spreadsheets to colleagues. It is quite common for organisations to track their KPIs graphically in dashboards constructed in Excel.
There are six specific functions in Excel with which I think every data analyst should be very familiar and which I use extensively in my own work in sports analytics. The list is by no means comprehensive – I am sure others would consider other Excel functions to be just as important, if not more so – but these are six functions without which my data analysis for teams would be much more time-consuming and probably much less effective.

1. Pivot Tables
Pivot tables are a very efficient means of combining rows of data and calculating summary statistics. They are particularly useful when you are dealing with large datasets, where it could otherwise take considerable time to sort the data into sub-groups and then calculate summary statistics for each sub-group. I hate to think of the amount of time I wasted in my early days doing just that, in ignorance that the Pivot Table function would do it in seconds. The example below is taken from my season review of the 2014/15 Aviva Premiership. I had a summary row of KPIs for every team in every regular-season game – only 264 rows of data (12 teams × 22 games), but it would have taken several minutes to sort the data for each team and calculate season averages. The Pivot Table below did it all in seconds.
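For readers who work outside Excel, the same operation is essentially a one-liner in pandas – a sketch with illustrative file and column names, not my actual dataset:

```python
import pandas as pd

# One row of KPIs per team per game, as in the Aviva Premiership example.
df = pd.read_csv("premiership_2014_15.csv")

season_averages = pd.pivot_table(
    df,
    index="team",                                      # one output row per team
    values=["points_for", "points_against", "tries"],  # the KPI columns
    aggfunc="mean",                                    # season game averages
)
print(season_averages.round(1))
```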

Figure 1: An Example of a Pivot Table


2. Custom Sort and Filter
The Custom Sort and Filter icon in the Home menu is definitely one of my most clicked icons in Excel. The Custom Sort function allows you to order your rows by multiple criteria. I am forever using it to sort team and player performance data, which I load as it becomes available but often want grouped by team/player and ordered by date. The Filter tool is also very useful. It allows you to select sub-samples of relevant data, again using multiple criteria. Only yesterday I used Filter to extract data on every possession in the Aviva Premiership this season. My full dataset for this season has around 140k rows, from which I extracted data on 6,550 possessions. It took two clicks to get the possession data using Filter. I copied and pasted it into another spreadsheet and was ready to roll on analysing these possessions. It is also worth remembering that there is a Filter option within Pivot Tables, which is particularly useful if you want to analyse different types of sub-samples. Instead of filtering the data to create spreadsheets for each sub-sample and then producing pivot tables, you can work with the initial spreadsheet of data throughout and just filter when you are constructing your pivot tables.

3. Conditional Formatting
Conditional formatting is a very useful facility, available in the Home menu, to visualise differences in your data using colours, data bars or a variety of icons including traffic lights and flags. I make very extensive use of colour-coding of KPIs using Conditional Formatting. I have included below an extract from a recent analysis of the EFL Championship. It immediately highlights the tactical differences between teams as regards whether or not they play a possession-based style (e.g. Brentford compared to Burton Albion and Cardiff City), as well as highlighting that Birmingham City and Burton Albion rank poorly in some of the critical shooting KPIs.

Figure 2: Colour-Coding KPIs using Conditional Formatting


4. Graphics
One of the main complaints that I and other users had with the early versions of Excel was that the graphics facility was limited in its options and not user-friendly. I found that once I had created a graphic that did the job, I would go back and just copy and edit it when I needed that type of graphic again. But as data visualisation has become more and more important, Microsoft have massively improved the graphics facility in Excel so that it generates a wide variety of graphics that are easily edited with pull-down menus. On my last count I found 53 basic graphical templates available in Excel. Excel’s graphics facility now has smart functionality: if you highlight the data to be graphed and then click on the Recommended Charts icon in the Insert menu, Excel will identify the most useful graphic templates for your specific data.

5. Formulas Menu
The functionality in the Excel Formulas menu is often not fully appreciated. There is so much more available in Excel’s Function Library beyond the extensive array of data analysis functionality in Math & Trig, Statistical (found in More Functions) and Financial. The functionality in Logical and Text can be very powerful for editing textual data.

Figure 3: Excel’s Function Library


6. Add-Ins
Excel comes with some optional functionality in the form of add-ins that need to be loaded via the File menu (select Options, then Add-Ins, then Manage Excel Add-Ins, then tick the boxes for the add-ins you want to load) when first used. There are two add-ins that I use frequently – Data Analysis and Solver. The Data Analysis add-in provides a set of statistical macros – an efficient one-stop facility for a number of related statistical procedures. Options in Data Analysis include ANOVA (both single-factor and two-factor versions), Correlation, Regression and t-Tests (both for paired and independent samples) amongst many others. The Solver add-in is an optimisation facility which is very useful not only for undertaking standard OR problems but also for implementing a variety of more advanced statistical methods such as those using maximum likelihood, e.g. logistic regression. There are also a variety of “unofficial” add-ins available for Excel. I use several add-ins provided by Conrad Carlberg to accompany his brilliant guides to doing analytics in Excel (see below).
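To see what the Solver approach amounts to, here is the same maximum-likelihood logic written out in Python – a hedged sketch on simulated data, standing in for the Solver workflow rather than reproducing it:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    """Negative log-likelihood of a logistic regression."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + feature
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * X[:, 1])))
y = (rng.random(200) < true_p).astype(float)                # simulated outcomes

fit = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y))
print(fit.x)  # estimates close to the true coefficients (0.5, 1.2)
```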

So my basic message to data analysts, both those just starting out as well as old hands, is to regularly explore the functionality of Excel. You will often be surprised at just how much Excel can do for you. And to educators, a plea: ensure that your students get a good introduction to Excel – it is a basic life skill if you are going to work anywhere that uses data, especially if you are going to have any responsibility for managing performance.

 

Some Further Reading on Excel for Analytics

There are a series of books by Conrad Carlberg providing excellent introductions to analytics using Excel:

Conrad Carlberg, Statistical Analysis: Microsoft Excel 2013, Que, Indianapolis, IN, 2014.

Conrad Carlberg, Decision Analytics: Microsoft Excel, Que, Indianapolis, IN, 2014.

Conrad Carlberg, Predictive Analytics: Microsoft Excel, Que, Indianapolis, IN, 2013.

For a comprehensive coverage of analytical methods using Excel (with excellent appendices on the basics of Excel and Access), try:

J. D. Camm, J. J. Cochran, M. J. Fry, J. W. Ohlmann, D. R. Anderson, D. J. Sweeney and T. A. Williams, Essentials of Business Analytics (2nd edn), Cengage Learning, Boston, MA, 2017.
Extensive but expensive!

Improving Performance Ratio Analysis Part 2: The Structured Hierarchy Approach

 

Executive Summary
- Practitioners are often criticised for using performance ratios in a very piecemeal and fragmented fashion
- The problem is often compounded by more holistic approaches such as the balanced scorecard, which encourages the reporting of a very diversified set of performance metrics, often with little understanding of their interdependencies and links to strategic goals
- The structured hierarchy approach provides a systematic framework for the forensic investigation of performance trends and benchmark comparisons
- The structured hierarchy approach can use a formalised mathematical structure, but this is not necessary and may not be appropriate in some contexts
- The structured hierarchy approach requires that the analyst has a clear understanding of the overall structure of the process being analysed and of how the performance ratios are related to the various components of the process
- Statistically significant differences between ratios at one level are not necessarily evident at other levels

As well as ignoring the various statistical and other methodological issues with performance ratio analysis, another strand of criticism directed at practitioners has been the tendency to use performance ratios in a very piecemeal and fragmented fashion. Again this has been a very common criticism of financial ratio analysis. And in some ways the problem was made worse by another criticism – that there is too much emphasis on financial performance in assessing business performance. This led to Kaplan and Norton proposing the balanced scorecard approach, in which four dimensions of business performance are identified – financial, customer, business processes, and learning and growth – with businesses encouraged to monitor a set of KPIs for each dimension. Although the need for a more holistic approach to performance is well taken, in practice the balanced scorecard approach has just compounded the problem by leading to a greater range of performance metrics being reported but still used in a very piecemeal and fragmented fashion. Ittner and Larcker, in particular, have been very critical of the balanced scorecard approach, arguing that the performance metrics are seldom linked to the strategic goals of a business, that the supposed links between the metrics and overall performance are not validated and tend to be more articles of faith than evidence-based, and that, as a consequence, the balanced scorecard does not lead to the right performance targets being set.
But again the problem of a fragmented approach to performance ratio analysis has been recognised in finance, and a more structured approach has been adopted by some, often referred to as the Du Pont system after the chemical conglomerate that first popularised it. Others have called it the pyramid-of-ratios approach.

The basic idea is to take an overall performance ratio and then decompose it into constituent ratios. For example, the return on assets (ROA) is calculated as the ratio of profits to assets. ROA can be decomposed into two constituent ratios – asset turnover (= sales/assets) and profit margin (= profit/sales). These two ratios capture the two fundamentals of any business – the ability to “sweat the assets” to generate sales (as measured by asset turnover) and the ability to extract profit from sales (as measured by the profit margin). So if you want to understand changes in a company’s ROA over time, or to explain the difference in ROA between companies, you can use this structured approach to determine whether the changes/differences in ROA are due mainly to changes/differences in asset turnover, which reflects external market conditions, or to changes/differences in the profit margin, which reflects internal production conditions. The simple ROA pyramid is summarised in Figure 1. It can be extended in both directions: upwards by relating ROA to other rates of return, and downwards by further decomposing asset turnover and profit margin.
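To make the decomposition concrete, here is the ROA pyramid on invented numbers (a minimal sketch; the figures are purely illustrative):

```python
profit, sales, assets = 12.0, 150.0, 100.0

asset_turnover = sales / assets        # 1.50 - "sweating the assets"
profit_margin = profit / sales         # 0.08 - profit extracted from sales
roa = asset_turnover * profit_margin   # 0.12, identical to profit / assets

assert abs(roa - profit / assets) < 1e-12
```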
Figure 1: The ROA Pyramid


The Du Pont/pyramid-of-ratios approach is an example of what I call the structured hierarchy approach and provides a systematic framework for the forensic investigation of performance trends and benchmark comparisons. In particular the hierarchical structure facilitates a more efficient analysis of performance by first identifying which aspects of performance primarily account for the differences/changes in performance overall and then tunnelling down into those specific aspects of performance in more detail.
Although the structured hierarchy approach as applied in financial performance analysis often uses a multiplicative decomposition, in which performance ratios are decomposed into a sequence (or “chain”) of ratios whose product equals the higher-level ratio, there is no need to impose such a formalised mathematical structure. You don’t need to adopt a “one-size-fits-all” approach to creating a structured hierarchy. Multiplicative decomposition is particularly useful when dealing with processes that can be broken down into a sequence of sub-processes in which the output of one sub-process provides the input for the next sub-process in the sequence. In some cases it might be more useful to apply a linear decomposition, in which a ratio is broken down into the sum of a set of constituent ratios. Linear decomposition is useful when a higher-level performance ratio depends on two or more activities that are separable and relatively independent of each other. But in many cases the structured hierarchy approach is best seen as a much more informal structure, without any specific mathematical structure imposed on the relationships between performance ratios. The key point is that the structured hierarchy approach requires a clear understanding of the overall structure of the process being analysed and of how the performance ratios are related to the various components of the process.
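A toy illustration of the two decomposition forms on invented football numbers (nothing here comes from a real dataset):

```python
# Multiplicative chain: sub-process ratios whose product is the higher-level ratio.
shots_per_game = 14.0
goals_per_shot = 0.10
goals_per_game = shots_per_game * goals_per_shot        # 1.4 goals per game

# Linear decomposition: separable components that sum to the higher-level ratio.
goals_open_play_pg = 1.0
goals_set_piece_pg = 0.4
assert abs(goals_per_game - (goals_open_play_pg + goals_set_piece_pg)) < 1e-9
```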


Most of my work is in the invasion-territorial team sports, mainly (association) football and rugby union. When putting together a system of KPIs to track performance, I always adopt a structured hierarchy approach. The approach is quite generic across both sports, as I have summarised in Figure 2. The win percentage depends on the score difference between scores made and scores conceded (a linear decomposition). Typically these performance metrics are reported as game averages to facilitate comparisons between teams and over time. Scores made represents attacking effectiveness and naturally leads you to tunnel down into the different aspects of attacking play. In football I tend to separate attacking play into three dimensions – passing, other attacking play (e.g. crosses and dribbles), and shooting. When it comes to scores conceded, I tend to separate this into exit play and defence. Exit play is a familiar term in rugby union but little used in football.

Working across these sports I am particularly interested in their tactical commonalities, especially the territorial dimension, and I plan to post in more detail on this in the near future. Suffice it to say for the moment that my experience working in rugby union, particularly with Brendan Venter, has made me even more acutely aware of the importance of play in possession deep in your own half. Lose possession there and you are going to cause yourself trouble – it’s what I call a SIW (self-inflicted wound). The same tactical considerations are equally applicable in football and underpin the use of a deep pressing game to maximise the number of times opponents can be pressurised into losing possession deep in their own half. A pressing game is all about reducing the effectiveness of opposition exit play. As I said, I will pursue this line of thinking in more detail in a subsequent post.
Figure 2: A Generic Structured Hierarchy Approach for Invasion-Territorial Sports


One final point to bear in mind when working with performance ratios as a structured hierarchy: statistically significant differences between ratios at one level are not necessarily evident at other levels. Again this is a problem that has bedevilled research in financial performance analysis. For example, research on the impact of location on business performance usually found significant differences in profitability between urban and rural locations, but the urban-rural differences were often no longer statistically significant when profitability was decomposed. The fact that there are statistical differences in ratios at one level in a structured hierarchy does not in any way imply that these differences will be observed at other levels. One sporting example of this is the score difference. By definition, if you analyse differences between winning and losing performances, the score difference will always be statistically significant – positive when a team wins, negative when a team loses. However, when you break this down for individual teams it does not always follow that there are statistically significant differences in both scores made and scores conceded. For some teams winning and losing is much more about the variation in their attacking effectiveness than the effectiveness of their exit play or defence. So in a win-loss analysis these teams will tend to have statistically significant differences in scores made but not in scores conceded. It can go the other way for teams where defensive effectiveness is the crucial performance differentiator.
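A minimal sketch of such a win-loss comparison (the eight-match dataset is invented purely to illustrate the pattern):

```python
import pandas as pd
from scipy.stats import ttest_ind

# One row per match for a single team (invented results).
team_games = pd.DataFrame({
    "won":      [True, True, True, True, False, False, False, False],
    "scored":   [3, 2, 4, 2, 1, 0, 1, 1],
    "conceded": [1, 1, 2, 1, 2, 1, 2, 1],
})

wins = team_games[team_games["won"]]
losses = team_games[~team_games["won"]]
for kpi in ["scored", "conceded"]:
    t, p = ttest_ind(wins[kpi], losses[kpi], equal_var=False)
    print(f"{kpi}: t = {t:.2f}, p = {p:.3f}")
# 'scored' differs sharply between wins and losses here; 'conceded' barely does,
# even though the score difference is significant by construction.
```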

Improving Performance Ratio Analysis Part 1: Some Lessons from Finance

Executive Summary
• Performance ratios are widely used because they are easy to interpret and enhance comparability by controlling for scale effects on performance
• But performance ratios are susceptible to a number of potential problems that can seriously undermine their usefulness and even lead to misleading recommendations on how to improve performance
• The problems with performance ratios are well known in finance but are largely ignored by practitioners
• Crucially performance ratio analysis assumes that scale effects are linear
• Before using performance ratios, analysts should explore the shape of the relationship between performance and scale, and check for linearity
• If the performance relationship is non-linear, group performances by scale and use appropriate scale-specific benchmarks for each group
• Remember effective performance ratio analysis is always trying to compare like with like

It is very common for KPIs to be formulated as performance ratios. The reason for this is very simple. Ratios can enhance comparability when there are significant scale effects. For example, it tells us very little if we compare the total activity levels of two players with very different amounts of game time. We would naturally expect that players with more game time will tend to do more. In this situation it makes more sense to control for game time and compare instead activity levels per minute played.


As well as controlling for scale effects on performance levels, ratios can also control for size effects on the degree of dispersion, which can create problems for more sophisticated statistical modelling such as regression (the so-called heteroscedasticity problem).
However, despite the very widespread use of performance ratios, there are a number of potential problems with using ratios, some of which can seriously affect the validity of any conclusions drawn about performance and even lead to misleading recommendations on interventions to improve performance. The problems with performance ratios are well known in finance where financial ratio analysis is the standard method for analysing the financial performance of businesses. Hence I believe that there are lessons to be learnt from financial ratio analysis that can be applied to improve the use of performance ratios in sport.
One of the key messages in the debates on the use of performance ratios in finance is the importance of recognising that ratio analysis implies strict proportionality. What this means is best explained diagrammatically. Suppose that we want to compare two performances, A and B, where B is a performance associated with a larger scale. Suppose also that we know the expected (or benchmark) relationship between scale and outcome and that both of the observed performances lie on the benchmark relationship. In this case performance ratio analysis would be useful only if the outcome-to-scale ratio is equal for A and B. Graphically, the outcome-to-scale ratio represents the slope of the line from the origin to the performance point. It follows that A and B can only have the same performance ratio if they both lie on the same line from the origin. This is strict proportionality and is shown in Figure 1(a). Comparing performance ratios against a single benchmark ratio value presupposes that the scale-outcome relationship is linear with a zero intercept. If either of these assumptions does not hold, then it is no longer valid to draw conclusions about performance by comparing performance ratios. This is a really important point, but one ignored by the vast majority of users of performance ratios.
The problems of non-zero intercepts and non-linear relationships are illustrated in Figures 1(b) and 1(c). In both cases A and B are on the benchmark relationship but their performance ratios (represented by the slopes of the blue lines) differ. In these cases performance ratios become much more difficult to interpret. It is no longer necessarily the case that differences between performance ratios can be interpreted as deviations from the benchmark, implying better/worse performance after controlling for scale effects. Effectively the problem is that the scale effects have not been fully controlled for, so that differences in performance ratios are still partly reflecting scale effects on performance.

[Figure 1: (a) strict proportionality; (b) non-zero intercept; (c) non-linear relationship]
So what is to be done? It becomes even more important to undertake exploratory data analysis to understand the shape of the relationship between performance and the relevant scale measure. At the very least you should always plot a scatter graph of performance against scale. If it looks as if there is a non-zero intercept (i.e. there is a non-scale-related component in performance), then re-calculate the performance ratio using the deviation of performance from the non-zero intercept. If the performance relationship looks to be non-linear, then categorise your performances into different scale classes and use a range of values for the benchmark ratio appropriate for the different scales. For example, in association football, the number of passes is often used as a scale measure for performance ratios. But we would expect very different ratio values for teams playing a possession-based, tiki-taka passing style compared to teams adopting a more direct style. Unless the underlying benchmark relationships exhibit strict proportionality, different benchmarks should be used to evaluate the performances of possession-based teams and direct-play teams. Always try to compare like with like.
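The intercept adjustment is easy to sketch numerically (invented benchmark data; np.polyfit stands in for whatever line-fitting method you prefer):

```python
import numpy as np

x = np.array([300.0, 400.0, 500.0, 600.0, 700.0])  # scale, e.g. passes per game
y = 0.2 + 0.002 * x                                # benchmark with a non-zero intercept

slope, intercept = np.polyfit(x, y, 1)             # recover the fitted line

print((y / x).round(5))                  # raw ratios drift with scale
print(((y - intercept) / x).round(5))    # intercept-adjusted ratios are constant
```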
There are two other statistical problems with ratio analysis that should also be noted. First, if the same scale measure is used in several performance ratios, this can influence the degree of association between the ratios. This is called the spurious correlation problem and was first identified in the late 19th century in studies of evolutionary biology. Using common denominators in ratios can create the appearance of a much stronger relationship between different aspects of performance than actually exists; in some circumstances common denominators can instead obscure the degree of relationship between different aspects of performance. Second, ratios can exaggerate the degree of variation as the denominator gets close to zero and the ratio becomes very large. It is crucial to be aware of these outliers since they can have undue influence on the results of any statistical analysis of the performance ratios.
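The spurious correlation problem takes one simulation to demonstrate (entirely synthetic data):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(100, 10, 1000)             # numerator 1
b = rng.normal(100, 10, 1000)             # numerator 2, independent of a
d = rng.normal(50, 20, 1000).clip(5)      # shared, highly variable denominator

print(round(np.corrcoef(a, b)[0, 1], 2))          # ~0: no real relationship
print(round(np.corrcoef(a / d, b / d)[0, 1], 2))  # strongly positive, an artefact
```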
Some researchers in finance have recommended abandoning financial ratio analysis and using regression analysis. But regression analysis brings its own methodological issues and is not always applicable. It also ignores the reasons for the widespread use of performance ratios, mainly their simplicity. What is needed is better use and better interpretation of performance ratios informed by an awareness of the potential problems. In addition, we need to use performance ratio analysis in a more systematic fashion which is the subject of my next post.

Ranking Teams by Performance Rather than Results: Another Perspective on International Rugby Union Rankings for 2017

Executive Summary

• Competitor ranking systems tend to be results-based
• Performance-based ranking systems are more useful for coaches by providing a diagnostic tool for investigating the relative strengths and weaknesses of their own team/athletes and opponents
• Performance-based rankings can be calculated using a structured hierarchy in which KPIs are combined into function-based factors and overall performance scores
• A performance-based ranking of international rugby union teams in 2017 suggests that the All Blacks are still significantly ahead of England mainly due to their more effective running game

Most competitor ranking systems are results-based and use either generic ranking algorithms such as the Elo ratings (first developed to rank chess players) or sport-specific algorithms often developed by the governing bodies. As well as their general interest to fans and the media, these rating systems can often be of real practical significance when used to seed competitors in tournaments. These results-based ranking systems can be very sophisticated mathematically and usually incorporate adjustments for the quality of the opponent as well as home advantage and the status of matches/tournaments. These ranking systems also tend to include results from both the current season and previous seasons, usually with declining weights so that current results are more heavily weighted. A good example of an official results-based ranking system in team sports is the World Rugby rankings.
From a coaching perspective, results-based ranking systems are of very limited value beyond providing an overall comparison of competitor quality. What coaches really need to know is why their own team/athletes and their opponents are ranked where they are. Opposition analysis is about identifying the strengths and weaknesses of opponents in order to devise a game plan that maximises the opportunities created by opponent weaknesses and minimises the threats from opponent strengths (i.e. SWOT analysis). Opposition SWOT analysis requires a performance-based approach that brings together a set of KPIs covering the various aspects of performance. A performance-based ranking system can provide a very useful diagnostic tool that allows coaches to investigate systematically the relative strengths and weaknesses of their own team/athletes or opponents, and can help inform decisions on which areas to focus on in the more detailed observation-based analysis (i.e. video analysis and/or scouting).
As an example of a performance-based ranking system, I have produced a set of rankings for the 10 Tier 1 teams in international men’s rugby union (i.e. the teams comprising the Six Nations and the Rugby Championship) for 2017. These rankings are based on 36 KPIs calculated for every match involving a Tier 1 team between 1st January 2017 and 31st December 2017. In total the rankings use 118 Tier 1 team performances from 69 matches. The ranking system comprises a three-level structured hierarchy. It is a bottom-up approach: the 36 KPIs are combined into five function-based factors which, in turn, are combined into an overall performance score.

[Figure 1: the three-level structured hierarchy – KPIs, function-based factors, overall performance score]

There are several alternative ways of combining the KPIs into function-based factors and an overall performance score. Broadly speaking the choice is between using expert judgment or statistical methods (as I have discussed in previous posts on player rating systems). In the case of my performance rankings for international rugby union, I have used a statistical technique, factor analysis, to identify five factors based on the degree of correlation between the 36 KPIs. Effectively factor analysis is a method of data reduction that exploits the common information across variables (as measured by the pairwise correlations). If two KPIs are highly correlated, this suggests that they are essentially providing two measures of the same information and so could usefully be combined into a single metric. Factor analysis extracts the different types of common information from the 36 KPIs and restructures this into a smaller set of independent factors. The five factors can be easily interpreted in tactical/functional terms (with the dominant KPIs indicated in parentheses):
Factor 1: Attack (metres gained, defenders beaten, line breaks, Opp 22 entry rate)
Factor 2: Defence (tackles made, tackle success rate, metres allowed)
Factor 3: Exit Play, Kicking and Errors (Own 22 exit rate, kicks in play, turnovers conceded)
Factor 4: Playing Style (carries, passes, phases per possession)
Factor 5: Discipline (penalties conceded)
The factors are calculated for every Tier 1 team performance in 2017, averaged for each Tier 1 team, adjusted for the quality of the opposition, rescaled 0 – 100 with a performance score of 50 representing the average performance level of Tier 1 teams in 2017, and normalised so that around 95% of match performances lie in the 30 – 70 range. The results are reported in Table 1, with the results-based official World Rugby rankings included for comparison. (It should be noted that the official World Rugby rankings cover all the rugby-playing nations, allow for home advantage, and include pre-2017 results but exclude the tests between New Zealand and the British and Irish Lions.)
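The rescaling step is straightforward to sketch (the factor extraction itself is assumed to have been done already, e.g. with a routine such as sklearn.decomposition.FactorAnalysis; mean 50 and a standard deviation of 10 put roughly 95% of performances in the 30 – 70 band):

```python
import numpy as np

def rescale(scores):
    """Rescale factor scores to mean 50, sd 10, clipped to 0-100."""
    z = (scores - scores.mean()) / scores.std()
    return np.clip(50 + 10 * z, 0, 100)

raw = np.random.default_rng(2).normal(size=118)  # stand-in for 118 performances
print(round(rescale(raw).mean(), 1))             # ~50.0
```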

[Table 1: performance-based rankings of the Tier 1 teams, 2017, alongside the official World Rugby rankings]

Despite the differences in approach between my performance rankings and the official World Rugby rankings, there is a reasonable amount of agreement. Based only on 2017 performances, the gap between New Zealand and England in terms of performances remains greater than suggested by the official rankings. Also Ireland rank above England in performance but not in the official rankings, suggesting that Ireland’s narrow win in Dublin in March to deny England consecutive Grand Slams was consistent with the relative performances of the two teams over the whole calendar year.
Of course, the advantage of the performance-based approach is that it can be used to investigate the principal sources of the performance differentials between teams. For example, England rank above New Zealand in three of the five factors (Factors 2, 3 and 5) and only lag slightly behind in another (Factor 4). The performance gap between England and the All Blacks is largely centred on Factor 1, Attack, and principally reflects the much more effective running game of the All Blacks, which averaged 517m gained per game in 2017 (the best Tier 1 game average) compared to a game average of 471m gained by England (which ranks only 5th best). It should also be noted that the All Blacks had a significantly more demanding schedule in 2017 in terms of opposition quality, with 8 out of 14 of their matches against top-5 teams (with the Lions classified as a top-5 equivalent), whereas England had only 2 out of 10 matches against top-5 opponents.

 

Small is Beautiful: Big-Data Analytics and the Big-to-Small Translation Problem

Happy New Year. And apologies for the lack of posts on Winning With Analytics over the last year. Put it down to my Indiana-Jones-type existence: a university prof by day and a sports data analyst by night. This duality of roles became even more hectic in 2017 as I returned to rugby union to work again with Brendan Venter, now Technical Director at London Irish, as well as assisting South Africa and Italy. I have also continued my work with AZ Alkmaar in Dutch football. To some I might seem to be a bit of a dilettante, trying to work simultaneously at an elite level in two different sports. Far from it. Much of the insight on game tactics and analytical methods is very transferable across the two sports. The last 12 months have probably been one of my most productive periods in developing my understanding of how best to use data analytics as part of an evidence-based approach to coaching. I hope to share much of my latest thinking with you over the coming months with regular posts.

Executive Summary
• Data analytics is suffering from a fixation with big-data analytics.
• Big-data analytics can be a very powerful signal-extraction tool to discover regularities in the data.
• But big-data exacerbates the big-to-small translation problem; big-data, context-generic statistical analysis must be translated into practical solutions to small-data (i.e. unique), context-specific decision problems.
• Sports analytics is most effective when the analyst understands the specific operational context of the coach, produces relevant data analysis and translates that analysis into practical recommendations.

The growth in data analytics has been closely associated with the emergence of big data. Originally “big data” referred to those really, really big databases that were so big as to create significant hardware capacity problems and required clusters of computers to work together. But these days the “big” in big data is, much like beauty, in the eye of the beholder. IBM categorise big-data analytics in terms of the four V’s – Volume (scale of data), Velocity (analysis of streaming data), Variety (different forms of data), and Veracity (uncertainty of data). The four V’s capture the core problems of big-data analytics – trying to analyse large datasets that are growing exponentially, with data captured from multiple sources of varying quality and reliability. I always like to add a fifth V – Value. Big-data analytics must be relevant to the end-user, providing an evidential base to support the decision-making process.

Sports analytics, just like other applications of data analytics, seems to have been bitten by the big-data bug. In my presentation last November at the 4th Annual Sportdata & Performance Forum held in Zurich, I called it the “big-data analytics fixation”. I don’t work with particularly big datasets, certainly not big in the sense of exceeding the capacity of a reasonably powerful PC or laptop. The basic XML file produced by Opta for a single football match has around 250k data points so that a database covering all matches in a football league for one season contains around 100m data points. This is pretty small compared to some of the datasets used in business analytics but sizeable enough to have totally transformed the type of data analysis I am now able to undertake. But I would argue very strongly that the basic principles of sports analytics remain unchanged irrespective of the size of the dataset with which the analyst is working.

Big-data analytics exacerbates what I call the big-to-small translation problem. Big-data analytics is a very powerful signal-extraction tool for discovering regularities in the data. Big-data analytics, like all statistical modelling, attempts to decompose observed data into systematic variation (signal) and random variation (noise). The systematic variation captures the context-generic factors common to all the observations in a dataset, while the random variation represents the context-specific factors unique to each individual observation. But while analytical modelling is context-generic, decisions are always unique and context-specific. So it is important to consider both the context-generic signal and the context-specific noise. This is the big-to-small translation problem. Understanding the noise can often be just as important as understanding the signal, if not more so, when making a decision in a specific context. Noise is random variation relative to the dataset as a whole, but random does not necessarily mean inexplicable.
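A toy illustration of the decomposition (entirely synthetic data): the fitted line is the context-generic signal, and the residual for any single observation is the context-specific part a decision-maker still has to interpret.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 500)
y = 2.0 * x + rng.normal(0, 0.3, 500)    # signal plus noise

slope, intercept = np.polyfit(x, y, 1)   # the context-generic signal
residuals = y - (intercept + slope * x)  # the context-specific noise

i = 42  # one particular decision context
print(f"signal: {intercept + slope * x[i]:.2f}, noise: {residuals[i]:+.2f}")
```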

I disagree profoundly with the rather grandiose end-of-theory and end-of-statistics claims made for big-data analytics. Chris Anderson, in an article on the Wired website back in 2008, claimed that the data deluge was making the scientific method obsolete. He argued that there was no longer any need for theory and models since, in the world of big data, correlation supersedes causation. Indeed some have argued that big-data analytics represents the end of statistics, since statistics is all about trying to make inferences about a population from a sample, and big data supposedly renders sampling irrelevant now that we are working with population data, not small samples. But evidence-based practice always requires an understanding of causation. Recommendations that do not take into account the specific operational context and the underlying behavioural causal processes are unlikely to carry much weight with decision-makers.

There is a growing awareness in sports analytics of the big-to-small translation problem. In fact the acceptance by coaches of data analytics as an important source of evidence to complement video analysis and scouting is crucially dependent on analysts being able to translate the results of their data analysis into context-specific recommendations such as player recruitment targets, game tactics against specific opponents, or training session priorities. It was one of the themes to emerge from the presentations and discussions at the Sportdata & Performance Forum in November 2017 (yet again an excellent and very informative event organised by Edward Abankwa and his team at the Pinnacle Group). As one participant put it so well, “big data is irrelevant unless you can contextualise it”. And in a similar vein, a representative of a company supplying wearable technologies commented that their objective is “making big data personally relevant”. Sports analytics is most effective when the analyst understands the specific operational context of the coach, produces relevant data analysis that provides an appropriate evidential base to support the specific decision, and translates that analysis into practical recommendations to the coach on the best course of action.

Bridging the Gap: Improving the Coach-Analyst Relationship (Part 2)

Executive Summary

  1. Analytical results are usually presented most effectively to coaches by using data visualisation and story-telling.
  2. Don’t ignore external commercial data if it is available and affordable.
  3. Data analysts can make a vital contribution to the organisation of training sessions.
  4. Data analytics is only one input into decision making by coaches, albeit a potentially very important one if used effectively.

 

  1. Analytical results are usually presented most effectively to coaches by using data visualisation and story-telling.

As well as the imperative of translating analytical results into practical recommendations framed in the language of coaches, a number of speakers stressed the importance of data visualisation and story-telling as communication devices. “A picture is worth a thousand words” has become even truer in the age of data analytics, where effective data visualisation has become a vital tool for the analyst. Rob Carroll (The Video Analyst) illustrated this very well with his graphics on the quality of shooting opportunities in Gaelic football, a form of expected-goals model. Ann Bruen (Metrifit) suggested that we should always have in mind the story we are going to tell as we collect and analyse the data. Ben Mackriell from Opta, whose core business is providing performance data, made the same point when he said that it is possible to have a conversation about data without actually mentioning the data (or the analytical techniques).

Of course when it comes to evidence-based story-telling we must remain open-minded and allow the precise details and ending of the story to emerge from the analysis. There is always a danger of not allowing the data to get in the way of a good story – of pre-judging the results of the data analysis; it is what cognitive psychologists call confirmation bias. A good evidence-based story is a story that conveys analytical results in the language of coaches, focusing on the practical implications, with explanations of athlete and team performance framed in terms of skill technique and tactical decisions. As Edward Metgod (Royal Dutch Football Association) pointed out, coaches are interested in causality, not correlation. Analysts must translate the evidence of statistical associations into credible stories of cause and effect with clear implications for targeted interventions to improve performance. When all is said and done, analytics is actionable insight.

The discussion of the importance of story-telling reminded me of the advice of Alfred Marshall on the use of mathematics in economics. Marshall probably did more than anyone to systemise economics as a subject and much of his mathematics and diagrams still remain in the textbooks. Marshall was very aware of the uses and abuses of mathematics. Economics was intended to be a practical subject about the everyday business of life but Marshall became increasingly concerned that economists assumed good mathematics meant good economics. He advised that if the mathematics could not be translated into English and then illustrated with important real-life examples (i.e. a good story), then it should be burnt. Apart from the health and safety issues (perhaps safer to shred than burn), Marshall’s advice holds good for data analytics too. If it doesn’t produce actionable insight, it is worthless.

 

  2. Don’t ignore external commercial data if it is available and affordable.

Any discussion of data analytics must include a discussion of the nature of the data being used. The Forum was a great place for this type of discussion given that it brought together external and internal data providers, data analysts and end-users. In the past there has been too much emphasis on different types of data as substitutes, whereas now there is greater acceptance of the complementarity of data. And that complementarity will get even better as there is more and more cross-over in personnel between teams and commercial data providers. Ben Mackriell at Opta is a good case in point, now in charge of OptaPro but with years of experience working with teams in rugby union and football. External commercial data offers consistency and coverage, whereas internal data is team-specific and often includes expert coach evaluation of skill technique and tactical decision-making relative to the game plan. The differences between these two types of data are variously described as objective vs subjective, frequency vs evaluation, general vs expert, small data vs big data.

The differences were well illustrated in the Q&A that followed Edward Metgod’s presentation, when he was asked how he would define the transition phase of play. Edward replied as a coach and scout with a subjective/evaluation/expert definition: the transition is the period of play after a team loses possession but has not yet gained its defensive shape. Transition is determined by tactical factors, in contrast to more objective definitions in terms of a specific time period (e.g. the first five seconds after possession is lost) or the number of passes made by the team gaining possession. What is important to recognise is that these different types of data have different but complementary functions. For example, external data, possibly in the form of a player rating system, can be used at the first stage of player recruitment to identify a target group of players for whom internal data is then produced at the second stage by the team’s scouts. This is exactly the system of e-screening of potential player acquisitions that I recommended to Bolton Wanderers in 2005. Increasingly I am finding that my greatest and most interesting challenge as an analyst is to generate expert insight from non-expert data, particularly in opposition analysis. Can I get inside the minds of the opposition coaches by studying the patterns in their data?

 

  1. Data analysts can make a vital contribution to the organisation of training sessions.

There were a number of speakers at the Forum whose specialism lay in strength and conditioning, and sports science. The Forum also included presentations from coach educators. Both of these groups shared a concern with the optimal use of training time. As a qualified coach and university professor, I want to gain a deeper understanding of the skill-acquisition process, whether it be how players learn to perform in games or how data analysts learn to be effective in teams. Nick Winkelman (IRFU) was the lead-off speaker at the Forum and made some great points on both skill acquisition and the role of analytics. As both Nick and several other speakers stressed, when it comes to effective learning, “context is everything”, and randomised but relevant learning opportunities provide the most effective way of acquiring and retaining new skills. Blocked repetitions of a specific skill will improve the accuracy with which the skill is performed in a training session, but this does not necessarily transfer to a game context, where the player must not only execute the skill accurately but also make the right decision as to when to execute that particular skill. Nick argued, rightly to my mind, that too much of the data analysis linked to training is focused on workload when what is also needed is a greater input into creating the appropriate game-related contexts.

 

  1. Data analytics is only one input into decision making by coaches, albeit a potentially very important one if used effectively.

The Forum brought together a diversity of specialisms involved in high-performance sport. All agreed, albeit with greater or lesser conviction, that data analytics is potentially a very important coaching tool but that its effectiveness has often been limited by poor communication, particularly the failure of analysts to translate analytical results into actionable insight framed in the language of coaches. I came away from the Forum feeling positive about the future of data analytics in high-performance sport. Data analytics is now being seen as another tool to complement scouting, video analysis and reporting. But analysts must guard against complacency. There is still much to do in many sports and in many teams to create a thorough-going commitment to evidence-based coaching. And we will only do that by “bridging the gap” and producing actionable insight relevant to day-to-day coaching decisions.

Bridging the Gap: Improving the Coach-Analyst Relationship (Part 1)

Executive Summary

  1. The analyst must be able to translate analytical results into coaching recommendations.
  2. Data analytics can only be effective in organisations with a cultural commitment to evidence-based practice.
  3. Start simple when first introducing data analytics as a coaching tool.

 

Last week I attended the Sportdata & Performance Forum held at University College Dublin in Ireland. The Forum is in its third year, having previously been held in Berlin in 2014 and 2015. The organiser, Edward Abankwa, and his colleagues are to be congratulated on yet again putting together an interesting and varied programme with a good mix of speakers. Frequently European sports conferences are dominated by (association) football, but this gathering was again pretty diverse, with Olympic sports, rugby union, rugby league and the Gaelic sports all well represented. And crucially the Forum is not a purely sports analytics event but draws speakers and delegates involved in all aspects of sports performance – coaches, coach educators, performance analysts, data analysts, sports scientists, academics, consultants and commercial data providers. I presented an overview of developments in spatial analytics which I will discuss in a later post. In this post (split into two parts) I want to draw together the various contributions around the theme of how to make data analytics more effective in elite sports.

 

  1. The analyst must be able to translate analytical results into coaching recommendations.

A recurring theme throughout the Forum was that the impact of data analytics in elite sports is often limited by a language problem. Brian Cunniffe (English Institute of Sport) talked about the need to bridge the language gap between the coach and the analyst/scientist. So often analysts and coaches do not speak the same language. Analysts see the world as a modelling problem formulated in the language of statistics and other data-analytical techniques. Coaches see the world as a performance problem formulated in the language of skill technique and tactics. My very strong view is that it is solely the analyst’s responsibility to resolve the language problem. Analytics always starts and ends with the coaches. Coaches have to make a myriad of coaching decisions, and analysts are trying to provide an evidential base to support those decisions. The analyst must start by trying to understand the coaching decision problem and then translate that into a modelling problem to be analysed. The analyst must then translate the analytical results into practical, action-focused recommendations framed in the language of coaching, not the language of analytics. Denise Martin, a performance analysis consultant with massive experience in a number of sports in Ireland, summed it up very succinctly when she said that the task of the analyst is to “make the abstract tangible”. To do this the analyst must spend time with the coaches, learning how coaches see the world, in just the same way as performance analysts do in order to produce effective video analysis.

 

Martin Rumo (Swiss Federal Institute of Sports) provided a great example of the coaching-analytics process working effectively. He described his experience collaborating with a football coach who wanted to evaluate how well his players were putting pressure on the ball. In order to build an algorithm to measure the degree of pressure on the ball Martin started by having a conversation with the coach to identify the key characteristics of situations in which the coach considered there was pressure on the ball. This conversation provided the bridge from the coaching problem to the modelling problem and increased the likelihood that the analytical results would have practical relevance to the coach.
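
As a purely hypothetical illustration of how such a conversation might be encoded, suppose the coach says that pressure depends on how close the nearest defenders are and whether they are actively closing down the ball carrier. The thresholds and functional form below are invented for illustration; this is not Martin’s actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Defender:
    distance_m: float        # distance to the ball carrier in metres
    closing_speed_ms: float  # speed towards the ball carrier (m/s), negative if retreating

def pressure_on_ball(defenders: list[Defender]) -> float:
    """Score the pressure on the ball carrier: nearby defenders count,
    and defenders actively closing down count more."""
    score = 0.0
    for d in defenders:
        if d.distance_m < 6.0:                    # only defenders within 6 m matter (assumed)
            proximity = 1.0 - d.distance_m / 6.0  # 0 at 6 m, 1 at 0 m
            closing = max(0.0, min(1.0, d.closing_speed_ms / 5.0))
            score += proximity * (0.5 + 0.5 * closing)
    return score

# Two defenders: one tight and closing down, one too distant to matter.
print(pressure_on_ball([Defender(2.0, 3.0), Defender(9.0, 1.0)]))
```

The coach’s characteristics become the model’s inputs; the analyst’s job is then to calibrate the thresholds against situations the coach has already labelled as high or low pressure.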

 

One of the most interesting speakers at the Forum was Edward Metgod, the former Dutch goalkeeper and now a scout and analyst with the Dutch national team. Edward has a playing and coaching background, a deep commitment to self-improvement and an open mind to using the best available tools to do his job effectively. He is precisely the type of football person with whom a data analyst would want to work. Edward started his talk by recounting how he had read a number of books on data analytics, which he had found interesting, but when he came to books on football analytics he was quickly turned off. The problem with the football analytics books is the language (although I also sensed that he had found nothing new in these books to advance his knowledge of football in any practical way). Edward then explained that in Dutch football there is a common coaching language which breaks the game down into four moments – defensive transition, offensive transition, ball possession, and opponent ball possession. All of Edward’s reports are structured around these four moments. The clear implication for any data analyst, like myself, working in Dutch football is that you must learn this coaching language if you want to communicate effectively with coaches. I should add that I have subscribed to the four-moments perspective for several years and apply it as a way of structuring my analysis in any invasion-territorial team sport.

 

  2. Data analytics can only be effective in organisations with a cultural commitment to evidence-based practice.

The importance of having the right organisational culture to support data analytics was stressed by many of the speakers. Rob Carroll (The Video Analyst) defined culture very neatly as what a team does every day. A common characteristic of every sports organisation with which I have worked and in which data analytics has a real impact is a cultural commitment to creating an evidential base for decisions. That cultural commitment is led from the top by the performance director and head coach, with buy-in from all of the coaching staff. As I have discussed in a previous post, Saracens epitomise an elite team in which data analytics has become part of how they do things day to day, and that culture has been built over a number of years led by their directors of rugby, initially Brendan Venter and then his successor, Mark McCall.

Many European sports organisations still have a long way to go in their analytical development, and some remain staunchly “knowledge-allergic”. Analysts themselves have been part of the problem by not learning the language needed to communicate with coaches. But the organisations bear much of the responsibility for the lack of progress compared to many leading teams in the North American major leagues, which have used evidence-based practice to gain a competitive advantage, the 2016 World Series champions, the Chicago Cubs, being just the latest case study of how to do it effectively. Too often teams have appointed analysts without any real strategic purpose other than that it seemed the right thing to do and what other teams were doing. Data analytics must be seen as a strategic choice by the sporting leadership of the team, a point made eloquently by Daniel Stenz, who has extensive experience in applying analytics in football in Germany, Hungary and Canada. It can also require buy-in from the team ownership, particularly since, as Denise Martin explained, evidence-based practice thrives in a culture that emphasises the process, not the outcome. But of course an emphasis on process requires that the team ownership adopts a long-term perspective on their sporting investment, which is always difficult in sports organised as merit hierarchies with promotion and relegation (and play-offs and European qualification). When financial risk is so dependent on sporting results, team ownership inevitably tends to become increasingly short term in judging performance, so that quick-fix solutions such as signing new players or firing the head coach prevail. Analytics is unlikely ever to be a quick fix.

 

  3. Start simple when first introducing data analytics as a coaching tool.

Another common message at the Forum for teams starting out on the use of data analytics is to start simple, a point made by Denise Martin and Ann Bruen (Metrifit) amongst others. Analysts are often guilty of putting more emphasis on the sophistication of their techniques than on the practical relevance of their results. Analytics must always be decision-driven. Providing some simple, useful input into a specific coaching decision will help build credibility, respect and coach buy-in, all vital ingredients in the successful evolution of an analytical capability in a team. Complexity can come later. As Ann reminded us, avoid the TMI/NEK problem of “too much information, not enough knowledge”. Elite teams are drowning in data these days, and every day it gets worse. Just try to imagine how much data on the physical performance of athletes a single training session can produce with wearable technology. The function of an analyst is to solve the data overload problem. Analysts are in the business of reducing (i.e. simplifying) a complex and chaotic mass of data into codified patterns of variation with practical importance. Start simple, and always finish simple.

A Simple Approach to Player Ratings

Executive Summary

  • The principal advantage of a statistical approach to player ratings is to ensure that information on performance is used in a consistent way.
  • However there are numerous difficulties in using statistical techniques such as regression analysis to estimate the weightings to construct an algorithm for combining performance metrics into a single player rating.
  • But research in decision science shows that there is little or no gain in using sophisticated statistical techniques to estimate weightings. Using equal weights works just as well in most cases.
  • I recommend a simple approach to player ratings in which performance metrics are standardised using Z-scores and then added together (or subtracted in the case of negative contributions) to yield a player rating that can then be rescaled for presentational purposes.

 

The basic analytical problem in contributions-based player ratings, particularly in the invasion-territorial team sports, is how to reduce a multivariate set of performance metrics to a single composite index. A purely statistical approach combines the performance metrics using weightings derived from a team-level win-contributions model of the relationship between the performance metrics and match outcomes, with these weightings usually estimated by regression analysis. But, as I have discussed in previous posts, numerous estimation problems arise with win-contributions models, so much so that I seriously question whether a purely statistical approach to player ratings is viable. Those who have tried to produce player ratings based on win-contributions models in the invasion-territorial team sports have usually ended up adopting a “mixed-methods” approach in which expert judgment plays a significant role in determining how the performance metrics are combined. The resulting player ratings may be more credible but can lack transparency, and so have little practical value for decision makers.

 

Decision science can provide some useful insights to help resolve these problems. In particular there is a large body of research on the relative merits of expert judgment and statistical analysis as the basis for decisions in complex (i.e. multivariate) contexts. The research goes back at least to Paul Meehl’s book, Clinical versus Statistical Prediction, published in 1954. Meehl subsequently described it as “my disturbing little book”; in it he reviewed 20 studies across a wide range of areas, not just clinical settings, and found that statistical analysis provided predictions at least as accurate as expert judgment in every case, and more accurate predictions in most. More than 30 years later Dawes reviewed the research instigated by Meehl’s findings and concluded that “the finding that linear combination is superior to global judgment is strong; it has been replicated in diverse contexts, and no exception has been discovered”. More recently the Nobel laureate Daniel Kahneman, in his best-selling book, Thinking, Fast and Slow, surveyed around 200 studies and found that 60% showed statistically-based algorithms produced more accurate predictions, with the rest showing algorithms to be as good as experts. There is a remarkable consistency in these research findings, unparalleled elsewhere in the social sciences, yet the results have been largely ignored, so that in practice confidence in the superiority of expert judgment remains largely undiminished.

 

What does this tell us about decision making? Decisions always involve prediction about uncertain future outcomes since we choose a course of action with no certainty over what will actually happen. We know the past but decide the future. We try to recruit players to improve future team performance using information on the players’ current and past performance levels. What decision science has found is that experts are very knowledgeable about the factors that will influence future outcomes, but experts, like the rest of us, are no better, and indeed are often worse, when it comes to making consistent comparisons between alternatives in a multivariate setting. Decision science shows that human beings tend to be very inconsistent, focusing attention on a small number of specific aspects of one alternative but then often focusing on different specific aspects of another alternative, and so on. Paradoxically, experts are particularly prone to inconsistency in the comparison of alternatives because of their depth of knowledge of each alternative. Statistically-based algorithms guarantee consistency. All alternatives are compared using the same metrics and the same weightings. The implication for player ratings is very clear: use the expert judgment of coaches and scouts to identify the key performance metrics but rely on statistical analysis to construct an algorithm (i.e. a player rating system) to produce consistent comparisons between players.

 

So far so good, but this still does not resolve the statistical estimation problems involved in using regression analysis to determine the weightings to be used. However, decision science offers an important insight in this respect as well. Back in the 1970s Dawes undertook a comparison of the predictive accuracy of proper and improper linear models. By a proper linear model he meant a model in which the weights were estimated using statistical methods such as multiple regression. In contrast, improper linear models use weightings determined non-statistically, such as equal-weights models in which it is simply assumed that every factor has the same importance. Dawes traces the equal-weights approach back to Benjamin Franklin, who adopted a very simple method for deciding between different courses of action. Franklin’s “prudential algebra” was simply to count up the number of reasons for a particular course of action, subtract the number of reasons against, and then choose the course of action with the highest net score. It is very simple but consistent and transparent, with a crucial role for expert judgment in identifying the reasons for and against a particular course of action. Using 20,000 simulations, Dawes found that equal weightings performed better than statistically-based weightings (and even randomly generated weightings worked almost as well). The conclusion is that it is consistency that really matters, more so than the particular set of weightings used. As well as ensuring consistency, an equal-weights approach avoids all the statistical estimation problems. Equal weights are also more likely to provide a method of general application that avoids the problem of overfitting, i.e. weightings that are very specific to the sample and model formulation.
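
A minimal simulation in the spirit of Dawes’s comparison (the sample sizes, true weights and noise level below are invented): fit “proper” regression weights on a small training sample, then compare their out-of-sample predictive correlation with simple equal weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, k = 30, 5_000, 5
true_w = np.array([0.5, 0.4, 0.3, 0.2, 0.1])   # unequal true importance

def one_trial():
    X_tr = rng.standard_normal((n_train, k))
    y_tr = X_tr @ true_w + rng.standard_normal(n_train)
    X_te = rng.standard_normal((n_test, k))
    y_te = X_te @ true_w + rng.standard_normal(n_test)
    w_ols = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]   # "proper" weights
    r_proper = np.corrcoef(X_te @ w_ols, y_te)[0, 1]
    r_equal = np.corrcoef(X_te.sum(axis=1), y_te)[0, 1]  # improper: equal weights
    return r_proper, r_equal

results = np.array([one_trial() for _ in range(500)])
print("mean out-of-sample correlation  proper: %.3f  equal: %.3f"
      % tuple(results.mean(axis=0)))
```

In this toy setting the equal-weights predictor is typically at least as accurate out of sample as the regression-weighted one, which is exactly Dawes’s point: consistency matters more than the precise weights.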

 

Applying these insights from decision science to the construction of player rating systems provides the justification for what I call a simple approach to player ratings. There are five steps, illustrated in the code sketch after the list:

  1. Identify an appropriate set of performance metrics, drawing on the expert judgment of GMs, sporting directors, coaches and scouts.
  2. Standardise the performance metrics to ensure a common measurement scale; my suggested standardisation is to calculate Z-scores. Z-scores have been very widely used to standardise performance metrics with very different scales of measurement: in golf, for example, they convert driving distance (yards), accuracy (%) and number of putts into comparable measures that can be added together.
  3. Allocate weights of +1 to positive contributions and -1 to negative contributions (i.e. Franklin’s prudential algebra).
  4. Calculate the total Z-score for every player.
  5. Rescale the total Z-scores to make them easier to read and interpret; I usually advise avoiding negative ratings and reducing the dependency on decimal places to differentiate players.
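
The five steps reduce to a few lines of code. Below is a minimal sketch assuming a pandas DataFrame of per-player metric totals; the column lists and the rescaling constants (mean 100, spread 25) are illustrative choices, not the exact rescaling used for the Championship ratings reported below.

```python
import pandas as pd

# Step 1 (expert judgment): the chosen metrics, split by sign of contribution.
# These column names are hypothetical.
POSITIVE = ["goals", "shots", "passes_won", "dribbles_won", "crosses_won",
            "duels_won", "blocks", "interceptions", "clearances"]
NEGATIVE = ["passes_lost", "dribbles_lost", "crosses_lost", "duels_lost",
            "fouls", "yellow_cards", "red_cards"]

def simple_rating(df: pd.DataFrame, mean: float = 100.0, spread: float = 25.0) -> pd.Series:
    cols = POSITIVE + NEGATIVE
    # Step 2: standardise every metric as a Z-score.
    z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    # Steps 3-4: weight positives +1 and negatives -1, then total.
    total = z[POSITIVE].sum(axis=1) - z[NEGATIVE].sum(axis=1)
    # Step 5: rescale for presentation.
    return mean + spread * (total - total.mean()) / total.std(ddof=0)
```

Note that the rescaling in step 5 is purely presentational: it changes the units of the rating without changing the ranking of players.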

 

I have applied the simple approach to produce player ratings for 535 outfield players in the English Championship covering the first 22 rounds of games in season 2015/16. I have used player totals for 16 metrics: goals scored, shots at goal, successful passes, unsuccessful passes, successful dribbles, unsuccessful dribbles, successful open-play crosses, unsuccessful open-play crosses, duels won, duels lost, blocks, interceptions, clearances, fouls conceded, yellow cards and red cards. The total Z-score for every player has been rescaled to yield a mean rating of 100 (and a range of 5.1–234.2). The top 20 players are reported below.

 

| Player | Team | Player Rating |
| --- | --- | --- |
| Shackell, Jason | Derby County | 234.2 |
| Flint, Aden | Bristol City | 197.9 |
| Keogh, Richard | Derby County | 196.0 |
| Keane, Michael | Burnley | 195.7 |
| Morrison, Sean | Cardiff City | 193.8 |
| Duffy, Shane | Blackburn Rovers | 191.1 |
| Davies, Curtis | Hull City | 184.3 |
| Onuoha, Nedum | Queens Park Rangers | 183.2 |
| Morrison, Michael | Birmingham City | 179.1 |
| Duff, Michael | Burnley | 175.6 |
| Hanley, Grant | Blackburn Rovers | 175.2 |
| Tarkowski, James | Brentford | 171.1 |
| McShane, Paul | Reading | 169.8 |
| Collins, Danny | Rotherham United | 168.3 |
| Stephens, Dale | Brighton and Hove Albion | 167.4 |
| Lees, Tom | Sheffield Wednesday | 166.0 |
| Judge, Alan | Brentford | 164.4 |
| Blackman, Nick | Reading | 161.9 |
| Bamba, Sol | Leeds United | 160.1 |
| Dawson, Michael | Hull City | 159.7 |

 

I hasten to add that these player ratings are not intended to be definitive. As always they are a starting point for an evaluation of the relative merits of players and should always be considered alongside a detailed breakdown of the player rating into the component metrics to identify the specific strengths and weaknesses of individual players. They should also be categorised by playing position and playing time but those are discussions for future posts.

 

 

Some Key Readings in Decision Science

Meehl, P., Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence, Minneapolis: University of Minnesota Press, 1954.

Dawes, R. M., ‘The robust beauty of improper linear models in decision making’, American Psychologist, vol. 34 (1979), pp. 571–582.

Dawes, R. M., Rational Choice in an Uncertain World, San Diego: Harcourt Brace Jovanovich, 1988.

Kahneman, D., Thinking, Fast and Slow, London: Penguin Books, 2012.

 

More on the Problems of Win-Contribution Player Rating Systems and a Possible Mixed-Methods Solution

Executive Summary

  • There are three main problems with the win-contribution approach to player ratings: (i) statistical estimation problems; (ii) the sample-specific and model-specific nature of the weightings; and (iii) measuring contribution importance as statistical predictive power.
  • A possible solution to these problems is to adopt a mixed-methods approach combining statistical analysis and expert judgment.
  • The EA Sports Player Performance Index and my own STARS player rating system are both examples of the mixed-methods approach.
  • Decision makers require credible data analytics but credibility does not depend solely on producing results that look right. Some of the most important results look wrong by defying conventional wisdom.
  • A credible player ratings system for use by decision makers within teams requires that differences in player ratings are explicable simply but precisely as specific differences in player performance.

 

In my previous post I discussed some of the problems of adopting a win-contribution approach to player ratings in the invasion-territorial team sports. Broadly speaking, there are three main issues: (i) statistical estimation problems; (ii) the sample-specific and model-specific nature of the weightings used to combine the different skill-activities into a single player rating; and (iii) measuring contribution importance as statistical predictive power. The first issue arises because win-contribution models in the invasion-territorial team sports are essentially multivariate models to be estimated using, for example, linear regression methods. Estimation problems abound with these types of models, as exemplified by the regression results reported in my previous post for the Football League Championship 2015/16, which included “wrong” signs, statistically insignificant estimates, excessive weightings for actions most closely connected with goals scored/conceded (i.e. shots and saves), and low goodness of fit. These problems can often be resolved by restructuring the win-contribution model. In particular, a multilevel model can take account of the sequential nature of different contributions. It also often works better to combine attacking and defensive contributions in a single model by treating goals scored (and/or shots at goal) as the outcome of own-team attacking play and opposition defensive play.
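
A toy sketch of the two-level idea, with invented data: estimate shots from build-up actions first, then goals from shots, so that the weight of each build-up action on goals is implied by the chain rather than swamped by shots in a single equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
passes = rng.standard_normal(n)
dribbles = rng.standard_normal(n)
shots = 0.6 * passes + 0.3 * dribbles + rng.standard_normal(n)
goals = 0.8 * shots + rng.standard_normal(n)

# Level 1: shots as a function of build-up play.
A = np.column_stack([passes, dribbles])
w1 = np.linalg.lstsq(A, shots, rcond=None)[0]

# Level 2: goals as a function of shots.
w2 = np.linalg.lstsq(shots[:, None], goals, rcond=None)[0]

# Implied weight of each build-up action on goals = level-2 weight x level-1 weight.
print("implied weights on goals:", np.round(w2[0] * w1, 2))   # approx [0.48, 0.24]
```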

 

The second issue with the win-contribution approach is that the search for a better statistical model to avoid the various estimation problems can yield estimated contributions for the different skill-activities that are not generalizable beyond the specific sample used and the specific model estimated. The estimated weightings derived from regression models can be very unstable and sensitive to which other skill-activities are included. This instability problem occurs when there is a high degree of correlation between some skill-activities (i.e. multicollinearity). The generalizability of the estimated weightings will be improved by using larger samples that include multiple seasons and multiple leagues.
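
The instability problem is easy to reproduce with invented data: when two skill-activities are highly correlated, the estimated weights swing from sample to sample even though their sum stays roughly stable.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
passes = rng.standard_normal(n)
crosses = 0.95 * passes + 0.3 * rng.standard_normal(n)  # correlation ~0.95 with passes
goals = 0.5 * passes + 0.5 * crosses + rng.standard_normal(n)
X = np.column_stack([passes, crosses])

for seed in range(4):  # refit the same model on bootstrap resamples
    idx = np.random.default_rng(seed).integers(0, n, n)
    w = np.linalg.lstsq(X[idx], goals[idx], rcond=None)[0]
    print("weights: passes %+.2f  crosses %+.2f  (sum %.2f)" % (w[0], w[1], w.sum()))
```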

 

The final issue with win-contribution models estimated using statistical methods such as regression analysis is that the weightings reflect statistical predictive power. But is the value of a skill-activity as a statistical predictor of match outcomes the appropriate definition of the value of the win-contribution of that skill-activity? I do not think that we give enough explicit attention to this issue. Too often we only consider it indirectly when, for example, we try to resolve the problem of certain skill-activities having excessive weightings because of the sequential nature of game processes. Actions near the end of a sequence tend naturally to have much greater predictive power for the final outcome. Typically shots at goal is the best single predictor of goals scored while the goalkeeper’s save-shot ratio is the best single predictor of goals conceded. Using multilevel models is, when all is said and done, just an attempt to reduce the predictive power of these close-to-outcome skill-activities. The issue is of particular importance in low-scoring, more unpredictable team sports such as (association) football.

 

All of these issues with win-contribution models raise severe doubts about the usefulness of relying on a purely statistical approach such as linear regression both to identify the relevant skill-activities to be included in the player rating system and to determine the appropriate weighting system to combine the selected skill-activities. As a result some player rating systems have tended to adopt a more “mixed-methods” approach combining statistical analysis and expert judgment. One example of this approach is my own STARS player rating system, which I developed around 12 years ago, initially applied to the English Premiership and subsequently recalibrated for the MLS. The STARS player (and team) ratings were central to the work I did for Billy Beane and the Oakland A’s ownership group on investigating the scope for data analytics in football. The STARS player rating system is summarised in the graphic below.

Blog 12 Graphic.png

Regression analysis was used to estimate a multilevel model which provided the basic weightings for the skill-activities within the five identified groupings. Expert judgment was used to decide which skill-activities to include, the functional form for the metrics, and the weightings used to combine the five groupings. Essentially this weighting scheme was based on a 4-4-2 formation with attack and defence groupings each weighted as 4/11, striking as 2/11, and goalkeeping as 1/11. (Negative contributions were reassigned to the attack and defence groupings.) Expert judgment was also used to determine the weightings of some skill-activities for which regression analysis proved unable to provide reliable estimates.
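
The combination step can be illustrated with a short sketch. The group scores below are hypothetical, already-standardised inputs, since the full STARS metrics and functional forms are not public; only the 4-4-2 weighting scheme comes from the description above.

```python
# The 4-4-2 weighting scheme described above: attack and defence 4/11 each,
# striking 2/11, goalkeeping 1/11 (negative contributions folded into
# attack/defence, as in the text).
GROUP_WEIGHTS = {"attack": 4/11, "defence": 4/11, "striking": 2/11, "goalkeeping": 1/11}

def stars_style_rating(group_scores: dict) -> float:
    """Combine group-level scores into a single rating."""
    return sum(GROUP_WEIGHTS[g] * s for g, s in group_scores.items())

# Hypothetical outfield player: strong attacking groups, no goalkeeping.
outfield = {"attack": 1.2, "defence": 0.4, "striking": -0.1, "goalkeeping": 0.0}
print(round(stars_style_rating(outfield), 3))   # 0.564
```

The regression-estimated weightings within each grouping sit upstream of this step; the sketch shows only how expert-judged group weights produce the final rating.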

 

A very detailed account of the problems of constructing a win-contribution player rating system in football using regression analysis is provided by Ian McHale, who developed the EA Sports Player Performance Index (formerly the Actim Index) for the English Premiership and Championship (see I. G. McHale, P. A. Scarf and D. E. Folker, Interfaces, July-August 2012). McHale’s experience is also discussed in David Sumpter’s recently published book, Soccermatics: Mathematical Adventures in the Beautiful Game (Bloomsbury Sigma, 2016), a must-read for all of us with an interest in applying mathematics and statistics to football. At the core of the EA Sports player rating system is a match-contribution model in which regression analysis is used to estimate a model of shots as a function of crosses, dribbles and passes as well as opposition defensive actions (interceptions, clearances and tackle-win ratio) and opposition discipline (yellow cards and red cards). The estimated model of shots is combined with shot effectiveness and then rescaled in terms of league points. In their 2012 article McHale and his co-authors report the top 20 Premiership players for season 2008/09 based on the match-contribution model and show that the list is dominated by goalkeepers (7) and defenders (11), with Fulham’s goalkeeper, Mark Schwarzer, topping the list. Only the Aston Villa midfielder, Gareth Barry (ranked 2nd), and the Chelsea striker, Nicolas Anelka (ranked 10th), break the goalkeeper-defender domination of the top ratings.

 

McHale deals with the problems of a purely statistical approach by adopting what I am calling a mixed-methods approach combining statistical analysis and expert judgement. The final version of the EA Sports Player Performance Index consists of a weighted combination of six separate indices. The match-contribution model has a weighting of only 25%. There are two indices based on minutes played which have a combined weighting of 50%, with most of that weighting (37.5%) allocated to the point-sharing index, which takes into account the final league points of the player’s team, thereby increasing the rating of players playing for more successful teams. The other indices capture goal-scoring, assists and clean sheets and have a combined weighting of 25%. All the indices are measured in terms of league points. For comparison McHale reports the top 20 Premiership players for 2008/09 using the final index and finds that the list is now much more evenly distributed across playing positions, with Anelka now topping the list and Schwarzer ranked only 17th.
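
The published description pins down only the structure of the weights: match contribution 25%, the two minutes-played indices 50% combined (37.5% of which is the point-sharing index), and 25% across goals, assists and clean sheets. In the sketch below, the split of that final 25% across its three indices is my assumption, not McHale’s.

```python
# Weighted combination of the six component indices, all measured in
# league points. The 0.10/0.10/0.05 split of the final 25% is assumed.
WEIGHTS = {
    "match_contribution": 0.25,
    "point_sharing":      0.375,
    "minutes_played":     0.125,
    "goals":              0.10,
    "assists":            0.10,
    "clean_sheets":       0.05,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

def ea_style_rating(indices: dict) -> float:
    """Weighted sum of the component indices (missing components count as 0)."""
    return sum(WEIGHTS[name] * indices.get(name, 0.0) for name in WEIGHTS)

print(ea_style_rating({"match_contribution": 8.0, "point_sharing": 12.0,
                       "minutes_played": 10.0, "goals": 6.0}))
```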

 

McHale’s mixed-methods approach is a great example of the problems faced by win-contribution player rating systems and of how statistical analysis and expert judgment need to be combined to produce a credible player rating system. Credibility is absolutely fundamental to data analytics. Decision makers will ignore evidence that does not appear credible, and the use of sophisticated statistical techniques does not confer credibility on the analysis; often quite the opposite. McHale recognises that a purely statistical approach using predictive power to weight different skill-activities does not provide credible player ratings and, in consultation with his clients, introduces other performance metrics using expert judgment rather than statistical estimation.

 

I have one further concern over the credibility of player rating systems: the importance of transparency when the player ratings are to be used as an input for coaching, recruitment and remuneration decisions. This is not really an issue for McHale since the EA Sports Player Performance Index is primarily directed at the media and fans (although, interestingly, McHale shows that there was a very close match-up between a hypothetical England team based on the player ratings and England’s starting line-up in their first game of the 2010 World Cup Finals). McHale achieves credibility through a rigorous development process that produces ratings that “look right” to his clients and to the knowledgeable fan. But such a system of player ratings would have limited value for coaches because of the lack of immediate transparency. For example, it is not immediately clear how much of the difference between the ratings of two players is due to differences in the players’ own contributions and how much is due to differences between the league performances of their respective teams.

Credibility for decision makers is not just about results that “look right”. At times the data analyst will throw up surprising results which “look wrong” by defying conventional wisdom, but such surprises, if they can be substantiated, may provide a real source of competitive advantage. In such cases the rigour of the analysis is unlikely to be enough. The results will need to be transparent in the sense of being explicable to the decision maker in practical terms. A credible player rating system for use by GMs, sporting directors and coaches requires that differences in player ratings are explicable simply but precisely as specific differences in player performance. My next post will set out a simple approach to constructing player rating systems to support coaching, recruitment and remuneration decisions.