Bridging the Gap: Improving the Coach-Analyst Relationship (Part 1)

Executive Summary

  1. The analyst must be able to translate analytical results into coaching recommendations.
  2. Data analytics can only be effective in organisations with a cultural commitment to evidence-based practice.
  3. Start simple when first introducing data analytics as a coaching tool.

 

Last week I attended the Sportdata & Performance Forum held at University College Dublin in Ireland. The Forum is in its third year, having previously been held in Berlin in 2014 and 2015. The organiser, Edward Abankwa, and his colleagues are to be congratulated on yet again putting together an interesting and varied programme with a good mix of speakers. Frequently European sports conferences are dominated by (association) football but this gathering was again pretty diverse, with Olympic sports, rugby union, rugby league and the Gaelic sports all well represented. And crucially the Forum is not a purely sports analytics event but draws speakers and delegates involved in all aspects of sports performance – coaches, coach educators, performance analysts, data analysts, sports scientists, academics, consultants and commercial data providers. I presented an overview of developments in spatial analytics which I will discuss in a later post. In this post (split into two parts) I want to draw together the various contributions around the theme of how to make data analytics more effective in elite sports.

 

  1. The analyst must be able to translate analytical results into coaching recommendations.

A recurring theme throughout the Forum was that the impact of data analytics in elite sports is often limited by a language problem. Brian Cunniffe (English Institute of Sport) talked about the need to bridge the language gap between the coach and the analyst/scientist. So often analysts and coaches do not speak the same language. Analysts see the world as a modelling problem formulated in the language of statistics and other data-analytical techniques. Coaches see the world as a performance problem formulated in the language of skill, technique and tactics. My very strong view is that it is solely the analyst’s responsibility to resolve the language problem. Analytics always starts and ends with the coaches. Coaches have to make a myriad of coaching decisions, and analysts are trying to provide an evidential base to support those decisions. The analyst must start by trying to understand the coaching decision problem and then translate it into a modelling problem to be analysed. The analyst must then translate the analytical results into practical, action-focussed recommendations framed in the language of coaching, not the language of analytics. Denise Martin, a performance analyst consultant with extensive experience across a number of sports in Ireland, summed it up very succinctly when she said that the task of the analyst is to “make the abstract tangible”. To do this the analyst must spend time with the coaches, learning how coaches see the world, in just the same way that performance analysts do in order to produce effective video analysis.

 

Martin Rumo (Swiss Federal Institute of Sports) provided a great example of the coaching-analytics process working effectively. He described his experience collaborating with a football coach who wanted to evaluate how well his players were putting pressure on the ball. In order to build an algorithm to measure the degree of pressure on the ball Martin started by having a conversation with the coach to identify the key characteristics of situations in which the coach considered there was pressure on the ball. This conversation provided the bridge from the coaching problem to the modelling problem and increased the likelihood that the analytical results would have practical relevance to the coach.
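To make the idea concrete, here is a deliberately simplified sketch of what a pressure-on-the-ball indicator built from such a conversation might look like. The distance and closing-speed thresholds, the data fields and the logic are all illustrative assumptions on my part, not Martin Rumo's actual algorithm; the point is simply that each parameter corresponds to something the coach can articulate.

```python
# Illustrative sketch only: a toy pressure-on-the-ball indicator derived from
# hypothetical tracking data. Thresholds and field names are assumptions.
import math

def pressure_on_ball(ball_carrier, defenders,
                     distance_threshold=3.0, closing_speed_threshold=1.0):
    """Return True if any defender is close to the ball carrier and closing in."""
    for d in defenders:
        dx = ball_carrier["x"] - d["x"]
        dy = ball_carrier["y"] - d["y"]
        distance = math.hypot(dx, dy)
        if distance == 0:
            return True
        # Closing speed: component of the defender's velocity towards the carrier.
        closing_speed = (d["vx"] * dx + d["vy"] * dy) / distance
        if distance <= distance_threshold and closing_speed >= closing_speed_threshold:
            return True
    return False

# Example frame of tracking data (positions in metres, velocities in m/s).
carrier = {"x": 50.0, "y": 30.0}
defenders = [{"x": 52.0, "y": 31.0, "vx": -2.5, "vy": -1.0},
             {"x": 60.0, "y": 40.0, "vx": 0.0, "vy": 0.0}]
print(pressure_on_ball(carrier, defenders))  # True: the first defender is close and closing
```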

 

One of the most interesting speakers at the Forum was Edward Metgod, the former Dutch goalkeeper and now a scout and analyst with the Dutch national team. Edward has a playing and coaching background, a deep commitment to self-improvement and an open mind about using the best available tools to do his job effectively. He is precisely the type of football person with whom a data analyst would want to work. Edward started his talk by recounting how he had read a number of books on data analytics, which he had found interesting, but when he came to books on football analytics he was quickly turned off. The problem with the football analytics books is the language (although I also sensed that he had found nothing new in these books to advance his knowledge of football in any practical way). Edward then explained that Dutch football has a common coaching language which breaks the game down into four moments – defensive transition, offensive transition, ball possession, and opponent ball possession. All of Edward’s reports are structured around these four moments. The clear implication for any data analyst, like myself, working in Dutch football is that you must learn this coaching language if you want to communicate effectively with coaches. I should add that I have subscribed to the four-moments perspective for several years and use it to structure my analysis in any invasion-territorial team sport.

 

  2. Data analytics can only be effective in organisations with a cultural commitment to evidence-based practice.

The importance of having the right organisational culture to support data analytics was stressed by many of the speakers. Rob Carroll (The Video Analyst) defined culture very neatly as what a team does every day. A common characteristic of every sports organisation with which I have worked, and in which data analytics has had a real impact, is a cultural commitment to creating an evidential base for decisions. That cultural commitment is led from the top by the performance director and head coach, with buy-in from all of the coaching staff. As I have discussed in a previous post, Saracens epitomise an elite team in which data analytics has become part of how they do things day to day, a culture built over a number of years and led by their directors of rugby, initially Brendan Venter and then his successor, Mark McCall. Many European sports organisations still have a long way to go in their analytical development and some remain staunchly “knowledge-allergic”. Analysts themselves have been part of the problem by not learning the language needed to communicate with coaches. But the organisations bear much of the responsibility for the lack of progress compared to many leading teams in the North American major leagues, which have used evidence-based practice to gain a competitive advantage; the 2016 World Series champions, the Chicago Cubs, are just the latest case study of how to do it effectively. Too often teams have appointed analysts without any real strategic purpose other than that it seemed the right thing to do and was what other teams were doing. Data analytics must be seen as a strategic choice by the sporting leadership of the team, a point made eloquently by Daniel Stenz, who has extensive experience in applying analytics in football in Germany, Hungary and Canada. It can also require buy-in from the team ownership, particularly since, as Denise Martin explained, evidence-based practice thrives in a culture that emphasises the process, not the outcome. But of course an emphasis on process requires that the team ownership adopts a long-term perspective on their sporting investment, which is always difficult in sports organised as merit hierarchies with promotion and relegation (and play-offs and European qualification). When the financial risk is so dependent on sporting results, team ownership inevitably becomes increasingly short-term in judging performance, so that quick-fix solutions such as signing new players or firing the head coach prevail. Analytics is unlikely ever to be a quick fix.

 

  3. Start simple when first introducing data analytics as a coaching tool.

Another common message at the Forum for teams starting out with data analytics is to start simple, a point made by Denise Martin and Ann Bruen (Metrifit) amongst others. Analysts are often guilty of putting more emphasis on the sophistication of their techniques than on the practical relevance of their results. Analytics must always be decision-driven. Providing some simple, useful input into a specific coaching decision will help build credibility, respect and coach buy-in, all vital ingredients in the successful evolution of an analytical capability in a team. Complexity can come later. As Ann reminded us, avoid the TMI/NEK problem of “too much information, not enough knowledge”. Elite teams are drowning in data these days and every day it gets worse. Just try to imagine how much data on the physical performance of athletes a single training session can produce with wearable technology. The function of an analyst is to solve the data-overload problem. Analysts are in the business of reducing (i.e. simplifying) a complex and chaotic mass of data into codified patterns of variation with practical importance. Start simple, and always finish simple.

A Simple Approach to Player Ratings

Executive Summary

  • The principal advantage of a statistical approach to player ratings is to ensure that information on performance is used in a consistent way.
  • However there are numerous difficulties in using statistical techniques such as regression analysis to estimate the weightings needed to combine performance metrics into a single player rating.
  • But research in decision science shows that there is little or no gain in using sophisticated statistical techniques to estimate weightings. Using equal weights works just as well in most cases.
  • I recommend a simple approach to player ratings in which performance metrics are standardised using Z-scores and then added together (or subtracted in the case of negative contributions) to yield a player rating that can then be rescaled for presentational purposes.

 

The basic analytical problem in contributions-based player ratings, particularly in the invasion-territorial team sports, is how to reduce a multivariate set of performance metrics to a single composite index. A purely statistical approach combines the performance metrics using weightings derived from a team-level win-contributions model of the relationship between the performance metrics and match outcomes, with these weightings usually estimated by regression analysis. But, as I have discussed in previous posts, numerous estimation problems arise with win-contributions models, so much so that I seriously question whether a purely statistical approach to player ratings is viable. Those who have tried to produce player ratings based on win-contributions models in the invasion-territorial team sports have usually ended up adopting a “mixed-methods” approach in which expert judgment plays a significant role in determining how the performance metrics are combined. The resulting player ratings may be more credible but can lack transparency, and so have little practical value for decision makers.

 

Decision science can provide some useful insights to help resolve these problems. In particular there is a large body of research on the relative merits of expert judgment and statistical analysis as the basis for decisions in complex (i.e. multivariate) contexts. The research goes back at least to Paul Meehl’s book, Clinical versus Statistical Prediction, published in 1954. Meehl subsequently described it as “my disturbing little book”: he reviewed 20 studies in a wide range of areas, not just clinical settings, and found that statistical analysis provided predictions at least as accurate as expert judgment in every case, and more accurate predictions in most. More than 30 years later Dawes reviewed the research instigated by Meehl’s findings and concluded that “the finding that linear combination is superior to global judgment is strong; it has been replicated in diverse contexts, and no exception has been discovered”. More recently, the Nobel laureate Daniel Kahneman, in his best-selling book, Thinking, Fast and Slow, surveyed around 200 studies and found that 60% showed statistically-based algorithms producing more accurate predictions, with the rest showing algorithms to be as good as experts. There is a remarkable consistency in these research findings, unparalleled elsewhere in the social sciences, yet the results have been largely ignored in practice, where confidence in the superiority of expert judgment remains largely undiminished.

 

What does this tell us about decision making? Decisions always involve prediction about uncertain future outcomes since we choose a course of action with no certainty over what will actually happen. We know the past but decide the future. We try to recruit players to improve future team performance using information on the player’s current and past performance levels. What decision science has found is that experts are very knowledgeable about the factors that will influence future outcomes, but experts, like the rest of us, are no better, and indeed are often worse, when it comes to making consistent comparisons between alternatives in a multivariate setting. Decision science shows that human beings tend to be very inconsistent, focusing attention on a small number of specific aspects of one alternative but then often focusing on different specific aspects of another alternative, and so on. Paradoxically, experts are particularly prone to inconsistency in the comparison of alternatives because of their depth of knowledge of each alternative. Statistically-based algorithms guarantee consistency: all alternatives are compared using the same metrics and the same weightings. The implication for player ratings is very clear. Use the expert judgment of coaches and scouts to identify the key performance metrics but rely on statistical analysis to construct an algorithm (i.e. a player rating system) to produce consistent comparisons between players.

 

So far so good, but this still does not resolve the statistical estimation problems involved in using regression analysis to determine the weightings. However, decision science offers an important insight in this respect as well. Back in the 1970s Dawes undertook a comparison of the predictive accuracy of proper and improper linear models. By a proper linear model he meant a model in which the weights were estimated using statistical methods such as multiple regression. In contrast, improper linear models use weightings determined non-statistically, such as equal-weights models in which every factor is simply assumed to have the same importance. Dawes traces the equal-weights approach back to Benjamin Franklin, who adopted a very simple method for deciding between different courses of action. Franklin’s “prudential algebra” was simply to count up the number of reasons for a particular course of action, subtract the number of reasons against, and then choose the course of action with the highest net score. It is very simple, but it is consistent and transparent, with a crucial role for expert judgment in identifying the reasons for and against a particular course of action. Using 20,000 simulations, Dawes found that equal weightings performed better than statistically-based weightings (and even randomly generated weightings worked almost as well). The conclusion is that it is consistency that really matters, more so than the particular set of weightings used. As well as ensuring consistency, an equal-weights approach avoids all the statistical estimation problems. Equal weights are also more likely to provide a method of general application that avoids the problem of overfitting, i.e. weightings that are very specific to the sample and model formulation.
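For readers who prefer to see the point in code, here is a minimal simulation in the spirit of Dawes’ comparison of proper and improper linear models. The data-generating process is entirely invented for illustration: with a small training sample and a noisy outcome, weights estimated by least squares typically do little or no better out of sample than simple unit weights.

```python
# Minimal sketch: "proper" regression weights vs "improper" equal weights,
# on artificial data invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, k = 40, 1000, 6

true_w = rng.uniform(0.5, 1.5, k)                # all predictors genuinely matter
def make_data(n):
    X = rng.standard_normal((n, k))              # already-standardised predictors
    y = X @ true_w + rng.standard_normal(n) * 3.0  # noisy outcome
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

# "Proper" linear model: weights estimated by least squares on a small sample.
beta = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
# "Improper" linear model: equal (unit) weights.
unit = np.ones(k)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("OLS weights, out-of-sample r:   %.3f" % corr(X_te @ beta, y_te))
print("Equal weights, out-of-sample r: %.3f" % corr(X_te @ unit, y_te))
```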

 

Applying these insights from decision science to the construction of player rating systems provides the justification for what I call a simple approach to player ratings. There are five steps (a short code sketch of the full calculation follows the list):

  1. Identify an appropriate set of performance metrics, drawing on the expert judgment of GMs, sporting directors, coaches and scouts.
  2. Standardise the performance metrics to ensure a common measurement scale – my suggested standardisation is to calculate Z-scores. Z-scores have been very widely used to standardise performance metrics with very different scales of measurement e.g. in golf they have been used to convert metrics as different as driving distance (yards), accuracy (%) and number of putts into comparable measures that can be added together.
  3. Allocate weights of +1 to positive contributions and -1 to negative contributions (i.e. Franklin’s prudential algebra).
  4. Calculate the total Z-score for every player.
  5. Rescale the total Z-scores to make them easier to read and interpret. I usually advise avoiding negative ratings and reducing the dependency on decimal places to differentiate players.
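As a concrete illustration, here is a minimal sketch of the five steps applied to a toy dataset. The metric names, the sign assignments and the rescaling constants are assumptions chosen for illustration; they are not the exact specification behind the Championship ratings reported below.

```python
# Minimal sketch of the simple rating approach on invented data.
import pandas as pd

df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "goals": [6, 1, 0, 3],
    "successful_passes": [420, 610, 380, 500],
    "unsuccessful_passes": [80, 140, 60, 90],
    "duels_won": [110, 70, 95, 120],
    "duels_lost": [60, 90, 40, 55],
})

positive = ["goals", "successful_passes", "duels_won"]    # weighted +1
negative = ["unsuccessful_passes", "duels_lost"]          # weighted -1
cols = positive + negative

# Step 2: standardise each metric as a Z-score across the player pool.
z = (df[cols] - df[cols].mean()) / df[cols].std()

# Steps 3-4: equal weights of +1 / -1, summed into a total Z-score per player.
total_z = z[positive].sum(axis=1) - z[negative].sum(axis=1)

# Step 5: rescale for presentation, e.g. around a mean of 100 (scale factor assumed).
df["rating"] = 100 + 30 * (total_z - total_z.mean()) / total_z.std()
print(df[["player", "rating"]].sort_values("rating", ascending=False))
```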

 

I have applied the simple approach to produce player ratings for 535 outfield players in the English Championship covering the first 22 rounds of games in season 2015/16. I have used player totals for 16 metrics: goals scored, shots at goal, successful passes, unsuccessful passes, successful dribbles, unsuccessful dribbles, successful open-play crosses, unsuccessful open-play crosses, duels won, duels lost, blocks, interceptions, clearances, fouls conceded, yellow cards and red cards. The total Z-score for every player has been rescaled to yield a mean rating of 100 (and a range 5.1 – 234.2). Below I have reported the top 20 players.

 

| Player | Team | Player Rating |
| --- | --- | --- |
| Shackell, Jason | Derby County | 234.2 |
| Flint, Aden | Bristol City | 197.9 |
| Keogh, Richard | Derby County | 196.0 |
| Keane, Michael | Burnley | 195.7 |
| Morrison, Sean | Cardiff City | 193.8 |
| Duffy, Shane | Blackburn Rovers | 191.1 |
| Davies, Curtis | Hull City | 184.3 |
| Onuoha, Nedum | Queens Park Rangers | 183.2 |
| Morrison, Michael | Birmingham City | 179.1 |
| Duff, Michael | Burnley | 175.6 |
| Hanley, Grant | Blackburn Rovers | 175.2 |
| Tarkowski, James | Brentford | 171.1 |
| McShane, Paul | Reading | 169.8 |
| Collins, Danny | Rotherham United | 168.3 |
| Stephens, Dale | Brighton and Hove Albion | 167.4 |
| Lees, Tom | Sheffield Wednesday | 166.0 |
| Judge, Alan | Brentford | 164.4 |
| Blackman, Nick | Reading | 161.9 |
| Bamba, Sol | Leeds United | 160.1 |
| Dawson, Michael | Hull City | 159.7 |

 

I hasten to add that these player ratings are not intended to be definitive. As always they are a starting point for an evaluation of the relative merits of players and should always be considered alongside a detailed breakdown of the player rating into the component metrics to identify the specific strengths and weaknesses of individual players. They should also be categorised by playing position and playing time but those are discussions for future posts.

 

 

Some Key Readings in Decision Science

Meehl, P. E., Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence, Minneapolis: University of Minnesota Press, 1954.

Dawes, R. M., ‘The robust beauty of improper linear models in decision making’, American Psychologist, vol. 34 (1979), pp. 571–582.

Dawes, R. M., Rational Choice in an Uncertain World, San Diego: Harcourt Brace Jovanovich, 1988.

Kahneman, D., Thinking, Fast and Slow, London: Penguin Books, 2012.

 

More on the Problems of Win-Contribution Player Rating Systems and a Possible Mixed-Methods Solution

Executive Summary

  • There are three main problems with the win-contribution approach to player ratings: (i) statistical estimation problems; (ii) the sample-specific and model-specific nature of the weightings; and (iii) measuring contribution importance as statistical predictive power.
  • A possible solution to these problems is to adopt a mixed-methods approach combining statistical analysis and expert judgment.
  • The EA Sports Player Performance Index and my own STARS player rating system are both examples of the mixed-methods approach.
  • Decision makers require credible data analytics but credibility does not depend solely on producing results that look right. Some of the most important results look wrong by defying conventional wisdom.
  • A credible player ratings system for use by decision makers within teams requires that differences in player ratings are explicable simply but precisely as specific differences in player performance.

 

In my previous post I discussed some of the problems of adopting a win-contribution approach to player ratings in the invasion-territorial team sports. Broadly speaking, there are three main issues: (i) statistical estimation problems; (ii) the sample-specific and model-specific nature of the weightings used to combine the different skill-activities into a single player rating; and (iii) measuring contribution importance as statistical predictive power. The first issue arises because win-contribution models in the invasion-territorial team sports are essentially multivariate models to be estimated using, for example, linear regression methods. Estimation problems abound with these types of models, as exemplified by the regression results reported in my previous post for the Football League Championship 2015/16, which included “wrong” signs, statistically insignificant estimates, excessive weightings for actions most closely connected with goals scored/conceded (i.e. shots and saves), and low goodness of fit. These problems can often be resolved by restructuring the win-contribution model. In particular, a multilevel model can take account of the sequential nature of different contributions. It also often works better to combine attacking and defensive contributions in a single model by treating goals scored (and/or shots at goal) as the outcome of own-team attacking play and opposition defensive play.
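To illustrate the kind of single-equation specification described in the last sentence, here is a hedged sketch in which goals scored are modelled jointly as a function of own-team attacking actions and opposition defensive actions. The data are randomly generated placeholders and the variable names are my assumptions; only the shape of the specification is the point.

```python
# Sketch of a combined attack/defence specification on invented team-match data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500  # hypothetical team-match observations
matches = pd.DataFrame({
    "shots": rng.poisson(12, n),
    "successful_crosses": rng.poisson(6, n),
    "opp_interceptions": rng.poisson(15, n),
    "opp_clearances": rng.poisson(25, n),
})
# Invented outcome, loosely dependent on the covariates.
matches["goals_scored"] = rng.poisson(
    np.clip(0.08 * matches["shots"] - 0.01 * matches["opp_clearances"] + 0.6, 0.05, None)
)

# Goals scored as a function of own attacking actions and opposition defensive actions.
model = smf.ols(
    "goals_scored ~ shots + successful_crosses + opp_interceptions + opp_clearances",
    data=matches,
).fit()
print(model.params)
```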

 

The second issue with the win-contribution approach is that the search for a better statistical model to avoid the various estimation problems may yield estimated contributions for the different skill-activities that are not generalisable beyond the specific sample used and the specific model estimated. The estimated weightings derived from regression models can be very unstable and sensitive to which other skill-activities are included. This instability problem occurs when there is a high degree of correlation between some skill-activities (i.e. multicollinearity). The generalisability of the estimated weightings will be improved by using larger samples that include multiple seasons and multiple leagues.
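One simple way to gauge whether multicollinearity is likely to make the estimated weightings unstable is to compute variance inflation factors (VIFs) for the candidate skill-activities. The sketch below uses invented data in which one metric is deliberately constructed to be highly correlated with another; the variable names are placeholders.

```python
# Sketch: variance inflation factors as a multicollinearity check on invented data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 300
passes = rng.normal(450, 60, n)
touches = passes * 1.4 + rng.normal(0, 15, n)   # deliberately collinear with passes
tackles = rng.normal(18, 4, n)

X = sm.add_constant(pd.DataFrame({"passes": passes, "touches": touches, "tackles": tackles}))
for i, col in enumerate(X.columns):
    if col != "const":
        # High VIFs (well above 10) flag weightings likely to be unstable.
        print(col, round(variance_inflation_factor(X.values, i), 1))
```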

 

The final issue with win-contribution models estimated using statistical methods such as regression analysis is that the weightings reflect statistical predictive power. But is the value of a skill-activity as a statistical predictor of match outcomes the appropriate definition of the value of the win-contribution of that skill-activity? I do not think that we give enough explicit attention to this issue. Too often we only consider it indirectly when, for example, we try to resolve the problem of certain skill-activities having excessive weightings because of the sequential nature of game processes. Actions near the end of a sequence tend naturally to have much greater predictive power for the final outcome. Typically shots at goal is the best single predictor of goals scored while the goalkeeper’s save-shot ratio is the best single predictor of goals conceded. Using multilevel models is, when all is said and done, just an attempt to reduce the predictive power of these close-to-outcome skill-activities. The issue is of particular importance in low-scoring, more unpredictable team sports such as (association) football.

 

All of these issues with win-contribution models raise severe doubts about the usefulness of relying on a purely statistical approach such as linear regression both to identify the relevant skill-activities to be included in the player rating system, and to determine the appropriate weighting system for combining the selected skill-activities. As a result some player rating systems have tended to adopt a more “mixed-methods” approach combining statistical analysis and expert judgment. One example of this approach is my own STARS player rating system, which I developed around 12 years ago, initially applied to the English Premiership and subsequently recalibrated for the MLS. The STARS player (and team) ratings were central to the work I did for Billy Beane and the Oakland A’s ownership group investigating the scope for data analytics in football. The STARS player rating system is summarised in the graphic below.

[Graphic: overview of the STARS player rating system]

Regression analysis was used to estimate a multilevel model which provided the basic weightings for the skill-activities within the five identified groupings. Expert judgment was used to decide which skill-activities to include, the functional form for the metrics, and the weightings used to combine the five groupings. Essentially this weighting scheme was based on a 4-4-2 formation with attack and defence groupings each weighted as 4/11, striking as 2/11, and goalkeeping as 1/11. (Negative contributions were reassigned to the attack and defence groupings.) Expert judgment was also used to determine the weightings of some skill-activities for which regression analysis proved unable to provide reliable estimates.
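A minimal sketch of how such group weightings might be applied is given below. The group-level scores are invented and this is not the actual STARS implementation; it simply shows the 4/11, 4/11, 2/11, 1/11 combination in code.

```python
# Illustrative only: combining group-level scores with 4-4-2-based weights.
group_scores = {"attack": 0.8, "defence": 0.3, "striking": 1.1, "goalkeeping": 0.0}
group_weights = {"attack": 4 / 11, "defence": 4 / 11, "striking": 2 / 11, "goalkeeping": 1 / 11}

stars_rating = sum(group_weights[g] * group_scores[g] for g in group_scores)
print(round(stars_rating, 3))
```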

 

A very detailed account of the problems of constructing a win-contribution player rating system in football using regression analysis is provided by Ian McHale who developed the EA Sports Player Performance Index (formerly the Actim Index) for the English Premiership and Championship (see I. G. McHale, P. A. Scarf and D. E. Folker, Interfaces, July-August 2012). McHale’s experience is also discussed in David Sumpter’s recently published book, Soccermatics: Mathematical Adventures in the Beautiful Game (Bloomsbury Sigma, 2016), a must-read for all of us with an interest in applying mathematics and statistics to football. At the core of the EA Sports player rating system is a match-contribution model in which regression analysis is used to estimate a model of shots as a function of crosses, dribbles and passes as well as opposition defensive actions (interceptions, clearances and tackle-win ratio) and opposition discipline (yellow cards and red cards). The estimated model of shots is combined with shot effectiveness and then rescaled in terms of league points. In their 2012 article McHale and his co-authors report the top 20 Premiership players for season 2008/09 based on the match-contribution model and show that the list is dominated by goalkeepers (7) and defenders (11) with Fulham’s goalkeeper, Mark Schwarzer, topping the list. Only the Aston Villa midfielder, Gareth Barry (ranked 2nd), and the Chelsea striker, Nicolas Anelka (ranked 10th), break the goalkeeper-defender domination of the top ratings.

 

McHale deals with the problems of a purely statistical approach by adopting what I am calling a mixed-methods approach combining statistical analysis and expert judgment. The final version of the EA Sports Player Performance Index consists of a weighted combination of six separate indices. The match-contribution model has a weighting of only 25%. There are two indices based on minutes played which have a combined weighting of 50%, with most of that weighting (37.5%) allocated to the point-sharing index, which takes into account the final league points of the player’s team, thereby boosting the ratings of players at more successful teams. The other indices capture goal-scoring, assists and clean sheets and have a combined weighting of 25%. All the indices are measured in terms of league points. For comparison McHale reports the top 20 Premiership players for 2008/09 using the final index and finds that the list is now much more evenly distributed across playing positions, with Anelka now topping the list and Schwarzer ranked only 17th.
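The weighted combination of the six component indices can be sketched as follows. The component values are invented, and the equal split of the final 25% across the goal-scoring, assists and clean-sheet indices is my assumption for illustration; all components are taken to be expressed in league points, as described above.

```python
# Sketch of the weighted combination of six component indices (values invented).
components = {
    "match_contribution": 2.4,
    "minutes_played": 1.8,
    "point_sharing": 3.1,
    "goal_scoring": 0.9,
    "assists": 0.6,
    "clean_sheets": 0.4,
}
weights = {
    "match_contribution": 0.25,
    "minutes_played": 0.125,     # second minutes-based index (50% total with point-sharing)
    "point_sharing": 0.375,
    "goal_scoring": 0.25 / 3,    # equal split of the remaining 25% is an assumption
    "assists": 0.25 / 3,
    "clean_sheets": 0.25 / 3,
}
index_value = sum(weights[k] * components[k] for k in components)
print(round(index_value, 3))
```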

 

McHale’s mixed-methods approach is a great example of the problems faced by win-contribution player rating systems and of how statistical analysis and expert judgment need to be combined to produce a credible player rating system. Credibility is absolutely fundamental to data analytics. Decision makers will ignore evidence that does not appear credible, and the use of sophisticated statistical techniques does not confer credibility on the analysis; often quite the opposite. McHale recognises that a purely statistical approach using predictive power to weight different skill-activities does not provide credible player ratings and, in consultation with his clients, introduces other performance metrics using expert judgment rather than statistical estimation.

 

I have one further concern over the credibility of player rating systems, and that is the importance of transparency when the player ratings are to be used as an input for coaching, recruitment and remuneration decisions. This is not really an issue for McHale since the EA Sports Player Performance Index is primarily directed at the media and fans (although, interestingly, McHale shows that there was a very close match-up between a hypothetical England team based on the player ratings and England’s starting line-up in their first game in the 2010 World Cup Finals). McHale achieves credibility through a rigorous development process that produces ratings that “look right” to his clients and to the knowledgeable fan. But such a system of player ratings would have limited value for coaches because of the lack of immediate transparency. For example, it is not immediately clear how much of the difference between the ratings of two players is due to differences in the players’ own contributions and how much is due to differences between the league performances of their respective teams. Credibility for decision makers is not just about results that “look right”. At times the data analyst will throw up surprising results which “look wrong” by defying conventional wisdom, but such surprises, if they can be substantiated, may provide a real source of competitive advantage. In such cases the rigour of the analysis is unlikely to be enough. The results will need to be transparent in the sense of being explicable to the decision maker in practical terms. A credible player ratings system for use by GMs, sporting directors and coaches requires that differences in player ratings are explicable simply but precisely as specific differences in player performance. My next post will set out a simple approach to constructing player rating systems to support coaching, recruitment and remuneration decisions.