More on the Problems of Win-Contribution Player Rating Systems and a Possible Mixed-Methods Solution

Originally Written: November 2016

Executive Summary

  • There are three main problems with the win-contribution approach to player ratings: (i) statistical estimation problems; (ii) the sample-specific and model-specific nature of the weightings; and (iii) measuring contribution importance as statistical predictive power.
  • A possible solution to these problems is to adopt a mixed-methods approach combining statistical analysis and expert judgment.
  • The EA Sports Player Performance Index and my own STARS player rating system are both examples of the mixed-methods approach.
  • Decision makers require credible data analytics but credibility does not depend solely on producing results that look right. Some of the most important results look wrong by defying conventional wisdom.
  • A credible player ratings system for use by decision makers within teams requires that differences in player ratings are explicable simply but precisely as specific differences in player performance.

In my previous post I discussed some of the problems of adopting a win-contribution approach to player ratings in the invasion-territorial team sports. Broadly speaking, there are three main issues: (i) statistical estimation problems; (ii) the sample-specific and model-specific nature of the weightings used to combine the different skill-activities into a single player rating; and (iii) measuring contribution importance as statistical predictive power. The first issue arises because win-contribution models in the invasion-territorial team sports are essentially multivariate models to be estimated using, for example, linear regression methods. Estimation problems abound with these types of models, as exemplified by the regression results reported in my previous post for the Football League Championship 2015/16, which included “wrong” signs, statistically insignificant estimates, excessive weightings for actions most closely connected with goals scored/conceded (i.e. shots and saves), and low goodness of fit. These problems can often be resolved by restructuring the win-contribution model. In particular, a multilevel model can take account of the sequential nature of different contributions. It also often works better to combine attacking and defensive contributions in a single model by treating goals scored (and/or shots at goal) as the outcome of own-team attacking play and opposition defensive play.
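To make that structure concrete, here is a minimal sketch, assuming a hypothetical match-level dataset, of a single-equation regression in which goals scored are modelled jointly from own-team attacking actions and opposition defensive actions. The file name and column names are illustrative placeholders, not the variables from my previous post.

```python
# Minimal sketch (not the original model): goals scored by a team in a match
# modelled as a function of its own attacking actions and the opposition's
# defensive actions. All names below are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

matches = pd.read_csv("team_match_data.csv")  # hypothetical match-level dataset

X = sm.add_constant(matches[["shots", "key_passes",
                             "opp_tackles", "opp_clearances"]])
y = matches["goals_scored"]

model = sm.OLS(y, X).fit()
print(model.summary())  # inspect signs, significance and goodness of fit
```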

The second issue with the win-contribution approach is that the search for a better statistical model to avoid the various estimation problems may yield estimated contributions for the different skill-activities that are not generalizable beyond the specific sample used and the specific model estimated. The estimated weightings derived from regression models can be very unstable and sensitive to which other skill-activities are included. This instability problem occurs when there is a high degree of correlation between some skill-activities (i.e. multicollinearity). The generalizability of the estimated weightings will be improved by using larger samples that include multiple seasons and multiple leagues.
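One quick way to see this instability in practice is to check variance inflation factors (VIFs) for the candidate skill-activities. The sketch below assumes the same hypothetical match-level dataset as above, and the threshold quoted in the comment is a common rule of thumb rather than anything from the original analysis.

```python
# Diagnose multicollinearity among skill-activities with variance inflation
# factors. As a common rule of thumb, VIFs above roughly 5-10 suggest the
# estimated weighting for that activity will be sensitive to which other
# activities are included. Column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

matches = pd.read_csv("team_match_data.csv")  # hypothetical match-level dataset
X = sm.add_constant(matches[["passes", "crosses", "dribbles", "shots"]])

for i, col in enumerate(X.columns):
    if col != "const":
        print(col, variance_inflation_factor(X.values, i))
```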

The final issue with win-contribution models estimated using statistical methods such as regression analysis is that the weightings reflect statistical predictive power. But is the value of a skill-activity as a statistical predictor of match outcomes the appropriate definition of the value of the win-contribution of that skill-activity? I do not think that we give enough explicit attention to this issue. Too often we only consider it indirectly when, for example, we try to resolve the problem of certain skill-activities having excessive weightings because of the sequential nature of game processes. Actions near the end of a sequence naturally tend to have much greater predictive power for the final outcome. Typically, shots at goal are the best single predictor of goals scored, while the goalkeeper’s save-shot ratio is the best single predictor of goals conceded. Using multilevel models is, when all is said and done, just an attempt to reduce the predictive power of these close-to-outcome skill-activities. The issue is of particular importance in low-scoring, more unpredictable team sports such as (association) football.

All of these issues with win-contribution models raise severe doubts about the usefulness of relying on a purely statistical approach such as linear regression both to identify the relevant skill-activities to be included in the player rating system and to determine the appropriate weighting system for combining the selected skill-activities. As a result, some player rating systems have tended to adopt a more “mixed-methods” approach combining statistical analysis and expert judgment. One example of this approach is my own STARS player rating system, which I developed around 12 years ago, initially applied to the English Premiership and subsequently recalibrated for the MLS. The STARS player (and team) ratings were central to the work I did for Billy Beane and the Oakland A’s ownership group on investigating the scope for data analytics in football. The STARS player rating system is summarised in the graphic below.

[Figure: overview of the STARS player rating system (Blog 12 Graphic.png)]

Regression analysis was used to estimate a multilevel model which provided the basic weightings for the skill-activities within the five identified groupings. Expert judgment was used to decide which skill-activities to include, the functional form for the metrics, and the weightings used to combine the five groupings. Essentially, this weighting scheme was based on a 4-4-2 formation, with the attack and defence groupings each weighted as 4/11, striking as 2/11, and goalkeeping as 1/11. (Negative contributions were reassigned to the attack and defence groupings.) Expert judgment was also used to determine the weightings of some skill-activities for which regression analysis proved unable to provide reliable estimates.
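As a rough illustration of how the grouping weights described above combine into a single rating, here is a minimal sketch; it is not the actual STARS implementation, and the per-group scores are hypothetical values assumed to have already come out of the within-group regression weightings.

```python
# Sketch of combining grouped contributions with 4-4-2-based weights
# (attack 4/11, defence 4/11, striking 2/11, goalkeeping 1/11), with negative
# contributions assumed to have already been folded into attack and defence.
GROUP_WEIGHTS = {
    "attack": 4 / 11,
    "defence": 4 / 11,
    "striking": 2 / 11,
    "goalkeeping": 1 / 11,
}

def combined_rating(group_scores: dict) -> float:
    """Weighted sum of a player's grouped contribution scores."""
    return sum(w * group_scores.get(g, 0.0) for g, w in GROUP_WEIGHTS.items())

# Example: an outfield player with no goalkeeping contribution (hypothetical scores).
print(combined_rating({"attack": 62.0, "defence": 48.0, "striking": 10.0}))
```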

A very detailed account of the problems of constructing a win-contribution player rating system in football using regression analysis is provided by Ian McHale who developed the EA Sports Player Performance Index (formerly the Actim Index) for the English Premiership and Championship (see I. G. McHale, P. A. Scarf and D. E. Folker, Interfaces, July-August 2012). McHale’s experience is also discussed in David Sumpter’s recently published book, Soccermatics: Mathematical Adventures in the Beautiful Game (Bloomsbury Sigma, 2016), a must-read for all of us with an interest in applying mathematics and statistics to football. At the core of the EA Sports player rating system is a match-contribution model in which regression analysis is used to estimate a model of shots as a function of crosses, dribbles and passes as well as opposition defensive actions (interceptions, clearances and tackle-win ratio) and opposition discipline (yellow cards and red cards). The estimated model of shots is combined with shot effectiveness and then rescaled in terms of league points. In their 2012 article McHale and his co-authors report the top 20 Premiership players for season 2008/09 based on the match-contribution model and show that the list is dominated by goalkeepers (7) and defenders (11) with Fulham’s goalkeeper, Mark Schwarzer, topping the list. Only the Aston Villa midfielder, Gareth Barry (ranked 2nd), and the Chelsea striker, Nicolas Anelka (ranked 10th), break the goalkeeper-defender domination of the top ratings.
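The structure of that match-contribution model can be sketched roughly as follows. This is a simplification for illustration, not the published EA Sports specification; the variable names, data file and the omitted final rescaling into league points are all assumptions.

```python
# Rough sketch of a match-contribution structure: regress shots on own attacking
# actions plus opposition defending and discipline, then convert the fitted shot
# contribution into goals using league-wide shot effectiveness (goals per shot).
# The further rescaling into league points is omitted here.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("team_match_data.csv")  # hypothetical match-level dataset

X = sm.add_constant(df[["crosses", "dribbles", "passes",
                        "opp_interceptions", "opp_clearances",
                        "opp_tackle_win_ratio", "opp_yellows", "opp_reds"]])
shots_model = sm.OLS(df["shots"], X).fit()

shot_effectiveness = df["goals"].sum() / df["shots"].sum()
expected_goals = shots_model.predict(X) * shot_effectiveness
```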

McHale deals with the problems of a purely statistical approach by adopting what I am calling a mixed-methods approach combining statistical analysis and expert judgment. The final version of the EA Sports Player Performance Index consists of a weighted combination of six separate indices. The match-contribution model has a weighting of only 25%. There are two indices based on minutes played which have a combined weighting of 50%, with most of that weighting (37.5%) allocated to the point-sharing index, which takes into account the final league points of the player’s team, thereby increasing the rating of players playing for more successful teams. The other indices capture goal-scoring, assists and clean sheets and have a combined weighting of 25%. All the indices are measured in terms of league points. For comparison, McHale reports the top 20 Premiership players for 2008/09 using the final index and finds that the list is now much more evenly distributed across playing positions, with Anelka now topping the list and Schwarzer ranked only 17th.
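Schematically, the final index can be thought of as a weighted sum of the six sub-indices, each already expressed in league points. The sketch below uses the weightings quoted above; the split of the remaining 25% across the goal-scoring, assists and clean-sheet indices is not given here, so the equal thirds are purely an assumption for illustration.

```python
# Schematic weighted combination of the six sub-indices (all in league points):
# 25% match contribution; 50% across the two minutes-played indices, of which
# 37.5% goes to the point-sharing index; 25% across goals, assists and clean
# sheets (equal split assumed here for illustration only).
INDEX_WEIGHTS = {
    "match_contribution": 0.25,
    "point_sharing": 0.375,
    "minutes_played": 0.125,
    "goals": 0.25 / 3,
    "assists": 0.25 / 3,
    "clean_sheets": 0.25 / 3,
}

def final_index(sub_indices: dict) -> float:
    """Weighted sum of the six sub-indices for one player."""
    return sum(w * sub_indices.get(k, 0.0) for k, w in INDEX_WEIGHTS.items())
```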

McHale’s mixed-methods approach is a great illustration of the problems faced by win-contribution player rating systems and of how statistical analysis and expert judgment need to be combined to produce a credible player rating system. Credibility is absolutely fundamental to data analytics. Decision makers will ignore evidence that does not appear credible, and the use of sophisticated statistical techniques does not confer credibility on the analysis; often quite the opposite. McHale recognises that a purely statistical approach using predictive power to weight different skill-activities does not provide credible player ratings and, in consultation with his clients, introduces other performance metrics using expert judgment rather than statistical estimation.

I have one further concern over the credibility of player rating systems and that is the importance of transparency when the player ratings are to be used as an input for coaching, recruitment and remuneration decisions. This is not really an issue for McHale since the EA Sports Player Performance Index is primarily directed at the media and fans (although, interestingly, McHale shows that there was a very close match-up between a hypothetical England team based on the player ratings and England’s starting line-up in their first game in the 2010 World Cup Finals). McHale achieves credibility through a rigorous development process that produces ratings that “look right” to his clients and to the knowledgeable fan. But such a system of player ratings would have limited value for coaches because of the lack of immediate transparency. For example, it is not immediately clear how much of the difference between the ratings of two players is due to differences in the players’ own contributions and how much is due to differences between the league performances of their respective teams.

Credibility for decision makers is not just about results that “look right”. At times the data analyst will throw up surprising results which “look wrong” by defying conventional wisdom, but such surprises, if they can be substantiated, may provide a real source of competitive advantage. In such cases the rigour of the analysis is unlikely to be enough. The results will need to be transparent in the sense of being explicable to the decision maker in practical terms. A credible player ratings system for use by GMs, sporting directors and coaches requires that differences in player ratings are explicable simply but precisely as specific differences in player performance. My next post will set out a simple approach to constructing player rating systems to support coaching, recruitment and remuneration decisions.