The Problems of Estimating Win-Contributions in Football (Soccer)

Executive Summary

  • The practical problems of obtaining regression-based estimates of win-contributions are illustrated using data for the English Championship regular season in 2015/16.
  • The estimated regression models for both attacking and defensive play are subject to various problems – low explanatory power, “wrong” signs, statistical insignificance, and sequence effects.
  • But the ultimate problem with regression-based approaches to player rating systems is that they reflect the statistical predictive power of individual skill-activities, which may not coincide with game importance from an expert coaching perspective.
  • My conclusion is that a regression-based approach to player rating systems in the invasion-territorial team sports is not recommended.

 

Developing a player rating system based on win-contributions in the invasion-territorial team sports seems, at least in principle, a straightforward procedure involving two stages. The first stage is to estimate the team-level relationship between skill-activities and match outcomes in order to get the weightings to be applied to each type of contribution. The most obvious statistical procedure to use is multiple regression analysis. The second stage is to calculate the overall win-contributions of individual players as a linear combination of their skill-activity contributions using the weightings estimated in the first stage. But although statistically this seems a straightforward multivariate problem, the approach is fraught with practical difficulties. Indeed, I will argue that it is often so difficult to obtain an appropriate set of weightings that a regression-based approach to estimating player win-contributions is just not viable.
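The two-stage procedure can be sketched in Python as below. This is a minimal illustration on simulated data, not the author's code: the function names `estimate_weightings` and `player_ratings`, the three metrics and all the numbers are hypothetical, and leaving the intercept out of the player ratings is a choice made for this sketch.

```python
import numpy as np


def estimate_weightings(X_team, y_team):
    """Stage 1: OLS of team performance on team skill-activity totals.

    Returns (intercept, weights), one weight per skill-activity column.
    """
    X = np.column_stack([np.ones(len(X_team)), X_team])
    coef, *_ = np.linalg.lstsq(X, y_team, rcond=None)
    return coef[0], coef[1:]


def player_ratings(X_players, weights):
    """Stage 2: weighted sum of each player's own skill-activity totals."""
    return X_players @ weights


rng = np.random.default_rng(0)
# Hypothetical team data: 200 team performances x 3 skill-activity metrics
# (think shots, attempted passes, crosses).
X_team = rng.poisson(lam=[12, 400, 15], size=(200, 3)).astype(float)
true_weights = np.array([0.08, 0.001, -0.01])
y_team = 0.2 + X_team @ true_weights + rng.normal(0.0, 0.5, size=200)

intercept, weights = estimate_weightings(X_team, y_team)

# Hypothetical season totals for two players on the same three metrics.
X_players = np.array([[3.0, 40.0, 1.0],
                      [0.0, 65.0, 4.0]])
print(player_ratings(X_players, weights))
```

Everything difficult in the rest of the post happens inside stage 1: the second stage is mechanical once the weightings exist.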

 

To demonstrate the difficulty of a regression-based player rating system in the invasion-territorial team sports, I am going to use football (soccer) and specifically data from the English Championship last season (2015/16). In the table below I have reported the results for four regression models estimated using Opta data for the 552 regular-season matches (i.e. 1,104 team performances). These four estimated regression models illustrate many of the problems that bedevil regression models of team performance in football.

 

The first issue is to decide on the appropriate measure of team performance. Using league points for individual matches would imply an outcome variable with only three possible values (win = 3, draw = 1, loss = 0), which is highly restrictive and not really amenable to linear regression. It would be more appropriate to use a limited dependent variable (LDV) estimation technique such as logistic regression. To avoid this problem, I typically use goals scored and goals conceded as measures of attacking and defensive performance, respectively, estimating two separate regression models which can be combined subsequently. Given the low-scoring nature of football and the (approximately) Poisson distribution of goals, linear regression remains a rather crude statistical tool, but it has the advantages of simplicity and ease of interpretation.
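For comparison, here is a minimal sketch of the count-data alternative that the Poisson remark points to: a Poisson regression (log link) fitted by Newton-Raphson in NumPy. The post itself uses linear regression; the helper `fit_poisson` and the simulated shots and goals are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Poisson regression with a log link, fitted by Newton-Raphson.

    X must already contain a column of ones (the intercept) as its first
    column. Returns beta such that expected goals = exp(X @ beta).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only model
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])      # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(1)
# Simulated data: shots per team performance, goals drawn from a Poisson
# whose log-mean rises linearly with shots (illustrative parameters).
shots = rng.poisson(12, size=20_000).astype(float)
goals = rng.poisson(np.exp(-1.0 + 0.05 * shots)).astype(float)

X = np.column_stack([np.ones_like(shots), shots])
beta = fit_poisson(X, goals)
print(beta, np.exp(beta[1]))
```

Unlike a linear coefficient, `np.exp(beta[1])` is read as the multiplicative change in expected goals per extra shot, which is less directly usable as an additive player-rating weight. That is one reason the simpler linear model is attractive despite being cruder.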

 

| Model | Attack (1) | Attack (2) | Attack (3) | Defence |
| --- | --- | --- | --- | --- |
| Outcome | Goals Scored | Goals Scored | Total Shots | Goals Conceded |
| Total Shots | 0.0805894 (0.006886)** | 0.0698052 (0.006165)** | | |
| Shot Accuracy | 3.08113 (0.1916)** | 3.43328 (0.1940)** | | |
| Attempted Passes | -0.000960860 (0.0005349) | | 0.000752025 (0.002346) | 0.000554711 (0.0006363) |
| Pass Completion | 0.842777 (0.6249) | | 10.1551 (2.729)** | -0.146641 (0.6778) |
| Dribbles | 0.00485571 (0.005885) | | 0.0466076 (0.02582) | |
| Dribble Success Rate | 0.141340 (0.1925) | | 0.334019 (0.8458) | |
| Open Play Crosses | -0.0277967 (0.005021)** | | 0.213755 (0.02090)** | |
| Open Play Cross Success Rate | 0.251270 (0.2396) | | 6.75974 (1.032)** | |
| Attacking Duels | -0.0109001 (0.002935)** | | 0.0237501 (0.01283) | |
| Attacking Duel Success Rate | 0.365114 (0.3961) | | 6.69654 (1.727)** | |
| Yellow Cards | -0.0745656 (0.02191)** | | -0.226067 (0.09603)* | 0.00894283 (0.02583) |
| Red Cards | -0.403386 (0.1045)** | | -0.755729 (0.4587) | 0.376279 (0.1230)** |
| Total Clearances | | | | -0.0126510 (0.003719)** |
| Blocks | | | | -0.0103887 (0.01644) |
| Interceptions | | | | -0.0136031 (0.006029)* |
| Defensive Duels | | | | -0.00894992 (0.003265)** |
| Defensive Duel Success Rate | | | | 1.69258 (0.4240)** |
| Goodness of fit (R²) | 32.74% | 26.94% | 26.20% | 6.26% |

Standard errors in parentheses; * = significant at 5% level; ** = significant at 1% level
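To see how the significance markers relate to the numbers, the short sketch below divides a few coefficients from the table by their standard errors and applies two-sided critical values from the normal distribution (1.96 and 2.576). Using normal rather than t critical values is an assumption, though with around 1,100 team performances the difference is negligible; the helper name `stars` is hypothetical.

```python
# (coefficient, standard error) pairs copied from the table above.
estimates = {
    "Total Shots, Attack (1)": (0.0805894, 0.006886),
    "Dribbles, Attack (1)": (0.00485571, 0.005885),
    "Yellow Cards, Attack (3)": (-0.226067, 0.09603),
    "Attacking Duels, Attack (3)": (0.0237501, 0.01283),
    "Interceptions, Defence": (-0.0136031, 0.006029),
}


def stars(coef, se, crit_5=1.96, crit_1=2.576):
    """Return the t-ratio and a significance marker for a two-sided test."""
    t = coef / se
    if abs(t) >= crit_1:
        return t, "**"
    if abs(t) >= crit_5:
        return t, "*"
    return t, ""


for name, (coef, se) in estimates.items():
    t, marker = stars(coef, se)
    print(f"{name:30s} t = {t:6.2f} {marker}")
```

For these rows the markers produced agree with the table: total shots is far beyond the 1% threshold, yellow cards in Attack (3) and interceptions in the Defence model clear only the 5% threshold, and dribbles and attacking duels in Attack (3) clear neither.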

 

The Attack (1) model uses goals scored as the outcome variable with five skill-activities – shots, passes, dribbles, crosses and attacking duels – plus two disciplinary metrics (yellow cards and red cards). The five skill-activities are each measured by two metrics – an activity-level metric (i.e. number of attempts) and an effectiveness-ratio metric (i.e. proportion of successful outcomes). So, for example, in the case of shots the activity-level metric is total shots and the effectiveness-ratio metric is shot accuracy (i.e. the proportion of shots on target).

 

The Attack (1) model exemplifies a number of the problems in using regression analysis to derive a set of weightings for player rating systems:

  • Low goodness of fit – the R² statistic is only 32.7%, indicating that less than a third of the variation in goals scored can be explained by the five skill-activities and discipline
  • “Wrong” signs – the estimated coefficients for attempted passes, open play crosses and attacking duels are all negative
  • Statistical insignificance – half of the estimated coefficients are not statistically different from zero
  • Sequence effects – most of the goodness of fit in the Attack (1) model is due to the two end-of-sequence metrics, total shots and shot accuracy. As the Attack (2) model shows, total shots and shot accuracy jointly account for 26.9% of the variation in goals scored.
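The sequence effect can be reproduced on simulated data. In the hypothetical set-up below, passes drive shots and only shots drive goals; the end-of-sequence metric then carries almost all of the explanatory power, and adding passes barely moves R². All variables and parameter values are illustrative assumptions, not estimates from the Opta data, and goals are treated as continuous purely for simplicity.

```python
import numpy as np


def r_squared(X, y):
    """R² from an OLS fit of y on the columns of X plus an intercept."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(2)
n = 1104                                              # team performances
passes = rng.normal(400, 80, n)                       # build-up play
shots = 4 + 0.02 * passes + rng.normal(0, 3, n)       # passes create shots
goals = 0.1 * shots + rng.normal(0, 0.3, n)           # only shots produce goals

r2_full = r_squared(np.column_stack([passes, shots]), goals)
r2_shots_only = r_squared(shots[:, None], goals)
r2_passes_only = r_squared(passes[:, None], goals)
print(f"full {r2_full:.3f}  shots only {r2_shots_only:.3f}  "
      f"passes only {r2_passes_only:.3f}")
```

Passes matter in this simulated game, since without them there are fewer shots, yet once shots are in the model they add almost nothing to the fit.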

Similar problems of wrong signs and statistical insignificance occur in the Defence model, which captures only 6.3% of the variation in goals conceded across matches, in part because no goalkeeping metrics have been included. But of course, if goalkeeping metrics such as the saves-to-shots ratio are included, these dominate in much the same way as shooting metrics dominate estimated regression models of goals scored.

 

One solution to the problem that regression models tend to attribute the highest weight to the end-of-sequence variables is to break the causal sequence into components to be estimated separately. The Attack (2) and Attack (3) models are an example of this approach: the Attack (2) model estimates the relationship between goals scored (final outcome) and shots (total shots and shot accuracy), and the Attack (3) model then estimates the relationship between total shots (intermediate outcome) and passes, dribbles, crosses, attacking duels and discipline. This approach resolves some of the problems encountered in the Attack (1) model. Although goodness of fit remains low, with only 26.2% of the variation in total shots across matches explained by the Attack (3) model, all of the variables now have the expected signs, so that attempted passes, open play crosses and attacking duels now have positive coefficients. In addition, pass completion, open play cross success rate and attacking duel success rate are now statistically significant. But attempted passes, although now attributed a positive contribution, has a very small and statistically insignificant coefficient, which reflects the underlying playing characteristic of the English Championship that ball possession has little predictive power for goals scored and match outcomes. And this remains the core problem with regression-based estimates of the weightings to be used in win-contribution player rating systems: regression-based weightings reflect statistical predictive power, not game importance. Ultimately I have been driven to the conclusion that regression-based player rating systems are not to be recommended for the invasion-territorial team sports. An alternative approach is the subject of my next post.
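One way to translate the Attack (3) coefficients back into goals, sketched below, is to multiply each one by the total-shots slope from the Attack (2) model. This chaining is an illustration rather than something the post does, and it assumes linear effects, shot accuracy held constant, and success rates measured as proportions between 0 and 1; the helper name `implied_goal_weight` is hypothetical.

```python
# Total-shots slope from the Attack (2) model: extra goals per extra shot.
GOALS_PER_SHOT = 0.0698052

# Selected Attack (3) coefficients: extra shots per unit of each metric.
shots_per_unit = {
    "Pass Completion": 10.1551,
    "Open Play Crosses": 0.213755,
    "Open Play Cross Success Rate": 6.75974,
    "Attacking Duel Success Rate": 6.69654,
}


def implied_goal_weight(shots_coef, goals_per_shot=GOALS_PER_SHOT):
    """Chain the two linear models: goals per unit via the shots pathway."""
    return shots_coef * goals_per_shot


implied = {name: implied_goal_weight(b) for name, b in shots_per_unit.items()}
for name, w in implied.items():
    print(f"{name:30s} {w:.4f}")
```

On these assumptions, each additional open play cross is associated with roughly 0.015 extra goals via extra shots, and a 10 percentage-point rise in pass completion with roughly 0.07 extra goals. These remain weightings based on predictive power, so the core objection above still applies.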

The Practical Problems of Constructing Win-Contribution Player Rating Systems in the Invasion-Territorial Sports

Executive Summary

  • Effective data-based assessment of individual player performance in team sports must resolve the three basic conceptual problems of separability, multiplicity and measurability. These problems are most acute in the invasion-territorial sports.
  • In statistical terms, the win-contribution approach to player rating systems can be seen as a multivariate problem of identifying and combining a set of skill-activity performance metrics to model team performance.
  • Regression analysis is the simplest statistical method for estimating the skill-activity weightings to be used in a win-contribution player ratings system with multiple skill-activities.
  • There are three practical problems widely encountered when using the regression method: (i) defining an appropriate measure of team performance; (ii) the skill-activity coefficients often have the wrong sign and/or are statistically insignificant; and (iii) the weightings reflect relative predictive power which may not necessarily coincide with the relative game importance of the specific skill-activity.

 

Evaluating individual player performance in team sports using a systematic data-based approach faces three basic conceptual problems:

 

  1. Separability – team performance needs to be decomposed into individual player performances but the degree of separability of individual player performances depends crucially on the basic game structure of the sport. Separability is highest in the striking-and-fielding sports such as baseball and cricket in which the core of the game is a one-to-one contest between the batter and pitcher/bowler. In the invasion-territorial sports such as the various codes of football, hockey and basketball the interdependency of player actions and the necessity for tactical coordination of players makes separability much more problematic.
  2. Multiplicity – if the game structure is such that individual players specialise in one specific skill-activity which is the dominant component of their performance (e.g. pitching and hitting in baseball with fielding treated as of only secondary importance) then evaluating player performance comes down to identifying the best metric to measure the specific skill-activity performance. However, particularly in many of the invasion-territorial sports, players undertake a multiplicity of skill-activities so that the evaluation of player performance requires finding the appropriate combination of a set of performance metrics.
  3. Measurability – by definition, data-based player rating systems focus only on those aspects of player performance that are directly observable and measurable. To some people this isn’t an issue, and they will justify their position with the well-known dictum: “If you can’t measure it, you can’t manage it”. But this just isn’t true. Coaching and managing is about knowing the people for whom you are responsible and how they are performing, and learning how best to facilitate improvements in their performance. You are likely to be less effective as a coach and manager if you ignore available data on performance, but likewise you will also be less effective if you focus only on the measurable aspects of performance. As always it is about using all the available evidence as best you can to improve performance. Motivation and resilience may not be directly observable and easily measurable, but I doubt that there are many coaches who would argue that they are not important aspects of player performance.

As I have discussed in my previous post, there are two broad approaches to constructing player rating systems – the win-attribution approach and the win-contribution approach. The win-attribution approach, principally plus-minus scores, effectively finesses all three conceptual problems – separability, multiplicity and measurability – by focusing on outcome not process, and attributing the match score pro rata based on players’ game time. By contrast, the win-contribution approach focuses on the process of how the team performance is generated by individual player performance. And as a consequence, the win-contribution approach has to deal with the separability, multiplicity and measurability problems. Ultimately it comes down to:

  • Identifying the appropriate set of specific skill-activity performance metrics; and
  • Determining the best way of combining this set of performance metrics, particularly the weightings to be used to produce an overall composite index of player performance.
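The win-attribution side, as described above, can be sketched in a few lines: the final margin is shared among players pro rata to their game time. This is a simplified, hypothetical version that assumes a fixed 90-minute match and ignores stoppage time; `pro_rata_attribution` and the player names are illustrative.

```python
def pro_rata_attribution(goals_for, goals_against, minutes_played,
                         match_minutes=90):
    """Credit each player with the final margin scaled by game time.

    minutes_played maps player name -> minutes on the pitch. Assumes a
    fixed match length (stoppage time ignored).
    """
    margin = goals_for - goals_against
    return {player: margin * minutes / match_minutes
            for player, minutes in minutes_played.items()}


# Hypothetical 2-1 win: a starter who played 90 minutes, one substituted
# after 60 minutes and the replacement who played the last 30.
shares = pro_rata_attribution(2, 1, {"Starter A": 90, "Starter B": 60,
                                     "Substitute C": 30})
print(shares)
```

Nothing in this calculation looks at what the players actually did on the pitch, which is exactly why the win-attribution approach sidesteps the separability, multiplicity and measurability problems.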

 

From a statistical perspective the win-contribution approach to player rating systems is just a standard multivariate problem of determining the relationship between team performance (the outcome) and the aggregate contributions of players by skill-activity (the predictors). The simplest approach is to estimate a linear regression model of team performance:

$$\text{Team Performance} = a + b_1 P_1 + b_2 P_2 + \dots + b_k P_k + u$$

where

$P_1, P_2, \dots, P_k$ = skill-activity metrics (team totals)

$b_1, b_2, \dots, b_k$ = skill-activity weightings

$a$ = intercept

$u$ = random error term capturing non-systematic influences on team performance

The estimated regression coefficients can then be used to combine the skill-activity metrics for individual players to produce an overall measure of player performance.
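In matrix form, and assuming the $n$ team performances are stacked as rows with $P_{j,t}$ the team total of metric $j$ in team performance $t$ and $\mathbf{y}$ the vector of outcomes, the estimation and combination steps can be written as below. The notation $p_{j,i}$ for player $i$'s own total of metric $j$ is hypothetical, leaving the intercept out of the player rating is a choice rather than a rule, and the inverse exists only if no metric is an exact linear combination of the others.

$$
\begin{aligned}
\begin{pmatrix} \hat a \\ \hat b_1 \\ \vdots \\ \hat b_k \end{pmatrix}
  &= (X^\top X)^{-1} X^\top \mathbf{y},
  \qquad
  X = \begin{pmatrix}
        1 & P_{1,1} & \cdots & P_{k,1} \\
        \vdots & \vdots & & \vdots \\
        1 & P_{1,n} & \cdots & P_{k,n}
      \end{pmatrix}, \\
\text{Rating}_i &= \hat b_1 p_{1,i} + \hat b_2 p_{2,i} + \dots + \hat b_k p_{k,i}
\end{aligned}
$$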

 

In principle, regression analysis offers a very straightforward method of creating a win-contribution player rating system for the invasion-territorial sports. However, there are a number of practical problems in implementing the method to produce meaningful and useful player ratings.

 

Practical Problem 1: Defining an appropriate measure of team performance

This is not as straightforward as it might seem. If the regression model of team performance is to be estimated using season totals in a league, then total league points or win-percentage are the obvious outcome measures to use, but it is highly likely that data for several seasons will need to be combined in order to have enough degrees of freedom if you intend to use a large number of skill-activity metrics. The alternative approach is to use individual match data. In this case, using the match outcome (win, draw or loss) as the measure of team performance is too restrictive: you run into all of the usual problems associated with limited dependent variable (LDV) models and are better advised to use logistic regression (or related approaches) rather than linear regression. If you want to keep using linear regression with individual match data, it is better to model team performance using scores, either a single model of the final margin or two separate models of scores for and scores against. In my work on player ratings in rugby union and rugby league, I have used individual match data and estimated two separate models for points scored and points conceded, and then combined these two models to create a model of the final margin. I found that this worked better than just estimating a single model for the final margin and seemed better able to identify the impact of different skill-activity metrics. Of course any score-based approach is more problematic in (association) football because it is such a low-scoring sport. I still tend to use goals scored and goals conceded as my outcome measures, but I have also used own and opposition shots on target as outcome measures.
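Because OLS estimates are linear in the outcome variable, fitting separate models for points scored and points conceded with the same set of metrics and then differencing them gives exactly the coefficients of a single final-margin model. The sketch below checks this on simulated data (the `ols` helper and all numbers are hypothetical). It suggests that any advantage of the two-model route is likely to come from letting each model have its own specification, for example different metrics or functional forms.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients (intercept first) via least squares."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef


rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 4))  # hypothetical standardised skill-activity metrics
points_for = 14 + X @ np.array([3.0, 1.5, 0.0, -0.5]) + rng.normal(0, 6, n)
points_against = 18 + X @ np.array([-1.0, 0.0, -2.0, 0.5]) + rng.normal(0, 6, n)

b_for = ols(X, points_for)
b_against = ols(X, points_against)
b_margin_combined = b_for - b_against          # difference of the two models
b_margin_direct = ols(X, points_for - points_against)
print(np.round(b_margin_combined, 3))
print(np.round(b_margin_direct, 3))
```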

 

Practical Problem 2: The estimated regression coefficients may have the “wrong” sign and/or be statistically insignificant

When regression models of team performance are estimated, it is more likely than not that several of the skill-activity metrics will have coefficients with the “wrong” sign and/or coefficients that are not statistically significantly different from zero. There are two common reasons for wrong signs and/or statistical insignificance. First, skill-activity metrics usually suffer from multicollinearity, where individual variables are highly correlated with each other either directly (i.e. simple bivariate correlations) or in linear combinations. For example, teams which defend more and make more tackles also tend to make more interceptions, clearances and blocks. High levels of multicollinearity make estimated coefficients unstable and more prone to switching sign, as well as more imprecise (i.e. higher standard errors) and hence more likely to be statistically insignificant. Second, some skill-activity variables may be acting as a proxy for opposition skill-activities. For example, more defending partly reflects more attacking play by the opposition, and the more the opposition attacks, the more goals are likely to be conceded. As a consequence, defensive variables may be positively correlated with goals conceded even though more (and better) defending should be negatively correlated with goals conceded.
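Multicollinearity of the kind described above is commonly diagnosed with variance inflation factors (VIFs), 1/(1 − R²ⱼ), where R²ⱼ comes from regressing metric j on all the other metrics. The sketch below computes VIFs for simulated defensive metrics that share a hypothetical common "amount of defending" driver, plus one unrelated metric for contrast; the `vif` helper and all parameter values are assumptions for illustration.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R²_j),
    where R²_j comes from regressing column j on all the other columns."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(4)
n = 1104
defending = rng.normal(50, 10, n)        # hypothetical amount of defending
tackles = 0.4 * defending + rng.normal(0, 2, n)
interceptions = 0.3 * defending + rng.normal(0, 2, n)
clearances = 0.5 * defending + rng.normal(0, 2, n)
dribbles = rng.normal(10, 3, n)          # unrelated to the defending volume
X = np.column_stack([tackles, interceptions, clearances, dribbles])
print(np.round(vif(X), 2))
```

The three defensive metrics show inflated variances because each is largely predictable from the other two, while the unrelated metric stays close to 1.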

 

Practical Problem 3: Regression coefficients define the relative importance of contributions purely in terms of predictive power

Ultimately, regression analysis is a technique for finding the linear combination of a set of variables that provides the best predictions of the outcome variable. So the estimated coefficients are indicative of the relative predictive power of each variable. However, predictive power does not necessarily equate to the relative game importance of contributions when you are dealing with processes comprising a sequence of different skill-activities. For example, in football the best predictor of goals scored is shots on target inside the box, and so inevitably in any linear regression model of goals scored the number of shots on target (especially inside the box) will have the highest weighting. But of course shots depend on passing and moving the ball forward successfully to create shooting opportunities, all of which in turn depend on winning possession of the ball in the first place. Yet all of these skill-activities provide much less predictive power for goals scored because they are further back in the causal chain. Similarly, when it comes to goals conceded, the dominant predictor is the goalkeeper’s saves-to-shots ratio, but the number of opposition shots allowed depends on defensive play such as tackles, interceptions, clearances and blocks. Defensive play is critical as a contribution to match success but statistically will always tend to be treated as of only secondary importance as a predictor of match outcomes. One way around this within the linear regression method is to estimate hierarchical models to capture the sequential nature of the game.
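The gap between predictive power and game importance, and the hierarchical fix, can be illustrated with simulated data. In the hypothetical set-up below, build-up passes create shots and only shots produce goals: the single-equation model gives passes a weight close to zero, while chaining two sequential models recovers their indirect contribution through shots. Variable names and parameter values are assumptions chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients (intercept first) via least squares."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef


rng = np.random.default_rng(5)
n = 5000
passes_final_third = rng.normal(100, 20, n)                  # hypothetical build-up metric
shots = 2 + 0.1 * passes_final_third + rng.normal(0, 2, n)   # build-up creates shots
goals = 0.1 * shots + rng.normal(0, 0.8, n)                  # only shots produce goals

# Single-equation model: goals on both metrics (intercept, passes, shots).
single = ols(np.column_stack([passes_final_third, shots]), goals)

# Hierarchical (sequential) models: shots on build-up, then goals on shots.
stage1 = ols(passes_final_third[:, None], shots)
stage2 = ols(shots[:, None], goals)
chained_passes_effect = stage1[1] * stage2[1]   # goals per extra pass via shots

print("single-equation weight on passes:", round(single[1], 4))
print("chained effect of passes:", round(chained_passes_effect, 4))
```

In this simulation the true indirect effect is 0.1 × 0.1 = 0.01 goals per extra pass, which the chained estimate recovers while the single-equation weight stays near zero.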

 

Despite the practical problems, it may still be possible to use the regression method to produce a meaningful and useful player rating system. After estimating the initial regression model of team performance using the basic skill-activity metrics, it is vital to undertake a specification search to find a model with better properties, specifically statistically significant coefficients with the “correct” signs as well as good diagnostics (i.e. random residual variation). The specification search may involve the use of different functional forms such as logarithms and quadratics. It can also involve the transformation of the basic skill-activity metrics. For example, suppose you have data on the total number of successful passes and the total number of unsuccessful passes. Instead of using the data in this form, it might be better to transform the two variables into a total activity measure (i.e. the total number of attempted passes = successful passes + unsuccessful passes) and a success rate (successful passes as a % of attempted passes). A more radical solution would be to use factor analysis to reconstruct the original set of metrics into a smaller set of factors based on the collinearity between the initial variables.
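The pass transformation just described might look like this. The function name and the example counts are hypothetical, and players or teams with no attempts are given an undefined (NaN) success rate rather than zero.

```python
import numpy as np


def activity_and_success_rate(successful, unsuccessful):
    """Turn success/failure counts into (attempts, success rate in %)."""
    successful = np.asarray(successful, dtype=float)
    unsuccessful = np.asarray(unsuccessful, dtype=float)
    attempts = successful + unsuccessful
    # Avoid dividing by zero: rows with no attempts keep NaN as their rate.
    rate = np.divide(100.0 * successful, attempts,
                     out=np.full_like(attempts, np.nan), where=attempts > 0)
    return attempts, rate


# Hypothetical counts of successful and unsuccessful passes for three rows.
attempts, rate = activity_and_success_rate([320, 45, 0], [80, 15, 0])
print(attempts, rate)
```

The two outputs map directly onto the activity-level and effectiveness-ratio metrics used in the Championship models.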

 

The best way forward, as always with all practical problems, is to investigate alternatives to find out what works best in a specific context. So, in that spirit, my next post will be an exploration of using alternative regression-based player rating systems to identify the “best” outfield players in the Football League Championship last season.

 

29th September 2016