The Practical Problems of Constructing Win-Contribution Player Rating Systems in the Invasion-Territorial Sports

Originally Written: October 2016

Executive Summary

Effective data-based assessment of individual player performance in team sports must resolve the three basic conceptual problems of separability, multiplicity and measurability. These problems are most acute in the invasion-territorial sports.
In statistical terms, the win-contribution approach to player rating systems can be seen as a multivariate problem of identifying and combining a set of skill-activity performance metrics to model team performance.
Regression analysis is the simplest statistical method for estimating the skill-activity weightings to be used in a win-contribution player ratings system with multiple skill-activities.
There are three practical problems widely encountered when using the regression method: (i) defining an appropriate measure of team performance; (ii) the skill-activity coefficients often have the wrong sign and/or are statistically insignificant; and (iii) the weightings reflect relative predictive power which may not necessarily coincide with the relative game importance of the specific skill-activity.

Evaluating individual player performance in team sports using a systematic data-based approach faces three basic conceptual problems:

Separability – team performance needs to be decomposed into individual player performances but the degree of separability of individual player performances depends crucially on the basic game structure of the sport. Separability is highest in the striking-and-fielding sports such as baseball and cricket in which the core of the game is a one-to-one contest between the batter and pitcher/bowler. In the invasion-territorial sports such as the various codes of football, hockey and basketball the interdependency of player actions and the necessity for tactical coordination of players makes separability much more problematic.
Multiplicity – if the game structure is such that individual players specialise in one specific skill-activity which is the dominant component of their performance (e.g. pitching and hitting in baseball with fielding treated as of only secondary importance) then evaluating player performance comes down to identifying the best metric to measure the specific skill-activity performance. However, particularly in many of the invasion-territorial sports, players undertake a multiplicity of skill-activities so that the evaluation of player performance requires finding the appropriate combination of a set of performance metrics.
Measurability – by definition, data-based player rating systems focus only on those aspects of player performance that are directly observable and measurable. To some this isn’t an issue and they will justify their position with the well-known dictum: “If you can’t measure it, you can’t manage it”. But this just isn’t true. Coaching and managing is about knowing the people for whom you are responsible and how they are performing, and learning how best to facilitate improvements in their performance. You are likely to be less effective as a coach and manager if you ignore available data on performance but likewise you will also be less effective if you focus only on the measurable aspects of performance. As always it is about using all the available evidence as best you can to improve performance. Motivation and resilience may not be directly observable and easily measurable but I doubt that there are many coaches who would argue that they are not important aspects of player performance.

As I have discussed in my previous post, there are two broad approaches to constructing player rating systems – the win-attribution approach and the win-contribution approach. The win-attribution approach, principally plus-minus scores, effectively finesses all three conceptual problems – separability, multiplicity and measurability – by focusing on outcome not process, and attributing the match score pro rata based on players’ game time. By contrast, the win-contribution approach focuses on the process of how the team performance is generated by individual player performance. And as a consequence, the win-contribution approach has to deal with the separability, multiplicity and measurability problems. Ultimately it comes down to:

Identifying the appropriate set of specific skill-activity performance metrics; and
Determining the best way of combining this set of performance metrics, particularly the weightings to be used, to produce an overall composite index of player performance.

From a statistical perspective the win-contribution approach to player rating systems is just a standard multivariate problem of determining the relationship between team performance (the outcome) and the aggregate contributions of players by skill-activity (the predictors). The simplest approach is to estimate a linear regression model of team performance:

Team Performance = a + b₁P₁ + b₂P₂ + … + b_kP_k + u

where

P₁, P₂, …, P_k = skill-activity metrics (team totals)

b₁, b₂, …, b_k = skill-activity weightings

a = intercept

u = random error term capturing non-systematic influences on team performance

The estimated regression coefficients can then be used to combine the skill-activity metrics for individual players to produce an overall measure of player performance.
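As a minimal sketch of this two-step idea, the following Python (NumPy) example estimates the weightings by ordinary least squares on synthetic team-level data and then applies them to individual player metrics. The data, the "true" coefficients and the variable names are all assumptions made for illustration; leaving the intercept out of the player rating is also just one possible design choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic team-level data (assumed): 200 team observations, 3 skill-activity totals
n_obs, k = 200, 3
P = rng.normal(size=(n_obs, k))
true_a = 1.5                            # illustrative intercept
true_b = np.array([0.8, 0.3, -0.5])     # illustrative weightings
team_perf = true_a + P @ true_b + rng.normal(scale=0.1, size=n_obs)

# Estimate a and b_1..b_k by ordinary least squares
X = np.column_stack([np.ones(n_obs), P])
coef, *_ = np.linalg.lstsq(X, team_perf, rcond=None)
a_hat, b_hat = coef[0], coef[1:]

# Apply the estimated weightings to (hypothetical) individual player metrics
player_metrics = np.array([[1.0, 0.0, 0.0],
                           [0.0, 2.0, 1.0]])
player_ratings = player_metrics @ b_hat   # intercept omitted by choice
```

Each row of `player_metrics` holds one player's skill-activity counts in the same column order as `P`, so the rating is simply the weighted sum of that player's contributions.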

In principle regression analysis offers a very straightforward method of creating a win-contribution player rating system for the invasion-territorial sports. However there are a number of practical problems in implementing the method to produce meaningful and useful player ratings.

Practical Problem 1: Defining an appropriate measure of team performance

This is not as straightforward as it might seem. If the regression model of team performance is to be estimated using season totals in a league, then total league points or win percentage are the obvious outcome measures to use, but it is highly likely that data for several seasons will need to be combined in order to have enough degrees of freedom if you intend to use a large number of skill-activity metrics. The alternative approach is to use individual match data. In this case using a measure of match outcome (win, draw or loss) is too restrictive. You run into all of the usual problems associated with limited dependent variable (LDV) models and are better advised to use logistic regression (or related approaches) rather than linear regression. If you want to keep using linear regression with individual match data, it is better to model team performance using scores, either a single model of the final margin or two separate models of scores for and scores against. In my work on player ratings in rugby union and rugby league, I have used individual match data and estimated two separate models for points scored and points conceded, and then combined these two models to create a model of the final margin. I found that this worked better than just estimating a single model for the final margin and seemed better able to identify the impact of different skill-activity metrics. Of course any score-based approach is more problematic in (association) football because it is such a low-scoring sport. I still tend to use goals scored and goals conceded as my outcome measures but I have also used own and opposition shots on target as outcome measures.
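The two-model approach can be sketched as follows in Python (NumPy). This is only an illustration on synthetic data: the split into "attacking" and "defensive" metrics, the coefficient values and the helper names `ols` and `predict` are assumptions, not the specification used in the rugby work described above. Note that if both models used exactly the same regressors, the difference of the two fitted models would coincide with a single OLS model of the margin, so the sketch uses a different set of metrics in each model.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients (intercept first) for y regressed on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Fitted values from coefficients returned by ols()."""
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(1)
n = 300
attack = rng.normal(size=(n, 2))    # hypothetical attacking metrics
defence = rng.normal(size=(n, 2))   # hypothetical defensive metrics
points_for = 20 + attack @ np.array([4.0, 2.0]) + rng.normal(scale=1.0, size=n)
points_against = 18 - defence @ np.array([3.0, 1.0]) + rng.normal(scale=1.0, size=n)

# Two separate models: points scored and points conceded
coef_for = ols(attack, points_for)
coef_against = ols(defence, points_against)

# Combine the two models into a model of the final margin
predicted_margin = predict(coef_for, attack) - predict(coef_against, defence)
actual_margin = points_for - points_against
```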

Practical Problem 2: The estimated regression coefficients may have the “wrong” sign and/or be statistically insignificant

When regression models of team performance are estimated, it is more likely than not that several of the skill-activity metrics will have coefficients that have the “wrong” sign and/or are not statistically significantly different from zero. There are two common reasons for wrong signs and/or statistical insignificance. First, skill-activity metrics usually suffer from a multicollinearity problem where individual variables are highly correlated with each other either directly (i.e. simple bivariate correlations) or in linear combinations. For example, teams which defend more and make more tackles also tend to make more interceptions, clearances and blocks. High levels of multicollinearity can make estimated coefficients unstable, including being more prone to switching sign, as well as more imprecise (i.e. higher standard errors) and hence more likely to be statistically insignificant. Second, some skill-activity variables may be acting as a proxy for opposition skill-activities. For example, more defending partly reflects more attacking play by the opposition, and the more the opposition attacks, the more goals are likely to be conceded. As a consequence, defensive variables may be positively correlated with goals conceded even though more (and better) defending should be negatively correlated with goals conceded.
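One common diagnostic for multicollinearity (not mentioned in the discussion above, so treat it as a suggested addition) is the variance inflation factor, VIF = 1 / (1 − R²), where R² comes from regressing one metric on all the others. The Python (NumPy) sketch below uses synthetic data in which tackles and interceptions are both driven by an assumed latent "amount of defending"; the variable names and noise levels are illustrative only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_obs x k)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
n = 500
defending = rng.normal(size=n)                          # latent driver (assumed)
tackles = defending + rng.normal(scale=0.2, size=n)
interceptions = defending + rng.normal(scale=0.2, size=n)
passes = rng.normal(size=n)                             # unrelated metric
vifs = vif(np.column_stack([tackles, interceptions, passes]))
```

In this synthetic setup the two defensive metrics get large VIFs while the unrelated metric stays close to 1, which is the pattern that signals unstable, imprecise coefficients.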

Practical Problem 3: Regression coefficients define the relative importance of contributions purely in terms of predictive power

Ultimately regression analysis is a technique for finding the linear combination of a set of variables that can provide the best predictions of the outcome variable. So the estimated coefficients are indicative of the relative predictive power of each variable. However predictive power does not necessarily equate to the relative game importance of contributions when you are dealing with processes comprising a sequence of different skill-activities. For example, in football the best predictor of goals scored is shots on target inside the box and so inevitably in any linear regression model of goals scored, the number of shots on target (especially inside the box) will have the highest weighting. But of course shots depend on passing and moving the ball forward successfully to create shooting opportunities, all of which in turn depends on winning possession of the ball in the first place. Yet all of these skill-activities provide much less predictive power for goals scored because they are further back in the causal chain. Similarly when it comes to goals conceded the dominant predictor is the goalkeeper’s saves per shot ratio but the number of opposition shots allowed depends on defensive play such as tackles, interceptions, clearances and blocks. Defensive play is critical as a contribution to match success but statistically will always be treated as of only secondary importance as a predictor of match outcomes. One way around this within the linear regression method is to estimate hierarchical models to capture the sequential nature of the game.
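One possible reading of the hierarchical idea is a chain of simple regressions, one per stage of play, with the implied weighting of an upstream skill-activity obtained by multiplying coefficients along the chain. The Python (NumPy) sketch below illustrates this on synthetic data; the three-stage structure, the coefficient values and the treatment of goals as a continuous variable are all simplifying assumptions rather than a description of any particular published model.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients (intercept first) for y regressed on X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 1000
# Synthetic causal chain (assumed): possession wins -> forward passes -> shots -> goals
possession_wins = rng.normal(50, 5, size=n)
forward_passes = 2.0 * possession_wins + rng.normal(0, 5, size=n)
shots_on_target = 0.05 * forward_passes + rng.normal(0, 1, size=n)
goals = 0.3 * shots_on_target + rng.normal(0, 0.5, size=n)

# One regression per stage of the sequence
b_goal_shot = ols(shots_on_target, goals)[1]
b_shot_pass = ols(forward_passes, shots_on_target)[1]
b_pass_win = ols(possession_wins, forward_passes)[1]

# Implied goal weightings for the upstream activities (products along the chain)
weight_pass = b_goal_shot * b_shot_pass
weight_win = weight_pass * b_pass_win
```

The design choice here is that each upstream activity is credited through the stage it directly feeds, rather than competing with shots on target in a single regression of goals.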

Despite the practical problems, it may still be possible to use the regression method to produce a meaningful and useful player rating system. After estimating the initial regression model of team performance using the basic skill-activity metrics, it is vital to undertake a specification search to find a model with better properties, specifically statistically significant coefficients with the “correct” signs as well as good diagnostics (i.e. random residual variation). The specification search may involve the use of different functional forms such as logarithms and quadratics. It can also involve the transformation of the basic skill-activity metrics. For example, suppose you have data on the total number of successful passes and the total number of unsuccessful passes. Instead of using the data in this form, it might be better to transform the two variables into a total activity measure (i.e. the total number of attempted passes = successful passes + unsuccessful passes) and a success rate (successful passes as a % of attempted passes). A more radical solution would be to use factor analysis to reconstruct the original set of metrics into a smaller set of factors based on the collinearity between the initial variables.
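The pass transformation described above is simple arithmetic; a short Python (NumPy) sketch with made-up team totals:

```python
import numpy as np

# Hypothetical team totals for four matches
successful = np.array([320, 410, 275, 500])
unsuccessful = np.array([80, 90, 125, 100])

# Total activity measure and success rate
attempted = successful + unsuccessful          # [400 500 400 600]
success_rate = 100.0 * successful / attempted  # successful passes as a % of attempts
```

The two transformed variables separate how much passing a team does from how well it passes, which the raw successful and unsuccessful counts mix together.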

The best way forward, as always with all practical problems, is to investigate alternatives to find out what works best in a specific context. So, in that spirit, my next post will be an exploration of using alternative regression-based player rating systems to identify the “best” outfield players in the Football League Championship last season.
