Diagnostic Testing Part 2: Spatial Diagnostics

Analytical models take the following general form:

Outcome = f(Performance, Context) + Stochastic Error

The structural model represents the systematic (or “global”) variation in the process outcome associated with the variation in the performance and context variables. The stochastic error acts as a sort of “garbage can” to capture “local” context-specific influences on process outcomes that are not generalisable in any systematic way across all the observations in the dataset. All analytical models assume that the structural model is well specified and the stochastic error is random. Diagnostic testing is the process of checking that these two assumptions hold true for any estimated analytical model.

Diagnostic testing involves the analysis of the residuals of the estimated analytical model.

Residual = Actual Outcome – Predicted Outcome

Diagnostic testing is the search for patterns in the residuals. It is a matter of interpretation as to whether any patterns in the residuals are due to structural mis-specification problems or stochastic error mis-specification problems. But structural problems must take precedence: unless the structural model is correctly specified, the residuals will be biased estimates of the stochastic error, contaminated by the structural mis-specification. In this post I am focusing on structural mis-specification problems associated with cross-sectional data, in which the dataset comprises observations of similar entities at the same point in time. I label this type of residual analysis as “spatial diagnostics”. I will utilise all three principal methods for detecting systematic variation in residuals: residual plots, diagnostic test statistics, and auxiliary regressions.

Data

The dataset being used to illustrate spatial diagnostics was originally extracted from the Family Expenditure Survey in January 1993. The dataset contains information on 608 households. Four variables are used: weekly household expenditure (EXPEND) is the outcome variable, to be modelled by weekly household income (INCOME), the number of adults in the household (ADULTS) and the age of the head of the household (AGE), defined as whoever is responsible for completing the survey. The model is estimated using linear regression.

Initial Model

The estimated linear model is reported in Table 1 below. On the face of it, the estimated model seems satisfactory, particularly for such a simple cross-sectional model, with around 53% of the variation in weekly expenditure being explained statistically by variation in weekly income, the number of adults in the household and the age of the head of household (R2 = 0.5327). All three impact coefficients are highly significant (P-value < 0.01). The t-statistic provides a useful indicator of the relative importance of the three predictor variables since it effectively standardises the impact coefficients using their standard errors as a proxy for the units of measurement. Not surprisingly, weekly household expenditure is principally driven by weekly household income with, on average, 59.6p spent out of every additional £1 of income.
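To make the setup concrete, here is a minimal sketch of how such a model can be estimated by ordinary least squares. The data below are simulated, not the actual FES extract (the true coefficients are chosen only for illustration), with the variable names following the post:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 608  # same sample size as the FES extract

# Simulated stand-in for the survey data (illustrative values only)
income = rng.gamma(shape=4.0, scale=100.0, size=n)   # weekly income
adults = rng.integers(1, 5, size=n).astype(float)    # adults in household
age = rng.integers(18, 80, size=n).astype(float)     # age of household head
expend = 30 + 0.6 * income + 25 * adults - 0.2 * age + rng.normal(0, 40, n)

# Ordinary least squares: solve for [const, b_income, b_adults, b_age]
X = np.column_stack([np.ones(n), income, adults, age])
beta, *_ = np.linalg.lstsq(X, expend, rcond=None)

predicted = X @ beta
residuals = expend - predicted   # actual outcome minus predicted outcome
r2 = 1 - residuals.var() / expend.var()
print(beta[1], r2)  # income coefficient recovers ~0.6 by construction
```

The residuals computed here are the raw material for all of the diagnostic checks that follow.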

Diagnostic Tests

However, despite the satisfactory goodness of fit and high statistical significance of the impact coefficients, the linear model is not fit for purpose in respect of its spatial diagnostics. Its residuals are far from random, as can be seen clearly in the two residual plots in Figures 1 and 2. Figure 1 is the scatterplot of the residuals against the outcome variable, weekly expenditure. The ideal would be a completely random scatterplot with no pattern in either the average value of the residual, which should be zero throughout (i.e. no spatial correlation), or in the degree of dispersion, which should be constant (known as “homoskedasticity”). In other words, the scatterplot should be centred throughout on the horizontal axis and there should also be a relatively constant vertical spread of the residuals around the horizontal axis. But the residuals for the linear model are clearly trended upwards in both value (i.e. spatial correlation) and dispersion (i.e. heteroskedasticity). In my experience this sort of pattern in the residuals is most often caused by wrongly treating the core relationship as linear when it is better modelled as a curvilinear or some other form of non-linear relationship.

            Figure 2 provides an alternative residual plot in which the residuals are ordered by their associated weekly expenditure. Effectively this plot replaces the absolute values of weekly expenditure with their rankings from lowest to highest. Again we should ideally get a random plot with no discernible pattern between adjacent residuals (i.e. no spatial correlation) and no discernible pattern in the degree of dispersion (i.e. homoskedasticity). Given the number of observations and the size of the graphic, it is impossible to determine visually if there is any pattern between adjacent residuals in most of the dataset except in the upper tail. But the degree of spatial correlation can be measured by applying the correlation coefficient to the relationship between the ordered residuals and their immediate neighbours. Any correlation coefficient greater than 0.5 in absolute value represents a large effect. In the case of the ordered residuals for the linear model of weekly household expenditure, the spatial correlation coefficient is 0.605, which provides evidence of a strong relationship between adjacent ordered residuals, i.e. the residuals are far from random.
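This neighbour-correlation check is straightforward to compute: sort the residuals by their associated outcome value and correlate each residual with its immediate neighbour. A minimal sketch with simulated residuals that trend with the outcome, mimicking the mis-specification described (illustrative data, not the FES extract):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 608

# Simulated outcome and residuals: the residuals trend upwards with the
# outcome, mimicking a structural mis-specification
outcome = np.sort(rng.gamma(4.0, 100.0, n))
residuals = 0.25 * (outcome - outcome.mean()) + rng.normal(0, 30, n)

# Order the residuals by their associated outcome value (already sorted
# here) and correlate each residual with its immediate neighbour
order = np.argsort(outcome)
r = residuals[order]
spatial_corr = np.corrcoef(r[:-1], r[1:])[0, 1]
print(spatial_corr)  # above 0.5: a large effect, residuals far from random
```

Genuinely random residuals would give a spatial correlation close to zero under the same calculation.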

            So what is causing the pattern in the residuals? One way to try to answer this question is to estimate what is called an “auxiliary regression”, in which regression analysis is applied to model the residuals from the original estimated regression model. One widely used form of auxiliary regression uses the squared residuals as the outcome variable to be modelled. The results for this type of auxiliary regression applied to the residuals from the linear model of weekly household expenditure are reported in Table 2. The auxiliary regression overall is statistically significant (F = 7.755, P-value = 0.000). The key result is that there is a highly significant relationship between the squared residuals and weekly household income, suggesting that the next step is to focus on reformulating the income effect on household expenditure.
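A squared-residual auxiliary regression can be sketched as follows. The data are simulated so that the error variance grows with income, which is exactly the pattern the auxiliary regression should flag; the n·R² statistic of this regression is the familiar Breusch-Pagan LM test, compared against a chi-squared distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 608
income = rng.gamma(4.0, 100.0, n)
adults = rng.integers(1, 5, n).astype(float)
age = rng.integers(18, 80, n).astype(float)

# Error standard deviation grows with income -> heteroskedasticity
noise = rng.normal(0, 0.15 * income)
expend = 30 + 0.6 * income + 25 * adults + noise

# First-stage regression
X = np.column_stack([np.ones(n), income, adults, age])
beta, *_ = np.linalg.lstsq(X, expend, rcond=None)
resid = expend - X @ beta

# Auxiliary regression: squared residuals on the same predictors
u2 = resid ** 2
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ gamma
r2_aux = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

# Breusch-Pagan LM statistic: n * R-squared of the auxiliary regression
lm_stat = n * r2_aux
print(lm_stat)  # compare against chi-squared with 3 degrees of freedom
```

A significant LM statistic, driven here by the income term, points the diagnosis towards the income effect, mirroring the Table 2 result.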

Revised Model and Diagnostic Tests

So diagnostic testing has suggested the strong possibility that modelling the income effect on household expenditure as a linear effect is inappropriate. What is to be done? Do we need to abandon linear regression as the modelling technique? Fortunately the answer is “not necessarily”. Although there are a number of non-linear modelling techniques, in most cases it is possible to keep the estimation method and instead transform the original variables so that there is a linear relationship between the transformed variables, amenable to estimation by linear regression. One commonly used transformation is to introduce the square of a predictor alongside the original predictor to capture a quadratic relationship. Another is to convert the model into a loglinear form by using logarithmic transformations of the original variables. It is the latter approach that I have used as a first step in attempting to improve the structural specification of the household expenditure model. Specifically, I have replaced the original expenditure and income variables, EXPEND and INCOME, with their natural log transformations, LnEXPEND and LnINCOME, respectively. The results of the regression analysis and diagnostic testing of the new loglinear model are reported below.
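The loglinear transformation itself is a one-line change to the estimation. A sketch with simulated data, in which a constant-elasticity income effect of 0.67 is built in so that the recovered coefficient can be checked:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 608
income = rng.gamma(4.0, 100.0, n)
adults = rng.integers(1, 5, n).astype(float)
age = rng.integers(18, 80, n).astype(float)

# Simulated expenditure with a constant-elasticity income effect of 0.67
expend = np.exp(1.0 + 0.67 * np.log(income) + 0.1 * adults
                - 0.002 * age + rng.normal(0, 0.3, n))

# Loglinear model: replace EXPEND and INCOME with their natural logs
X = np.column_stack([np.ones(n), np.log(income), adults, age])
beta, *_ = np.linalg.lstsq(X, np.log(expend), rcond=None)

income_elasticity = beta[1]
print(income_elasticity)  # ~0.67: a 1% rise in income -> ~0.67% rise in spending
```

In the loglinear form the coefficient on LnINCOME is directly interpretable as the income elasticity, which is why economists favour this specification.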

The estimated regression model is broadly similar in respect of its goodness of fit and the statistical significance of the impact coefficients although, given the change in the functional form, these are not directly comparable. The impact coefficient on LnINCOME is 0.674, which represents what economists term the “income elasticity” and implies that, on average, a 1% change in income is associated with a 0.67% change in expenditure in the same direction. The spatial diagnostics have improved although the residual scatterplot still shows evidence of a trend. The ordered residuals appear much more random than previously, with the spatial correlation coefficient having been nearly halved and now providing evidence of only a medium-sized effect (greater than 0.3 in absolute value) between adjacent residuals. The auxiliary regression is still significant overall (F = 6.204; P-value = 0.000) and, although the loglinear specification has produced a better fit for the income effect (with a lower t-statistic and increased P-value), it has had an adverse impact on the age effect (with a higher t-statistic and a P-value close to being significant at the 5% level). The conclusion: the regression model of weekly household expenditure remains “work in progress”. The next steps might be to consider extending the log transformation to the other predictors and/or introducing a quadratic age effect.

Other Related Posts

Diagnostic Testing Part 1: Why Is It So Important?

Competitive Balance Part 3: North American Major Leagues

As discussed in the two previous posts on competitive balance, there is no agreed single definition of competitive balance beyond a general statement that a competitively balanced league is characterised by all teams having a relatively equal chance of winning individual games and the league championship. The lack of agreement on a specific definition of competitive balance combined with the wide variety of league structures and the statistical problems of inferring ex ante (i.e. pre-event) success probabilities from ex post (i.e. actual) league outcomes has led to a multiplicity of competitive balance metrics. Morten Kringstad and I have argued in several published journal articles and book chapters that it is useful to categorise competitive balance metrics as either measures of win dispersion or performance persistence. Win dispersion measures the dispersion in league performance across teams in a particular season. Performance persistence measures the degree to which the league performance of individual teams is replicated across seasons – do teams tend to finish in the same league position in consecutive seasons? These are two quite different aspects of competitive balance and multiple metrics have been proposed for both. However, when it comes to discussions as to what leagues should do, if anything, to maintain or improve competitive balance, there is a general (often implicit) presumption that all competitive balance metrics tend to move in the same direction. Morten and I have sought to discover if this is indeed the case. And, as reported in my previous post on the subject, the evidence from European football is quite mixed and, at the very least, casts doubt on the general presumption that there is a strong positive relationship between win dispersion and persistence. Indeed, we found that in the period 2008 – 2017 win dispersion and performance persistence tended to move in opposite directions in the English Premier League.

            In this post, I am going to discuss the evidence from a study on win dispersion and performance persistence in the four North American Major Leagues (NAMLs) that Morten and I published recently in Sport, Business, Management: An International Journal (vol. 13 no. 5, 2023). Our dataset covered the four NAMLs – MLB (baseball), NFL (American football), NBA (basketball) and NHL (ice hockey) – seven different competitive balance metrics, and 60 seasons, 1960 – 2019 (thereby avoiding the impact of the Covid pandemic). In this post I am only focusing on the ASD* measure of win dispersion, the SRCC measure of performance persistence, and the correlation between these measures to test whether or not win dispersion and performance persistence move together in the same direction. I have reported these three measures as 10-year averages in order to identify possible trends over time. It is generally agreed that the ASD* metric provides better comparability of win dispersion between leagues with very different lengths of game schedules in the regular season. At one extreme the MLB has a 162-game schedule whereas for most of the period the NFL had a 16-game regular-season schedule (recently increased to 17 games). The ASD* uses the actual standard deviation of team win percentages relative to the theoretical standard deviation of a perfectly dominated league with the same number of teams and games, in which every team loses against the teams ranked above it: the top team wins every game, the second-best team loses only against the top team, the third-placed team loses only against the top two, and so on. (Formally, this is called a “cascade” distribution.) The SRCC measure of performance persistence is simply the Spearman rank correlation coefficient of league standings in two consecutive seasons.
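The two measures can be sketched in code. This follows the constructions described above: the actual standard deviation of win percentages relative to a cascade league (whose win percentages are evenly spaced between 1 and 0 under a balanced schedule), and the Spearman rank correlation of consecutive-season standings. The exact ASD* formula in the published paper may differ in detail, so treat this as an illustrative approximation:

```python
import numpy as np

def asd_star(win_pct):
    """Actual SD of win percentages relative to a perfectly dominated
    ('cascade') league of the same size: under a balanced schedule the
    cascade win percentages are evenly spaced from 1 down to 0."""
    n = len(win_pct)
    cascade = np.linspace(1.0, 0.0, n)  # top team wins all, bottom loses all
    return np.std(win_pct) / np.std(cascade)

def srcc(standings_s1, standings_s2):
    """Spearman rank correlation of league standings in two consecutive
    seasons (the Pearson correlation of the ranks)."""
    def ranks(x):
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(1, len(x) + 1)
        return r
    return np.corrcoef(ranks(standings_s1), ranks(standings_s2))[0, 1]

# A perfectly balanced league scores 0; a cascade league scores 1
balanced = np.full(30, 0.5)               # every team wins half its games
print(asd_star(balanced))                 # 0.0
print(asd_star(np.linspace(1, 0, 30)))    # 1.0

# Identical standings in consecutive seasons -> maximum persistence
print(srcc(np.arange(1, 31), np.arange(1, 31)))  # ~1.0
```

So ASD* ranges from 0 (perfect balance) to 1 (complete domination), while SRCC runs from -1 (complete reversal of standings) to +1 (identical standings).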

            One important contextual change in most leagues since the 1960s has been the move away from a very restricted player labour market in which a player’s current team had priority in retaining the player. Instead player labour markets have become very competitive auction-type markets in which players have the right to move to another team at the end of their current contract (what is known as “free agency”). The NAMLs led the way in pro team sports by introducing some form of free agency in the 1970s/80s. European leagues lagged behind until the Bosman ruling in 1995, which effectively created free agency by abolishing transfer fees for out-of-contract players. So in some ways it should be expected that the general trend in the NAMLs has been towards greater competitive imbalance as the big-market teams have taken advantage of free agency to acquire the best players. However, there has been another general tendency, with leagues becoming much more interventionist by introducing regulatory mechanisms, especially salary caps, motivated in part by an attempt to offset the potential negative effect of free agency on competitive balance. Which effect has been stronger? Let’s look at the numbers.

            Table 1 below reports the 10-year averages for win dispersion for the four NAMLs. Broadly speaking, the pattern in the NAMLs over the last 60 years has been for win dispersion to decrease from the 1960s through to the 1990s (i.e. improved competitive balance) but to increase since the 1990s (i.e. reduced competitive balance). Both the MLB and NFL follow this pattern, suggesting that the league intervention effect may have initially dominated the free agency effect but in recent years the resource-richer teams may have adapted to the more regulated environment and found other ways to exert their financial advantage (while remaining compliant with league regulations), such as higher expenditures on technology and data analytics. I used to argue that the Oakland A’s and the Moneyball phenomenon are an example of data analytics being used as a “David” strategy for resource-poorer teams to compete more effectively. And it is true that in the early days of sports analytics it was often the resource-poorer teams that led the way in operationalising data analytics as a source of competitive advantage. But these days most teams recognise the potential gains from analytics and some very resource-rich teams are investing heavily in data analytics.

            The trends in win dispersion are much less clear in both the NBA and NHL. There has been some underlying trend from the 1960s onwards for competitive balance to worsen in the NBA as win dispersion has increased. In contrast, the NHL has tended to experience an improvement in competitive balance with lower win dispersion since the turn of the century.

            When win dispersion across the four NAMLs is compared, there is a rather surprising result: the NFL has the highest degree of win dispersion over the whole period (i.e. low competitive balance) whereas the MLB has the lowest win dispersion (i.e. high competitive balance), with the NBA and NHL in the mid-range. I say surprising since the conventional wisdom is that the NFL has been one of the most proactive leagues in trying to maintain a high level of competitive balance whereas traditionally the MLB has been much less interventionist. The problem in making comparisons across leagues, especially in different sports, is the “apples-and-oranges” problem – trying to compare like with like. As highlighted earlier, there are massive differences between the NAMLs in the length of regular-season game schedules. I am more inclined to the view that the difference in win dispersion between the NAMLs is more a reflection of the difficulties in constructing a metric that properly controls for the length of game schedules, that is, it is more a measurement problem than a “true” reflection of differences in competitive balance.

            The argument that win dispersion metrics can pick up trends within leagues but are less reliable for comparisons across leagues is reinforced by the results for performance persistence reported below in Table 2. Performance persistence measures the degree to which the final standings of teams are replicated in consecutive seasons. The length of the game schedule has a much more indirect effect on performance persistence, so comparisons across leagues should be more reliable. And, indeed, we find that from the 1980s onwards the NFL has had the lowest degree of performance persistence, which fits with the conventional view that the NFL has been the most proactive league in maintaining a high degree of competitive balance. Winning NFL teams face a number of “penalties” in the next season – tougher game schedules, lower-ranked draft picks and the constraints imposed by the salary cap in retaining free agents who have increased in value by virtue of their on-the-field success. It is more and more difficult for NFL teams to become “dynasty” teams, which makes the Belichick-Brady era at the New England Patriots and, most recently, the success of the Kansas City Chiefs so remarkable.

            As well as the NFL, the other NAML that has managed to reduce the degree of performance persistence is the NHL, which had the highest degree of performance persistence in the 1960s and 1970s but now ranks second best behind the NFL. The MLB experienced reduced performance persistence in the 1980s and 1990s (and had, on average, lower performance persistence than the NFL in the 1990s) but that downward trend has been reversed in the last two decades. The one major league that has had no discernible trend in performance persistence over the last 60 years, and has the highest degree of performance persistence, is the NBA, despite instituting a salary cap, albeit a rather “soft” cap with a number of exemptions. The high performance persistence of basketball teams is inherent in the very structure of the game. With only five players on court for a team at any point in time, basketball is much more susceptible to the “Michael Jordan” (i.e. “super-superstar”) effect and the soft salary cap makes it easier to retain these super-superstars.

            The final set of results, reported in Table 3, shows how the relationship between win dispersion and performance persistence has varied over time and between leagues. One of the main motivations for this research is to determine whether or not the general presumption of a strong positive dispersion-persistence relationship is empirically valid. The evidence is mixed. There are only eight instances of a strong positive dispersion-persistence relationship (r > 0.5) out of a possible 24, which is hardly overwhelming evidence in favour of the general presumption. If medium-sized effects are included (0.3 < r < 0.5), then only half of the reported results provide support for the general presumption of a positive relationship, with three strong/medium negative results and nine showing only small/negligible effects. There is one instance of a strong negative dispersion-persistence relationship, in the NHL in 2010-19, indicating that reductions in performance persistence were associated with increases in win dispersion.
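The effect-size thresholds used in this post (strong above 0.5 in absolute value, medium between 0.3 and 0.5) can be applied mechanically. A small helper, run on hypothetical correlation values for illustration:

```python
def effect_size(r):
    """Classify a correlation coefficient using the conventional
    thresholds applied in the post (absolute value)."""
    a = abs(r)
    if a > 0.5:
        return "strong"
    if a > 0.3:
        return "medium"
    if a > 0.1:
        return "small"
    return "negligible"

# Hypothetical dispersion-persistence correlations for illustration
for r in (0.62, 0.35, -0.55, 0.05):
    print(r, effect_size(r))
```

Note that the sign still matters for interpretation: a strong negative correlation, as for the NHL in 2010-19, is a large effect in the opposite direction to the general presumption.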

Competitive balance in the NAMLs has been much researched over the last 30 years. The results of our study are broadly in line with previous results but highlight that any conclusions are likely to be time-dependent and metric-dependent. The most definitive results are those on performance persistence, which show a general tendency in both the NFL and NHL for improved competitive balance despite the advent of free agency. There is also clear evidence of continuing high levels of performance persistence in the NBA, likely to be due to the super-superstar effect inherent in the game structure of basketball. As for the general presumption that win dispersion and performance persistence tend to move together in the same direction, there is no overwhelming support that they do so in most cases. The practical implication is that leagues need to be clearer on which aspect of competitive balance is most important in driving uncertainty of outcome and spectator/viewer interest. Leagues must also recognise that the structures of their sports may limit the extent to which competitive balance can be regulated. Basketball is always likely to be more susceptible to super-superstar effects that can lead to high levels of performance persistence. And leagues with short game schedules may always tend to have higher levels of win dispersion since there is more limited opportunity for winning or losing streaks to even themselves out – what statisticians call the “regression-to-the-mean” effect.

Other Related Posts

Competitive Balance Part 1: What Are The Issues?

Competitive Balance Part 2: European Football

Note: The results reported in this post are published in B. Gerrard and M. Kringstad, ‘Dispersion and persistence in the competitive balance of North American Major Leagues 1960 – 2019‘, Sport, Business, Management: An International Journal, vol. 13 no. 5 (2023), pp. 640-662.

Economic Forecasting: What Is Going On?

The Times ran an editorial last Saturday (‘Predictable Mistakes’, Times, 3 Feb 2024) that was highly critical of economic forecasting particularly in the UK, pointing out that ‘among leading economies, British forecasters have distinguished themselves as the least prescient of the lot.’ A harsh assessment indeed and one with very serious consequences for all of us since, as the editorial went on to say, ‘bad modelling lays the ground for bad policymaking affecting investment strategy and monetary policy.’

            The Times editorial follows two recent columns in the Sunday Times which were also highly critical of economic forecasting. Dominic Lawson (‘Forecasts have one tiny flaw: they’re useless’, Sunday Times, 31 Dec 2023) compared economic forecasters to the augurs in Ancient Rome, a sort of priesthood distinguished by their supposed skills in predicting an uncertain future based on natural signs such as the behaviour of birds to determine whether the gods approved or disapproved of a proposed course of action. For “natural signs” read “econometrics”, but otherwise there is little difference in mindset – an overwhelming confidence, bordering on arrogance, in their superiority to the rest of us when it comes to transcending uncertainty.

              Dominic Lawson, who is the son of Nigel Lawson, the former Chancellor of the Exchequer, approvingly quoted his late father’s very perceptive comment on the fundamental problem with economic forecasting and economics in general being the illusion that because economic outcomes can be quantified, economic behaviour can be reduced to a set of mathematical equations. But, as Dominic Lawson argues, quantifiability does not mitigate the uncertainty inherent in economic behaviour. Economics is not physics; it deals with the irrationalities of economic behaviour not the behaviour of things that follow the laws of physics. And matters are made worse by the poor quality of much economic data, so much so that economic forecasters (and, hence, policymakers) are essentially flying blind. Lawson concludes his column with the rather damning comment that the time and money spent on forecasting human behaviour is a ‘monument to gullibility’. It reminds me of Deirdre McCloskey’s view that economics and econometrics are at times no more than the proverbial “snake oil”, sold by their purveyors as a cure-all but with little in the way of substantive evidence to support the marketing claims.

            Economists flying blind is the concern of Irwin Stelzer in his column, ‘Forecasting in the age of uncertainty’ (Sunday Times, 14 Jan 2024). Stelzer highlights the uncertainties in supply chains and how the interdependencies are transforming local and regional problems into global problems. It is a “butterfly effect” on a grand scale. Stelzer reminds us of the importance of Knight and Keynes as two economists who understood the difference between risk and uncertainty, and, crucially, recognised that investors fear uncertainty, not risk.

            I am reminded of a recent discussion with a senior economist of many years’ standing on the need for economics to embrace data analytics more thoroughly. In particular, as I have argued in recent posts, data analytics is data analysis for practical purpose, and the necessary mindset for practical purpose demands a recognition of the importance of context. Although there are important differences between the approaches of Knight and Keynes (and I largely follow Keynes’s approach), both rejected the notion that uncertainty could be reduced to a well-defined probability distribution for a random process with a known, stable structure akin to the roulette wheel. The senior economist, whom I would consider to be a radical economist strongly influenced by the ideas of Marx rather than modern mainstream economic theory, was very dismissive of my proposition that economics needs more data analytics. His response was that what economics needs is more sophisticated econometrics, not data analytics. Perhaps I should not have been surprised that a Marxist economist would believe in the predictability of economic forces. I suspect that Bernanke’s report on the forecasting capabilities of the Bank of England will reach a similar conclusion and argue for more sophisticated econometrics as the cure-all. But greater sophistication in econometric methods will not generate greater forecasting accuracy. Ultimately, if there is no fundamental change in the mindset of economists and economic forecasters as regards the nature of uncertainty, there will be no change in the practical value of economic forecasts and policy advice. It is these issues that I intend to investigate in more detail in the coming weeks in a planned series of posts entitled ‘Risk, Probability and Uncertainty’.

Other Related Posts

Analytics and Context

Putting Data in Context

Competitive Balance Part 2: European Football

As discussed in the previous post, ‘Competitive Balance Part 1: What are the Issues?’ (24th Jan 2024), competitive balance remains an elusive concept in many ways. There is considerable disagreement over the definition and measurement of competitive balance which has generated multiple metrics. In addition, the variety of real-world nuances in the structure of sporting tournaments across different sports and different countries has exacerbated the problem as refinements to existing metrics are proposed to improve comparability across sports and countries.

Morten Kringstad and I have attempted to bring some order to the chaos by arguing that competitive balance metrics can be categorised by their timeframe and scope. In particular, as regards timeframe, competitive balance metrics either focus on the distribution of sporting outcomes of participants within a single season (i.e. win dispersion) or the degree to which participants replicate their level of sporting performance across seasons (i.e. performance persistence). Competitive balance metrics also differ in respect of their scope, either including all of the participants (i.e. whole league) or a subset of the strongest/weakest performers (i.e. tail outcomes).

The practical problem created by the multiplicity of competitive balance metrics is identifying which metrics should be used by league authorities in determining whether or not intervention is required to improve competitive balance. There is no general definitive empirical evidence on which aspects of competitive balance impact on gate attendances and TV viewing. There seems to be an implicit assumption that the competitive balance metrics tend to move together in the same direction, so that interventions such as centralised revenue distribution and salary caps would be expected to improve both win dispersion and performance persistence. Is this assumption valid? This is the question that Morten and I investigated in an exploratory study published in 2022 on competitive balance in European football.

Competitive Balance in European Football Leagues (EFLs)

The dataset compiled by Morten and me covers the 18 best attended, top-tier domestic leagues in European football. We grouped the leagues into three groups – the Big Five (England, France, Germany, Italy and Spain), medium-sized leagues (including the Netherlands and Scotland) and smaller-sized leagues (including Denmark and Norway). We used final league positions for ten seasons from 2008 to 2017. In the published study we reported seven alternative competitive balance metrics but found that the four win dispersion metrics were highly correlated with each other but much less so with the performance persistence metric, which supports our contention that the two types of metric should be differentiated. Some of the key results are reported in Table 1 below.

Table 1: Competitive Balance in European Football Leagues, 2008 – 2017

The English Premier League (EPL) stands out as the least competitively balanced of the Big Five leagues, with the highest 10-year average for both win dispersion and performance persistence. The Spanish La Liga has similar levels of competitive dominance to the EPL. In contrast, the German Bundesliga and the French Ligue 1 are the most competitively balanced. The Bundesliga has the lowest 10-year average for performance persistence across all teams. But the Bundesliga has the highest championship concentration in that period due to the dominance of Bayern Munich, who won the league in seven of those ten seasons. It is also noticeable that the smaller EFLs tend to be more competitively balanced in win dispersion, performance persistence and championship concentration compared to the Big Five and the medium-sized leagues.

As regards the dispersion-persistence relationship, across all 18 leagues there is a general tendency for a small positive relationship between win dispersion and performance persistence. But the dispersion-persistence relationship is highly variable across leagues, especially in the Big Five. In the Spanish La Liga, which is one of the least competitively balanced leagues in our sample due to the dominance of the two global “super” teams, Real Madrid and Barcelona, there is a strong positive relationship between win dispersion and performance persistence. On the other hand, the German Bundesliga which, as highlighted above, is one of the most competitively balanced leagues despite the dominance of Bayern Munich, has a negligible dispersion-persistence relationship. The most surprising result is that for the EPL, which has a strong negative relationship between win dispersion and performance persistence. The Jupiler Pro League in Belgium and the Dutch Eredivisie also display a similar strong negative dispersion-persistence relationship during these ten seasons. As sporting performance becomes more dispersed across teams within a season in these three leagues, there is a tendency for the sporting performance of teams to become less persistent across seasons. Perhaps this strong negative dispersion-persistence relationship is part of the explanation of the paradox (at least in the eyes of sports economists) that the EPL is one of the least competitively balanced football leagues but remains the most commercially successful football league in the world.

What could be causing the win dispersion and performance persistence to be strongly negatively related in the EPL in defiance of the usual assumption that all competitive balance metrics tend to move together in the same direction? In our published study Morten and I develop a simple theoretical model that shows a negative dispersion-persistence relationship is more likely when there are strong persistence effects amongst the smaller teams. We suggest that the continuing growth of the value of the EPL’s media rights is putting the smaller teams in a particularly advantageous position vis-à-vis newly promoted teams and increasing the likelihood of incumbent teams avoiding relegation. And, on the other side of the coin, there is a greater likelihood of newly promoted teams becoming yo-yo teams, bouncing between the EPL and the Football League Championship.

Other Related Posts

Competitive Balance Part 1: What Are The Issues?

Financial Determinism and the Shooting-Star Phenomenon in the English Premier League

Note: The results reported in this post are published in B. Gerrard and M. Kringstad, ‘The multi-dimensionality of competitive balance: evidence from European football’, Sport, Business, Management: An International Journal, vol. 12 no. 4 (2022), pp. 382-402.