To work out how to predict the outcome of a match, we first looked at which statistics are good predictors of the final result. Obviously, how many points each team scores is a good predictor (because it defines the outcome of the match). Other good indicators include inside 50s, disposals and clearances. In fact, most statistics are correlated with the final margin.
But the problem we are trying to solve is not to look at match statistics and guess the margin. We don’t know the match statistics before a match is finished. So we came up with a model that applies a rating to every team and predicts a match margin. After every game, each team’s rating is updated based on how accurately the model predicted the margin.
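As a rough illustration of the idea, here is a minimal sketch of a rating update. It assumes the predicted margin is simply the difference between the two ratings and that both ratings move by a fixed fraction of the prediction error. The function names and the update rate `k` are hypothetical, chosen for illustration, and are not the actual model.

```python
def predict_margin(home_rating: float, away_rating: float) -> float:
    """Predicted margin in points, from the home team's point of view."""
    return home_rating - away_rating


def update_ratings(home_rating, away_rating, actual_margin, k=0.1):
    """Return new (home, away) ratings after one match.

    k is a hypothetical update rate, not a value from the article.
    """
    # Positive error: the home team did better than the model expected.
    error = actual_margin - predict_margin(home_rating, away_rating)
    # Move the two ratings in opposite directions by a fraction of the error.
    return home_rating + k * error, away_rating - k * error


# Illustrative numbers only: home predicted to win by 10, actually won by 30.
new_home, new_away = update_ratings(10.0, 0.0, actual_margin=30.0, k=0.1)
print(new_home, new_away)
```

If the prediction is exactly right, the error is zero and neither rating changes.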
What we want to know is: how much should we trust recent performances (or, in other words, how much should we update each team’s rating every week)? How much effect does home ground advantage have? Can we account for star players going in and out of the team?
So we built two models: a team-based model, and a player-based model. The team model does not know about which players are playing, just which teams are playing and where. The player model knows everything that the team model knows, but also which players are playing for each team.
We use all match results dating back to 2000 (approximately 3000 matches). For each match, we have a full set of match statistics for each team. Using this data, together with regression, classification and machine learning techniques, we can work out the model parameters that best predict the outcomes of upcoming games of football when we only know about previous games. Important questions for the model include:
- How quickly do we update each team’s rating after each match? If this parameter is large, the model weights recent performances more heavily and older performances less heavily.
- What happens between seasons? Do teams that were bad become a bit less bad, and do teams that were good become a bit less good? It turns out that all teams regress to the mean between seasons.
- How important is the home ground advantage? Is there a home ground advantage if both teams are playing at a ground they are familiar with? (The answer, surprisingly, is “yes, a bit”.) How much does travel affect the home ground advantage?
- How much does personnel affect the outcome? How much difference does a missing star player make?
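Two of these questions can be sketched in code. The example below assumes regression to the mean is a simple shrink of every rating toward the league average, and that home ground advantage is a flat number of points with an extra allowance when the away team has travelled. All names and default values here (`carry_over`, `home_advantage`, `travel_bonus`) are hypothetical placeholders, not the fitted parameters of the model.

```python
def regress_to_mean(ratings: dict, carry_over: float = 0.7) -> dict:
    """Shrink every rating toward the league average between seasons.

    carry_over is a hypothetical parameter: 1.0 keeps ratings unchanged,
    0.0 resets every team to the average.
    """
    mean = sum(ratings.values()) / len(ratings)
    return {team: mean + carry_over * (r - mean) for team, r in ratings.items()}


def predict_margin_at_venue(home_rating, away_rating,
                            home_advantage=8.0,
                            away_travelled=False, travel_bonus=4.0):
    """Predicted home margin including an assumed home ground advantage.

    home_advantage and travel_bonus are illustrative values in points.
    """
    margin = home_rating - away_rating + home_advantage
    if away_travelled:
        # Assumption: travel adds to the home side's advantage.
        margin += travel_bonus
    return margin


# Made-up ratings for two unnamed teams.
new_season = regress_to_mean({"Team A": 20.0, "Team B": -20.0}, carry_over=0.5)
print(new_season)
print(predict_margin_at_venue(5.0, 0.0, away_travelled=True))
```

In practice each of these values would be chosen by checking which setting best predicts past results.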
How accurate is the model? As you can imagine, no model is perfect – there are natural uncertainties in a game of football (and Ray Chamberlain). But if you understand these uncertainties, you can take them into account to know how likely a team is to win or lose. Here is a graph of how accurate the model is.
Plotted on the x-axis is the error in our model: the difference between the predicted margin and the actual margin. You can see that the most common result is that our model predicts the margin to within a few points. The shape of this graph is known as a normal distribution. It is defined by two parameters: a mean (the middle of the distribution, which in our case is near 0 points) and a standard deviation (the width of the distribution).
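To make the two parameters concrete, here is a small sketch that estimates them from a list of predicted and actual margins using Python’s standard library. The function name and the sample numbers are made up for illustration; they are not results from the model.

```python
from statistics import NormalDist


def fit_error_distribution(predicted, actual):
    """Fit a normal distribution to the prediction errors (actual - predicted)."""
    errors = [a - p for p, a in zip(predicted, actual)]
    # from_samples estimates the mean and the (sample) standard deviation.
    return NormalDist.from_samples(errors)


# Illustrative margins only, not real match data.
dist = fit_error_distribution(predicted=[10, 20, 30], actual=[0, 20, 40])
print(dist.mean, dist.stdev)
```

A mean near zero says the model is not systematically biased toward either team; the standard deviation says how far off a typical prediction is.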
But on some rare occasions, the model is up to 100 points wrong! Does this make it a “bad” model? The answer is actually no. In fact, the model is useful as long as we know how often the model is wrong and by how much.
Probability of winning
Suppose we predict the home team to win by 100 points. That means the away team needs us to be wrong by at least 100 points if they are to win. And we know how often that happens: 13 times in 2853 matches, or about 0.5% of the time. So there is roughly a 0.5% chance the away team will win. Equivalently, if we predict a team to win by 50 points, there is about a 0.5% chance that they will lose by 50 points or more, because that also requires an error of at least 100 points.
So when the model predicts a margin, we can also estimate how likely each team is to win.
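One way to sketch this calculation is to use the fitted normal distribution of errors instead of counting past matches directly. The team wins whenever the predicted margin plus the error is above zero. The function name and the default standard deviation below are assumptions for illustration only; the real value would come from the model’s own error distribution.

```python
from statistics import NormalDist


def win_probability(predicted_margin, error_mean=0.0, error_sd=36.0):
    """Chance that the team predicted to win by predicted_margin actually wins.

    error_sd is a placeholder value, not a figure from the article.
    """
    errors = NormalDist(mu=error_mean, sigma=error_sd)
    # Win when predicted_margin + error > 0, i.e. error > -predicted_margin.
    return 1.0 - errors.cdf(-predicted_margin)


for margin in (0, 10, 50, 100):
    print(margin, round(win_probability(margin), 3))
```

A predicted margin of zero gives an even chance, and the probability climbs toward 100% as the predicted margin grows.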