When Toledo upset then-#18 Arkansas (in Arkansas!), it shocked the college football world. The game was supposed to be a little warm-up for Arkansas before their demanding SEC schedule. Arkansas plays in the toughest conference, they’re perennially ranked, and they were expected to mop the floor with Toledo while using the game to prep for their gauntlet of an SEC season. It didn’t happen. Toledo looked awesome and escaped Fayetteville with a shocker.
The game was a huge surprise for my college model as well. (Side note: I need a name for my college football model, and I’m taking suggestions.) The model expected Toledo to lose by 22, and instead they won by 4, a 26-point error.
On the other hand, Georgia Tech’s victory that week (over Tulane) wasn’t surprising. Georgia Tech was expected to crush Tulane, and they did. They won by an impressive 55 points, scoring 65 points to Tulane’s 10. Looking at points scored, though, the model was actually more wrong about the Georgia Tech win than it was about the Toledo win. The model expected Georgia Tech to win by 28, and they won by 55, a 27-point error.
Errors like these, differences between what the model expects for a game and the actual outcome, are the way the model updates a team’s rating from week to week. When a team performs better than the model expects, its rating moves up, and when a team performs worse, its rating moves down. This should make sense, yet you would all judge me as totally nuts if I adjusted Georgia Tech’s rating as much as I adjusted Toledo’s rating based on those week 2 results. You would be right; that’s not how I do it. Instead, I use a concept I call Posterior Win Probability (PWP), and that’s the subject of this post.
Posterior Win Probability uses a combination of statistics and historical betting data to calculate a football outcome that is like margin of victory, but with diminishing value for winning by bigger and bigger margins. Using PWP, winning by four is much better than losing by 22 (Toledo), but winning by 55 is only a little better than winning by 28 (Georgia Tech). This allows the model to adjust team ratings based on margin of victory, but in a way that’s heavily weighted toward meaningful outcomes.
Posterior Win Probability is what allows the core part of my college football model to be so effective, yet so simple. Once I calibrate the ratings pre-season, the only additional required input is a team’s result from week to week. From there I can predict all the game outcomes for the whole season and use that to predict standings, bowl games, and CFP selection. PWP has been part of the model for a long time, and I think it’s such a cool idea. I’m thrilled to finally share it.
But first, let’s take a step back.
Historical line data is available for every college football game that’s been played for the last 10ish years. You can see what the line was, whether the team won, as well as (in the data set I found) what a bunch of other predictive models thought the line should be (e.g., Sagarin and Massey; spoiler alert: none of them had better predictive power than the lines).
I used that data to calculate how likely a team was to win a game based on the pre-game betting line. For example, teams that were 0-point favorites (pick’em) won basically half the time, teams that were 7-point favorites won around 70% of the time, and teams that were 21-point favorites won around 93% of the time. You get the idea.
The graph below demonstrates this correlation.
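The tabulation behind that graph is a simple group-and-average. Here is a minimal Python sketch of it, assuming each game is stored as a (favorite's spread, did the favorite win) pair. The `games` list is made-up toy data for illustration only, not the real data set, and `win_rate_by_spread` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy records, NOT real betting data: (spread for the favorite, 1 if the favorite won else 0)
games = [
    (0.0, 1), (0.0, 0),
    (3.0, 1), (3.0, 0), (3.0, 1),
    (7.0, 1), (7.0, 1), (7.0, 0),
    (21.0, 1), (21.0, 1),
]

def win_rate_by_spread(records):
    """Return {spread: fraction of games the favorite won}, sorted by spread."""
    tallies = defaultdict(lambda: [0, 0])  # spread -> [wins, games played]
    for spread, won in records:
        tallies[spread][0] += won
        tallies[spread][1] += 1
    return {spread: wins / total for spread, (wins, total) in sorted(tallies.items())}

rates = win_rate_by_spread(games)
for spread, rate in rates.items():
    print(f"favored by {spread:4.1f}: won {rate:.0%}")
```

With only a handful of games per spread, each bucket's win rate swings a lot from one game to the next, which is the same noise problem that shows up in the real data.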
Ten years of data seems like a lot of data, but most statisticians would laugh at only 9,262 games. The “smallness” of the data set shows in this graph: it’s bumpy. This bumpiness won’t do if we’re going to use this outcome going forward. For example, teams that are favored by 4 have won 64% of the time, but teams favored by 4.5 have only won 60% of the time. This is just noise at these low volumes of games, and we need to smooth it out. We need to fit a parametric curve to this data, one that makes sense and that we can use going forward. Luckily, I still have the actuarial textbook that taught me how to do maximum likelihood estimation*, so I dusted it off and fit a logistic curve to the data. Now it looks like this:
This “smoothed out” version is driven by a simple formula that’s easy to use and produces logically consistent results. This is what I use right now to translate between win probabilities and point spreads.
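For the curious, here is a rough Python sketch of what a maximum likelihood fit of a logistic curve can look like. It rests on several assumptions that are mine, not details from the actual model: a one-parameter curve P(win) = 1 / (1 + exp(-k * spread)) with no intercept (so a pick'em is exactly 50%), Newton's method as the optimizer, and simulated games standing in for the real line data. The names `win_prob`, `fit_slope`, and `TRUE_K` are all hypothetical.

```python
import math
import random

def win_prob(spread, k):
    """Assumed logistic curve: win probability for a team favored by `spread` points."""
    return 1.0 / (1.0 + math.exp(-k * spread))

def fit_slope(spreads, wins, k=0.1, iterations=25):
    """Maximum likelihood estimate of k, found with Newton's method."""
    for _ in range(iterations):
        gradient = 0.0   # first derivative of the log-likelihood with respect to k
        curvature = 0.0  # second derivative (always negative for this model)
        for s, w in zip(spreads, wins):
            p = win_prob(s, k)
            gradient += (w - p) * s
            curvature -= p * (1.0 - p) * s * s
        k -= gradient / curvature
    return k

# Simulated games, NOT real betting data: draw a spread, then draw a win or loss
# from a curve with a known slope, and check that the fit recovers that slope.
random.seed(0)
TRUE_K = 0.12
spreads = [random.randrange(-60, 61) / 2 for _ in range(5000)]  # -30.0 to 30.0 in half points
wins = [1 if random.random() < win_prob(s, TRUE_K) else 0 for s in spreads]

k_hat = fit_slope(spreads, wins)
print(f"true k = {TRUE_K}, fitted k = {k_hat:.3f}")
```

The appeal of the parametric fit is that one number (`k` here) replaces the whole bumpy table, so neighboring spreads can never disagree the way 4 and 4.5 do in the raw data.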
My model’s raw output is a team’s probability of winning. I compare the opposing teams’ ratings and use a formula to calculate win probability based on how those ratings compare to each other. I like to keep my focus on a team’s probability of winning; I find that to be the interesting output. When I bet on sports, I strongly prefer to bet on teams to win rather than cover the spread. After all, we root for our teams to win. No fan ever says “thank god we only lost by 3” or “darn, we didn’t win by more than 7.5.”
However, it’s important to take the model’s win probability output and translate that into a point spread, since that’s the metric that is most commonly used when discussing upcoming games. For example, we all know that Alabama is favored by 7 over LSU this week, but almost none of us know that Alabama has a 70% chance to win this week.
Point spreads are more intuitive to me, too. Every week I use point spreads, not win probabilities, to compare my model’s predictions to Vegas predictions. As an example, this week my model thinks Washington should be a 1-point underdog vs. Utah, while Vegas has UW as a 2-point favorite. Having both predictions in point spreads, in that common frame of reference, makes the comparison easy.
Translating between win probability and point spread is important, and the logistic function above is how I do that. I take a team’s probability of winning and find what point spread it matches up with on that logistic formula.
To return to Toledo vs. Arkansas, the model gave Toledo a mere 6% chance to win that game. Finding 6% on the graph below, we can see it matches up with 22.6 on the point spread axis.
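The same lookup can be done algebraically by inverting the logistic function instead of reading it off a graph. The sketch below assumes the one-parameter form P(win) = 1 / (1 + exp(-K * spread)) with K = 0.12. That slope is my own guess, picked because it lands close to the figures quoted in this post; it is not the fitted value from the actual model, and the function names are hypothetical.

```python
import math

K = 0.12  # ASSUMED slope of the logistic curve, not the model's actual fitted value

def win_prob_from_spread(spread):
    """Win probability for a team favored by `spread` points (negative = underdog)."""
    return 1.0 / (1.0 + math.exp(-K * spread))

def spread_from_win_prob(p):
    """Inverse of the curve above: the spread whose win probability equals p."""
    return math.log(p / (1.0 - p)) / K

toledo_spread = spread_from_win_prob(0.06)
print(f"6% win probability -> spread of {toledo_spread:+.1f}")
```

With this assumed slope, a 6% win probability comes out as roughly a 23-point underdog, in the same neighborhood as the 22.6 read off the graph. A small gap is expected, since 6% is a rounded figure and the slope is a guess.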
Looking at the model’s prediction, I can calculate their likelihood to win the game beforehand (it was 6%), and I can translate that into an expected points outcome (losing by 22.6) using the process we just discussed. Comparing this prediction to the outcome, I can say they won, and that they won by 4, but doing that gives me no way to evaluate the error on my “6% chance to win” prediction.
I could improve on “they won” by saying they now have a 100% probability of winning.
That also feels inelegant because doing so assigns Toledo (won by 4) and Georgia Tech (won by 55) the same value.
The answer to both these quandaries is Posterior Win Probability.
Recall that I calculate a team’s expected points outcome (lose by 22.6 for Toledo) by mapping their win probability onto that logistic curve. I can also go the other direction, as in the graph below.
Toledo won by 4. On the logistic curve, winning by 4 matches up with a win probability of 61.8%, so I say Toledo’s Posterior Win Probability is 61.8%. That is the number I use to measure the model’s error on that game, as well as adjust Toledo’s (or anyone’s) team rating. Notice how the logistic curve is steep between -14 and +14 and flattens out the further you get from zero. This means that huge losses and huge wins are nearly plateaus because they don’t dramatically impact the likelihood of a team actually winning. However, when an outcome is close, it’s on the steep part of the curve.
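To make the shape of the curve concrete, here is a small Python sketch that maps actual margins to a Posterior Win Probability. It assumes the same kind of logistic curve with a slope of 0.12, which is my illustrative guess rather than the model's actual fitted parameter; `posterior_win_prob` is a hypothetical name.

```python
import math

K = 0.12  # ASSUMED slope of the logistic curve, for illustration only

def posterior_win_prob(margin):
    """Map a final margin (positive = won by, negative = lost by) onto the curve."""
    return 1.0 / (1.0 + math.exp(-K * margin))

for margin in (-28, -14, -4, 0, 4, 14, 28, 55):
    print(f"margin {margin:+3d} -> PWP {posterior_win_prob(margin):.1%}")
```

Under that assumed slope, a 4-point win comes out at about 61.8%, matching the number above. Going from 0 to +14 moves the PWP by more than 30 percentage points, while going from +28 to +55 moves it by only about 3.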
Let’s return to the comparison of those week 2 games:
- Toledo vs. Arkansas
- Georgia Tech vs. Tulane
The points-based errors are almost the same, but the percentage-based errors are miles apart. They are as far apart from each other as American football is from European football. This is how it should be. Toledo’s outcome was a huge shocker, while Georgia Tech’s outcome was merely a good team running over a bad team. Posterior Win Probability does an excellent job articulating differences between what is expected and what actually happens.
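Here is a minimal Python sketch of that comparison, using the expected and actual margins quoted earlier in this post. It assumes a logistic curve with a slope of 0.12 (my guess, not the model's fitted value) and converts both the expectation and the result to win probabilities with that same curve, so the percentages are approximate. The helper names are hypothetical.

```python
import math

K = 0.12  # ASSUMED slope of the logistic curve, for illustration only

def win_prob(margin):
    """Win probability implied by a margin (expected or actual) on the assumed curve."""
    return 1.0 / (1.0 + math.exp(-K * margin))

def game_errors(expected_margin, actual_margin):
    """Return (points-based error, win-probability-based error) for one game."""
    points_error = actual_margin - expected_margin
    prob_error = win_prob(actual_margin) - win_prob(expected_margin)
    return points_error, prob_error

# Margins as quoted in this post: (expected, actual)
week2 = {
    "Toledo": (-22, 4),
    "Georgia Tech": (28, 55),
}

results = {team: game_errors(*margins) for team, margins in week2.items()}
for team, (points_error, prob_error) in results.items():
    print(f"{team}: {points_error:+d} points, {prob_error:+.1%} win probability")
```

On this assumed curve, the two points-based errors are 26 and 27, but the probability-based error is around 55 percentage points for Toledo and only about 3 for Georgia Tech.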
The very different percentage-based errors in these two games led to very different adjustments to the teams’ ratings:
Toledo’s team rating went up by nearly 25%, and Georgia Tech’s rating barely budged. This is the elegance of Posterior Win Probability: it uses margin of victory but gives diminishing marginal value for additional points scored, to the point that blowing out a cupcake team by 70 instead of 50 means almost nothing, but winning by 21 when you were expected to win by 1 makes a huge difference.
This is exactly how Posterior Win Probability is supposed to work. Huge losses and huge wins fall on the nearly flat part of the logistic curve and aren’t strongly differentiated from one another. Conversely, close losses and close wins are treated with more significance.
Now that you know everything there is to know about Posterior Win Probability, I hope you’re as enthused about it as I am. PWP is a core part of my college football model, as well as this year’s new and improved CFP model, and I’m excited that I can now reference it in my writing and analysis. I’m also eager to explore additional potential uses. More to come!
*maximum likelihood estimation is a process of estimating the parameter(s) of a given statistical model. You can read way more about it here