When Toledo upset then-#18 Arkansas (in Arkansas!), it shocked
the college football world. The game was supposed to be a little warm-up for Arkansas
before their demanding SEC schedule. Arkansas plays in the toughest conference,
they’re perennially ranked, and they were expected to mop the floor with Toledo
while using the game to prep for their gauntlet of an SEC season. It didn’t
happen. Toledo looked awesome and escaped Fayetteville with a shocker.

The game was a huge surprise for my college model as well.
(Side note: I need a name for my college football model, I’m taking
suggestions). The model expected Toledo to lose by 22, and instead they won by
4, a 26-point error.

On the other hand, Georgia Tech’s victory that week (over
Tulane) wasn’t surprising. Georgia Tech was expected to crush Tulane, and they
did. They won by an impressive 55 points, scoring 65 points to Tulane’s 10. Looking
at points scored, though, the model was actually more wrong about the Georgia
Tech win than it was about the Toledo win. The model expected Georgia Tech to
win by 28, and they won by 55, a 27-point error.

Errors like these, differences between what the model
expects for a game and the actual outcome, are the way the model updates a team’s
rating from week to week. When a team performs better than the model expects, its
rating moves up, and when a team performs worse, its rating moves down. This should
make sense, yet you would all judge me as totally nuts if I adjusted Georgia
Tech’s rating as much as I adjusted Toledo’s rating based on those week 2
results. You would be right; that’s not how I do it. Instead, I use a concept I
call Posterior Win Probability (PWP), and that’s the subject of this post.

Posterior Win Probability uses a combination of statistics
and historical betting data to calculate a football outcome that is like margin
of victory, but with diminishing value for winning by bigger and bigger margins.
Using PWP, winning by four is __much__ better than losing by 22 (Toledo), but winning by 55 is only a little better than winning by 28 (Georgia Tech). This allows the model to adjust team ratings that rely on margin of victory, but in a way that’s heavily weighted toward meaningful outcomes.

Posterior Win Probability is what allows the core part of my
college football model to be so effective, yet so simple. Once I calibrate the
ratings pre-season, the only additional required input is a team’s result from
week to week. From there I can predict all the game outcomes for the whole
season and use that to predict standings, bowl games, and CFP selection. PWP
has been part of the model for a long time, and I think it’s such a cool idea. I’m
thrilled to finally share it.

But first, let’s take a step back.

Historical line data is available for every college football
game that’s been played for the last 10ish years. You can see what the line
was, whether the team won, as well as (in the data set I found) what a bunch of
other predictive models thought the line should be (i.e. Sagarin, Massey –
spoiler alert, none of them had better predictive power than the lines).

I used that data to calculate how likely a team was to win a
game based on the pre-game betting line. For example, teams that were 0-point
favorites (pick’em) won basically half the time, teams that were 7-point
favorites won around 70% of the time, and teams that were 21-point
favorites won around 93% of the time. You get the idea.
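
The tally behind those percentages can be sketched in a few lines of Python. This is only an illustration: the rows in `games` are made up, and the layout (a spread and a win flag per team-game) is an assumption about how such a data set might be arranged, not the actual file I used.

```python
from collections import defaultdict

# Hypothetical sample rows: (pre-game spread, 1 if that team won else 0).
# A positive spread means the team was favored by that many points.
games = [
    (7.0, 1), (7.0, 1), (7.0, 0), (7.0, 1),
    (0.0, 1), (0.0, 0),
    (21.0, 1), (21.0, 1),
]

def win_rate_by_spread(rows):
    """Return {spread: (games, wins, win_rate)} for each distinct spread."""
    tally = defaultdict(lambda: [0, 0])  # spread -> [games played, games won]
    for spread, won in rows:
        tally[spread][0] += 1
        tally[spread][1] += won
    return {s: (n, w, w / n) for s, (n, w) in sorted(tally.items())}

rates = win_rate_by_spread(games)
for spread, (n, wins, rate) in rates.items():
    print(f"favored by {spread:>4}: {wins}/{n} = {rate:.0%}")
```

With only a handful of games at each spread, rates like these jump around a lot.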

The graph below demonstrates this correlation.

Ten years sounds like a lot of data, but most
statisticians would laugh at only 9,262 games. The “smallness” of the data
set shows in this graph: it’s bumpy. This bumpiness won’t do if we’re going to
use this relationship going forward. For example, teams that are favored by 4 have
won 64% of the time, but teams favored by 4.5 have only won 60% of the time. This
is just noise at these low volumes of games, and we need to smooth it out. We
need to fit a parametric curve to this data, one that makes sense and that we
can use going forward. Luckily, I still have the actuarial textbook that taught
me how to do maximum likelihood estimation*, so I dusted it off and fit a
logistic curve to the data. Now it looks like this:

This “smoothed out” version is driven by a simple formula
that’s easy to use and produces logically consistent results. This is what I
use right now to translate between win probabilities and point spreads.
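
For the curious, here is a minimal sketch of what fitting a logistic curve by maximum likelihood can look like. Several things here are assumptions for illustration and not necessarily what I did: the one-parameter form (no intercept, so a pick’em is exactly 50%), the use of Newton’s method, and the synthetic data, which is simulated from a known scale so the fit can be checked.

```python
import math
import random

def fit_logistic_scale(spreads, wins, iterations=25):
    """Maximum likelihood estimate of k in P(win) = 1 / (1 + exp(-k * spread)).

    Runs Newton's method on the Bernoulli log-likelihood, which is concave in k.
    """
    k = 0.1  # starting guess
    for _ in range(iterations):
        grad = 0.0  # first derivative of the log-likelihood
        hess = 0.0  # second derivative (never positive)
        for s, y in zip(spreads, wins):
            p = 1.0 / (1.0 + math.exp(-k * s))
            grad += (y - p) * s
            hess -= p * (1.0 - p) * s * s
        if hess == 0.0:
            break
        k -= grad / hess
    return k

# Synthetic stand-in for historical lines: simulate outcomes from a known scale.
random.seed(0)
true_k = 0.12
spreads = [random.uniform(-30, 30) for _ in range(9262)]
wins = [1 if random.random() < 1.0 / (1.0 + math.exp(-true_k * s)) else 0
        for s in spreads]

k_hat = fit_logistic_scale(spreads, wins)
print(f"fitted scale: {k_hat:.3f} (simulated with {true_k})")
```

The fitted scale should land close to the value used in the simulation, which is the point of the check: one number is enough to describe the whole smoothed curve.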

My model’s raw output is a team’s probability of winning. I
compare the opposing team’s ratings and use a formula to calculate win
probability based on how those ratings compare to each other. I like to keep my
focus on a team’s probability of winning; I find that to be the interesting
output. When I bet on sports, I strongly prefer to bet on teams to win rather
than cover the spread. After all, we root for our teams to win. No fan ever
says “thank god we only lost by 3” or “darn, we didn’t win by more than 7.5.”

However, it’s important to take the model’s win probability output
and translate that into a point spread, since that’s the metric that is most
commonly used when discussing upcoming games. For example, we all know that
Alabama is favored by 7 over LSU this week, but almost none of us know that
Alabama has a 70% chance to win this week.

Point spreads are more intuitive to me, too. Every week I
use point spreads, not win probabilities, to compare my model’s predictions to
Vegas predictions. As an example, this week my model thinks Washington should
be a 1-point underdog vs. Utah, while Vegas has UW as a 2-point favorite. Having
both predictions in point spreads, in that common frame of reference, makes the
comparison easy.

Translating between win probability and point spread is
important, and the logistic function above is how I do that. I take a team’s
probability of winning and find what point spread it matches up with on that
logistic formula.
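
As a rough sketch, the two-way translation can be written as a logistic function and its inverse. The scale `K` below is an assumption: it is back-solved from the figures quoted in this post (a 4-point margin lining up with 61.8%), and the function names are just for illustration.

```python
import math

# Assumed scale, back-solved from numbers quoted in this post; the actual
# fitted parameter may differ slightly.
K = 0.1203

def spread_to_win_prob(spread, k=K):
    """Win probability for a team favored by `spread` points (negative = underdog)."""
    return 1.0 / (1.0 + math.exp(-k * spread))

def win_prob_to_spread(prob, k=K):
    """Inverse of the logistic: the point spread that matches a win probability."""
    return math.log(prob / (1.0 - prob)) / k

print(f"7-point favorite: {spread_to_win_prob(7):.0%} to win")
print(f"6% to win: spread of {win_prob_to_spread(0.06):.1f}")
```

Because the inputs in this post are rounded, the outputs will be close to, but not exactly, the figures I quote.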

To return to Toledo vs. Arkansas, the model gave Toledo a
mere 6% chance to win that game. Finding 6% on the graph below, we can see it
matches up with 22.6 on the point spread axis.

Looking at the model’s prediction, I can calculate their
likelihood to win the game beforehand (it was 6%), and I can translate that
into an expected points outcome (losing by 22.6) using the process we just
discussed. Comparing this prediction to the outcome, I can say they won, and that
they won by 4, but doing that gives me no way to evaluate the error on my “6%
chance to win” prediction.

I could improve on “they won” by saying they now have a 100%
probability of winning.

That also feels inelegant because doing so assigns Toledo
(won by 4) and Georgia Tech (won by 55) the same value.

The answer to both these quandaries is Posterior Win
Probability.

Recall that I calculate a team’s expected points outcome
(lose by 22.6 for Toledo) by mapping their win probability onto that logistic
curve. I can also go the other direction, as in the graph below.

Toledo won by 4. On the logistic curve, winning by 4 matches
up with a win probability of 61.8%, so I say Toledo’s Posterior Win Probability
is 61.8%. That is the number I use to measure the model’s error on that game, as
well as adjust Toledo’s (or anyone’s) team rating. Notice how the logistic
curve is steep between -14 and +14 and flattens out the further you get from
zero. This means that huge losses and huge wins are nearly plateaus because
they don’t dramatically impact the likelihood of a team actually winning.
However, when an outcome is close, it’s on the steep part of the curve.

Let’s return to the comparison of those week 2 games:

- Toledo vs. Arkansas
- Georgia Tech vs. Tulane

The points-based errors are almost the same, but the percentage-based
errors are miles apart. They are as far apart from each other as American football
is from European football. This is how it should be. Toledo’s outcome was a
huge shocker, while Georgia Tech’s outcome was merely a good team running over
a bad team. Posterior Win Probability does an excellent job articulating differences
between what is expected and what actually happens.
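
The comparison can be reproduced approximately with a short sketch. It assumes a one-parameter logistic curve with a scale back-solved from the 4-point / 61.8% example and uses the rounded margins quoted in this post, so the percentages it prints are approximations and not my model’s exact values.

```python
import math

K = 0.1203  # assumed logistic scale, back-solved from the 4-point / 61.8% example

def win_prob(margin, k=K):
    """Map an expected or actual point margin onto the logistic curve."""
    return 1.0 / (1.0 + math.exp(-k * margin))

# (team, expected margin, actual margin), rounded figures from this post
games = [("Toledo", -22, 4), ("Georgia Tech", 28, 55)]

errors = {}
for team, expected, actual in games:
    point_error = actual - expected
    pwp_error = win_prob(actual) - win_prob(expected)
    errors[team] = (point_error, pwp_error)
    print(f"{team}: {point_error}-point error, {pwp_error:.1%} PWP error")
```

The point errors differ by a single point, while the probability errors differ by more than an order of magnitude under these assumptions.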

The very different percentage-based errors in these two
games led to very different adjustments to the teams’ ratings:

Toledo’s team rating went up by nearly 25%, and Georgia Tech’s
rating barely budged. This is the elegance of Posterior Win Probability: it uses margin of victory but gives
diminishing marginal value for additional points scored, to the point that
blowing out a cupcake team by 70 instead of 50 means almost nothing, but
winning by 21 when you were expected to win by 1 makes a huge difference.

This is exactly how Posterior Win Probability is supposed to
work. Huge losses and huge wins fall on the nearly flat part of the logistic
curve and aren’t strongly differentiated from one another. Conversely, close
losses and close wins are treated with more significance.

Now that you know everything there is to know about
Posterior Win Probability, I hope you’re as enthused about it as I am. PWP is a
core part of my college football model, as well as this year’s new and improved
CFP model, and I’m excited that I can now reference it in my writing and
analysis. I’m also eager to explore additional potential uses. More to come!

*Maximum likelihood estimation is a process of estimating the parameter(s) of a given
statistical model. You can read way more about it here

So what formula would that curve represent? For example, if the spread were -3 what formula could you plug that into to get a winning percentage?
