Tuesday, October 9, 2012

Election Model: Adjustments

I'm loath to make mid-course adjustments to the model, for two main reasons.

First, political news stories almost always turn out to have a shorter half-life than they seem to at the time. Think back to 'you didn't build that,' 'put you back in chains,' Syria, and Romney's tax returns. Think about how big a deal they seemed at the time. Now try to find them on the front page. These Big News events are rarely as big a deal as they seem in the moment.

Second, the more you tinker with, fix, or adjust a model, the more that model represents your own intuition rather than a statistical prediction.

Sometimes, though, you just have to make an adjustment.

After Romney's debate win, I expected that the polling would come to reflect his debate performance, and that the model would adjust slowly. The model is designed to focus on the long picture rather than jump around with every news cycle or polling shock. When something like the debate does shock the polling and genuinely change the electoral outlook, the model should adjust to reflect the change, just a little more slowly than our intuitions do. Right after the debate I did a quick analysis as a way of saying "hey, this is where I think we might be headed," and then waited to see whether the polling would change and whether the model would adjust.
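To make "adjusting just a little more slowly" concrete, here is a minimal sketch of one common way to damp a noisy series: an exponentially weighted moving average. This is an assumption for illustration only; the post doesn't say how the model smooths its inputs, and the `smoothed_margin` helper, the `alpha` weight, and the margins below are all hypothetical.

```python
def smoothed_margin(daily_margins: list[float], alpha: float = 0.2) -> list[float]:
    """Exponentially weighted moving average of a daily polling margin.

    alpha is a hypothetical smoothing weight: smaller values react more slowly.
    """
    if not daily_margins:
        return []
    estimate = daily_margins[0]
    history = [estimate]
    for margin in daily_margins[1:]:
        # Move the estimate only a fraction alpha of the way toward the new reading.
        estimate = alpha * margin + (1 - alpha) * estimate
        history.append(estimate)
    return history


# Hypothetical national margin (points) that jumps from +4 to +0 after a debate.
margins = [4.0] * 5 + [0.0] * 5
for day, value in enumerate(smoothed_margin(margins)):
    print(f"day {day}: {value:+.2f}")
```

A real shift still shows up in full eventually; it just takes several days of consistent readings before the smoothed estimate gets most of the way there.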

The polling came in, and the model did not adjust as I expected. It showed only a small change in Obama's fortunes (95.3% to 94.7%) and was clearly not capturing Romney's new position in the polls. I needed to figure out why. After some late nights, I found that the problem lay in the connection between the state and national simulations, and worked out how to fix it.

In short, the state simulations were too independent of the national polling data; that is no longer the case. I've made individual state outcomes more responsive to the current national polling picture and thoroughly tested the relationship between the two. When state and national polls disagree, the model now does a much better job of balancing all the available information.

Everything I said earlier still holds (about the model adjusting cautiously and trying to smooth out day-to-day variance in the news cycle). The model is a forecast for Nov. 6, not a measure of the daily liberal freak-out or Romney gaffe. It's just that now, when there is real movement in national polling, the model will reflect it more accurately.

Read on for more detail.

Mitt Romney had very good state polls come out last Friday (FL +2 & +3, VA +1 & +3, OH +1 & -1, and CO +4), then followed that up with a mediocre Saturday and Monday and a decent-to-good day today. All that adds up to an incremental improvement for Mitt Romney.

The national polling, however, tells a different story. National tracking polls shifted in Mitt Romney's favor immediately and have kept moving his way, to the point that the RealClearPolitics (RCP) average now shows him ahead nationally.

This is the information the model was not adequately taking into account. Nate Silver is fond of saying that state polls inform national polls and national polls inform state polls, and while I was allowing for influence in both directions, I wasn't allowing nearly enough. Before last Wednesday, national and state polling moved in lockstep, so this problem never surfaced. Once it did, I needed to track it down and fix it.

A few late nights later, I've figured out why so little information was flowing from national polling to state polling. The model simulates a national outcome based on current national polling, then simulates each state outcome relative to it. The problem lay in how the state relativities were tied to the inputs for the national simulation. If Obama's national polling number dropped, the model concluded that Obama would do incrementally better than before in, say, Ohio relative to the national outcome, because his Ohio polling numbers hadn't dropped. See the following example:

See how, when Obama's national standing drops, the model doesn't lower his Ohio standing but instead assumes Ohio is that much better than the nation? That's the issue.
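Here is a simplified reconstruction of that flawed scheme, assuming each simulated state margin is the simulated national margin plus a fixed "relativity" equal to the state poll minus the current national poll. The helpers `old_relativity` and `simulate_state`, the margins (Obama +4 nationally and +5 in Ohio before the debate, +0 nationally afterwards), and the error sizes are hypothetical, not the model's actual inputs.

```python
import random
import statistics


def old_relativity(state_poll: float, national_poll: float) -> float:
    # Flawed scheme: the state's lean is measured against the *current* national
    # number, so a national drop the state poll hasn't echoed is absorbed into
    # the lean instead of lowering the state estimate.
    return state_poll - national_poll


def simulate_state(national_poll: float, relativity: float, n_sims: int = 20_000,
                   national_sd: float = 2.0, state_sd: float = 3.0,
                   seed: int = 0) -> list[float]:
    """Draw a national margin, then a state margin relative to it (hypothetical error sizes)."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_sims):
        national = rng.gauss(national_poll, national_sd)
        outcomes.append(national + relativity + rng.gauss(0.0, state_sd))
    return outcomes


ohio_poll = 5.0  # hypothetical: Obama +5 in Ohio, unchanged after the debate
for label, national_poll in [("before debate", 4.0), ("after debate", 0.0)]:
    rel = old_relativity(ohio_poll, national_poll)
    ohio = simulate_state(national_poll, rel)
    print(f"{label}: relativity {rel:+.1f}, mean Ohio margin {statistics.mean(ohio):+.2f}")
```

The relativity grows from +1 to +5 and exactly cancels the national drop, so the simulated Ohio margin stays centered on +5 in both scenarios.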

I have fixed this issue. Now when Obama's national polling number drops, the model knows that Obama's performance in Ohio probably also dropped, and combines both pieces of information when simulating an Ohio result.
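One standard way to combine a state poll with what national polling implies for that state is inverse-variance (precision) weighting, sketched below. This is an assumed technique, not necessarily what the model does; the `combined_state_estimate` helper, the fixed Ohio lean (`state_lean`), both standard deviations, and the margins are hypothetical. The key difference from the flawed scheme is that the lean is held fixed rather than recomputed from today's national number, so a national drop pulls the Ohio estimate down.

```python
def combined_state_estimate(state_poll: float, state_sd: float,
                            national_poll: float, state_lean: float,
                            implied_sd: float) -> float:
    """Precision-weighted blend of a state poll and a national-implied state margin.

    state_lean is a hypothetical fixed lean of the state relative to the nation
    (e.g. estimated before the shock), so it does not absorb national movement.
    """
    implied = national_poll + state_lean  # what national polling alone suggests
    w_state = 1.0 / state_sd ** 2         # weight = 1 / variance
    w_implied = 1.0 / implied_sd ** 2
    return (w_state * state_poll + w_implied * implied) / (w_state + w_implied)


# Hypothetical: Ohio poll stays at Obama +5 while the national margin falls from +4 to +0.
print(f"{combined_state_estimate(5.0, 3.0, 4.0, 1.0, 2.0):.2f}")  # 5.00
print(f"{combined_state_estimate(5.0, 3.0, 0.0, 1.0, 2.0):.2f}")  # 2.23
```

The blended estimate moves partway toward the national signal; how far depends on the relative sizes of `state_sd` and `implied_sd`.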

Bonus graph for readers who made it this far: an updated histogram:
