What a perfect time to talk about how national and state polls interact in my model. The short answer is that national polls can be thought of as 50 individual state polls, and modeled accordingly.
Figuring out the right way to incorporate national polls into the model has been very difficult, and not just for me: I'm sure I've read Nate Silver write in 2012 that his model treated national polls "holistically."
My approach to national polls in 2012 was to calculate an average of national polls, then, in each simulation of the election, simulate a national outcome and vary state outcomes relative to that national outcome. It worked well enough, and obviously the result was great, but it lacked elegance.
This election I've improved on that technique substantially, and my inspiration for how to do so came from thinking about what polls really are - a collection of individual preferences. National polls are a collection of those preferences spread out across 50 states.
That's the key to how national polls are handled, so I'll say it again: national polls are a collection of preferences spread across 50 states. They can be modeled as such.
Just like for state polls, I collect every poll from the RCP average, then aggregate them using the same methodology I described on Monday. Once I have that average, I apportion it out to the states using population and adjusting for how red or blue the state is on a fundamental level.*
*I do this using Cook PVI, which you can read all about here
To demonstrate, let's return to my Missouri example from Monday. My national poll average was Clinton 45.5%, Trump 42.9%, with an effective sample size of 44,855 voters. Turning that data into something specific to Missouri requires three additional steps:
- Adjust the national poll to reflect Missourian political leanings
- Calculate how much of the national poll sample came from Missouri
- Combine steps one and two to estimate how the national polling translates to actual voters expressing preferences in Missouri
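The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the model's actual code: the function name, the way PVI is applied (shifting each candidate's share by the PVI in points), and the Missouri inputs used in the example (a lean of R+5 and a 1.9% population share) are all assumptions made for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass
class NationalAverage:
    clinton: float      # share of respondents, 0..1
    trump: float        # share of respondents, 0..1
    sample_size: float  # effective sample size of the aggregated national polls


def apportion_to_state(national, pvi_dem_points, population_share):
    """Turn a national poll average into implied votes for one state.

    pvi_dem_points: hypothetical encoding of the state's lean, in points,
        positive for Democratic-leaning states and negative for
        Republican-leaning ones (an assumption, not the post's formula).
    population_share: the state's share of the national population, 0..1.
    """
    shift = pvi_dem_points / 100.0

    # Step 1: adjust the national shares for the state's political lean.
    clinton_share = national.clinton + shift
    trump_share = national.trump - shift

    # Step 2: estimate how much of the national sample came from the state.
    implied_sample = national.sample_size * population_share

    # Step 3: combine the two into implied votes for each candidate.
    return {
        "implied_sample": implied_sample,
        "clinton_votes": clinton_share * implied_sample,
        "trump_votes": trump_share * implied_sample,
    }


# National average from the post; the Missouri lean and population share
# below are illustrative assumptions.
national = NationalAverage(clinton=0.455, trump=0.429, sample_size=44855)
missouri = apportion_to_state(national, pvi_dem_points=-5.0, population_share=0.019)
print(missouri)
```

Treating PVI as a straight point shift is the simplest possible choice; a real implementation might scale or cap the adjustment differently.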
The next step is to combine national and state polling. Easy! Just add up the votes.
I add the state poll totals to the national poll totals, and that's my aggregate poll. This is the poll I use to calculate each candidate's chance of winning, to simulate elections, to categorize the state, and so on.
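As a rough sketch of that addition, assuming vote totals are kept in a simple per-candidate dictionary (the function name and data layout are hypothetical), the combination is just an element-wise sum. The figures in the example are the Missouri totals quoted in this post, with the increments inferred from the before-and-after numbers.

```python
def combine_polls(existing_votes, added_votes):
    """Add implied votes from a national poll to a state's running totals.

    Both arguments map candidate name -> number of (implied) votes.
    A candidate missing from one side is treated as having zero votes there.
    """
    candidates = set(existing_votes) | set(added_votes)
    return {
        name: existing_votes.get(name, 0) + added_votes.get(name, 0)
        for name in candidates
    }


# Missouri totals before the new national poll, and the implied votes it
# added for each candidate (481 - 367 and 494 - 387 respectively).
before = {"clinton": 367, "trump": 387}
added = {"clinton": 114, "trump": 107}

aggregate = combine_polls(before, added)
print(aggregate)
```

Because everything is expressed as votes, a large national poll and a small state poll are weighted automatically by how many respondents each contributes.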
The national poll showing Clinton +8 had a sample size of 12,742 voters. Needless to say, it's shaken things up a bit. Here's how Missouri looked after this poll was included:
It added 245 implied Missouri votes, increasing Clinton's total from 367 to 481, and Trump's from 387 to 494. Because the poll was favorable to Clinton, it decreased her deficit in Missouri from 6.4% to 5.8%, and increased her chance to win the state from 10% to 12%.
When a big national poll is released it can move a lot of states. This one created exactly the same movement as if the following state polls were all published on the same day:
- 800-person poll in FL showing Clinton +6%
- 500-person poll in PA showing Clinton +9%
- 400-person poll in GA showing Clinton +2%
- 400-person poll in NC showing Clinton +5%
- and so it goes, all the way down to a 25-person poll in VT showing Clinton +24%, and a 23-person poll in WY showing Trump +22%
Big national polls can tell us a lot.
To sum up, national polls interact with state polls in the following way:
- National polls are aggregated
- The result is adjusted for each state according to its PVI
- The PVI-adjusted national poll is apportioned state-by-state using population share
- Those apportioned national votes are added to the aggregate state poll to create a final aggregated poll for each state