Monday, August 1, 2016

Histograms and State Polls

The model has moved a little towards Clinton, largely driven by a couple of good national polls for her and one state poll showing her +9 in Pennsylvania.


I haven't found a good way to put this up on the left with the other data viz, so for now I'll just post it. There's a column for every possible number of electoral votes: one column for 268, one for 269, one for 270, etc. The height of each column indicates how likely that particular outcome is.
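For the curious, here's a rough sketch of how a chart like this could be computed. It is not the model's actual code: it assumes each state is decided independently with a known win probability (a big simplification), and the function name `ev_distribution` and the three toy states are made up for illustration.

```python
def ev_distribution(states):
    """Return a list where index k holds P(exactly k electoral votes won).

    `states` is a list of (electoral_votes, win_probability) pairs.
    Assumes states are independent of each other, which is a simplification.
    """
    total = sum(ev for ev, _ in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0  # before any state is counted, 0 votes is certain
    for ev, p in states:
        new = [0.0] * (total + 1)
        for k, prob in enumerate(dist):
            if prob == 0.0:
                continue
            new[k] += prob * (1 - p)   # candidate loses this state
            new[k + ev] += prob * p    # candidate wins this state
        dist = new
    return dist


# Hypothetical toy map: three made-up states, not real data.
toy_states = [(3, 0.5), (4, 0.5), (5, 0.5)]
dist = ev_distribution(toy_states)

# One column per electoral vote total; column height is the probability.
for votes, prob in enumerate(dist):
    if prob > 0:
        print(f"{votes:3d} {'#' * round(prob * 40)} {prob:.3f}")
```

Each entry of `dist` plays the role of one column in the histogram.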

State Polls
Reading the methodologies of other election models, it sounds like it's fairly common to use LOESS smoothing to calculate a poll average. There's a good explanation of LOESS in this article, but basically it's a way to draw a weighted average line through data over time.

I aggregate polls in a slightly different way. Every poll at its heart is simply a bunch of people indicating a preference for one candidate or another. My methodology at its simplest is to add up what they said. I only use polls from RCP to minimize the risk of bias in poll selection, since as a Clinton voter, I might be more inclined to notice polls that are good for her.

It's not quite that simple, but it's close. Before I add them, I adjust the poll for in-house biases using 538's pollster ratings, and I discount polls by age. It works like this:

  1. Adjust the poll's raw result for the pollster's in-house bias. For example, if the in-house bias is R+2, I would take one point from Trump and add it to Clinton, which moves the margin by two points.
  2. Multiply the adjusted poll percentage by the sample size to get implied votes. In a poll of sample size 1000, where Clinton got 45%, that would translate to 450 implied votes.
  3. Reduce the implied votes based on the poll's age.
At the end of this process, I've turned a bunch of polls into a single Big Poll that can then be used* in other math, like using a logistic function to figure out how likely it is that the Big Poll correctly predicts the winner.
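The three steps above can be sketched in a few lines of Python. This is a minimal sketch, not the model's actual code: the post doesn't say how polls are discounted by age, so the exponential decay and its 14-day half-life are assumptions, as are the names `implied_votes` and `big_poll` and the example polls.

```python
def implied_votes(clinton_pct, trump_pct, sample_size, age_days,
                  house_bias_r=0.0, half_life_days=14.0):
    """Turn one poll into age-discounted implied votes (clinton, trump).

    house_bias_r is the pollster's lean toward the Republican in margin
    points (R+2 -> 2.0). half_life_days is an assumed decay rate.
    """
    # Step 1: remove the house bias by shifting half the margin each way.
    shift = house_bias_r / 2.0
    clinton_adj = clinton_pct + shift
    trump_adj = trump_pct - shift

    # Step 2: percentage times sample size gives implied votes.
    clinton_votes = clinton_adj * sample_size / 100.0
    trump_votes = trump_adj * sample_size / 100.0

    # Step 3: discount by age (assumed exponential decay, not from the post).
    weight = 0.5 ** (age_days / half_life_days)
    return clinton_votes * weight, trump_votes * weight


def big_poll(polls):
    """Add up implied votes across polls into one combined 'Big Poll'."""
    clinton_total = trump_total = 0.0
    for poll in polls:
        c, t = implied_votes(**poll)
        clinton_total += c
        trump_total += t
    return clinton_total, trump_total


# Hypothetical polls, not real data.
example_polls = [
    {"clinton_pct": 45, "trump_pct": 43, "sample_size": 1000, "age_days": 0},
    {"clinton_pct": 45, "trump_pct": 43, "sample_size": 1000, "age_days": 14,
     "house_bias_r": 2.0},
]
clinton_total, trump_total = big_poll(example_polls)
print(f"Big Poll: Clinton {clinton_total:.0f}, Trump {trump_total:.0f}")
```

The first poll reproduces the 450 implied votes from step 2. The second has the same raw numbers but is two weeks old and comes from an R+2 pollster, so it gets the one-point shift and then counts for half as much under the assumed half-life.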

Below is an example of what I'm talking about (a handful of points if you can guess which state - should be easy). Polls are adjusted for bias, turned into implied votes, discounted by age, and then added up!

*after adjusting for national polls


  1. Do you buy the argument that Trump does not have a path to victory without Pennsylvania?

    1. I mean, he's got a path without PA, it's a tough one though. I just ran a sim and calculated this: Trump has a 69% chance to win if he wins PA, and a 9% chance to win if he doesn't.

  2. I'm guessing your example is from Kansas since that is where Ft. Hays State is.

    1. Good guess, but it's Kansas's neighbor to the east (Missouri)