Lessons: Likely Voter Models & Bias


On Wednesday morning after the election, still a bit groggy, I cobbled together a quick table of the final surveys released by the major national polling organizations. I calculated an overall average (showing a Bush margin of 1.7%), compared that to Bush’s actual 3.0% margin and concluded, “the traditional likely voter models performed reasonably well.”

A few hours later I received an email from a loyal reader, Prof. X: “I’d take issue with your view that the average of the polls is close,” he wrote. “Mostly, I think this is dead wrong.”

When Prof. X tells me I’m “dead wrong” I tend to pay close attention because he is way ahead of me in statistical training and smarts generally. So I decided to take a second look. A slightly updated version of the table appears below.

A few minor notes: This table includes the final survey for each organization rather than “projections” that allocate the undecideds. In one case (Zogby) I used the next-to-last release of data that still had undecideds included. Also, I have reported only one set of numbers for ABC/Washington Post, whose final numbers were identical. Finally, I have intentionally omitted two surveys, by Harris and the Economist, that were drawn from Internet panels. I will take up the lessons from projections and Internet polls separately.

One of Prof. X’s complaints was that by including three surveys fielded a full week before the election (by ICR, Time and the LA Times), I “artificially made [the average] look better than it is.” Fair enough. This time I included a separate average based only on surveys done in the final week, which brings the average Bush margin down slightly to 1.5%, roughly half of Bush’s latest popular-vote margin (2.9%).

I have to admit that my conclusion that the models performed “reasonably well” would have been the same had I calculated the average at 1.5%. I was thinking the way any pollster does when comparing a final poll to reality: a three-point Bush win (51% to 48%) is within the +/-3-4% sampling error of a poll giving Bush a one-and-a-half-point lead (49% to 47.5%).
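(For readers who want to check that arithmetic, here is a quick sketch of the sampling-error calculation. The 1,000-person sample is a stand-in I chose because it is typical of national polls, not a figure from the table.)

```python
import math

def moe(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a single candidate's share of the vote."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # assumed sample size, typical of a national telephone poll
print(f"MOE on a 49% share with n={n}: +/-{moe(0.49, n):.1%}")  # ~ +/-3.1%
# Bush's actual 51% falls inside 49% +/- 3.1%, so a single poll showing
# a 1.5-point lead is statistically consistent with a 3-point win.
```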

However, that margin of error applies only to one survey at a time. As Prof. X pointed out, the same logic does not apply to an average of 15 or more polls. In fact, most of the surveys done in the final week (11 of 15) had a “bias” toward Kerry – they showed Kerry ahead or had the margin closer than three points. That pattern is unlikely to be the product of random variation alone: my application of the binomial distribution puts the probability of it happening by chance (assuming that undecideds broke evenly) at roughly 6%.
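That “roughly 6%” is a one-sided binomial tail probability: treat each poll as a coin flip equally likely to err toward either candidate, and ask how often 11 or more of 15 flips land the same way. A minimal check of the arithmetic, under exactly that coin-flip assumption (my reconstruction, not code from the original analysis):

```python
from math import comb

# Probability that 11 or more of 15 polls lean toward Kerry, if each
# poll were equally likely to err in either direction (i.e., undecideds
# break evenly and poll errors are symmetric around the true margin).
n, k = 15, 11
p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"P(X >= {k} of {n}) = {p:.3f}")  # 0.059, i.e. roughly 6%
```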

So why did so many national polls show the race closer than it turned out to be?

1) Undecideds “broke” two-to-one toward the incumbent, George Bush – I’ll concede that the incumbent rule (under which undecided voters on the final survey typically break toward the challenger) did not hold in this year’s presidential race, but I’m dubious of a break toward Bush. In the national exit poll, those who made their final vote choice in the last four days went for Kerry (53% to 44%), while those deciding earlier went for Bush (52% to 47%). The same pattern held in Florida and Ohio. The average of the national tracking surveys also showed no trend toward Bush over the final weekend; if anything, they had Kerry running slightly closer.

2) Some likely voter models worked better than others – When I presented the details on likely voter models, I noticed that the surveys using a variant of the Gallup likely voter model showed Bush doing consistently better than the others. That difference now looks prescient. The following table shows the results of those using the Gallup likely voter model either in the final week (Gallup, Pew, Newsweek) or in the final two weeks (adding Time and the LA Times). In both cases, the Gallup-model surveys showed a Bush margin (3.2% or better) closer to the actual result than the average of the other surveys (0.9%). Three surveys in the “other” category correctly forecast Bush’s final three-point margin (TIPP and ICR among them), but the rest showed Kerry doing slightly better. (A schematic sketch of how a Gallup-style model selects likely voters appears below, after item #4.)

3) National pollsters manipulated their results – consciously or unconsciously – to show a closer result – One cynical reader wrote just before the election to suggest what he called “price fixing” at the national level: He argued that some national surveys were intentionally “’managing’ (or massaging) the data to have it come out Bush +1 or Bush +2.”

Crazy? Perhaps, but as the same reader correctly anticipated, the small Kerry bias in the national surveys did not appear at the state level. RealClearPolitics provided averages of final polls in 18 battleground states. Kerry did better than the RCP average in nine states while Bush did better in eight states – about what we would expect by chance alone.

4) Respondents were more supportive of Kerry than non-respondents – This possibility would be easier to dismiss had not the exit polls shown a similar deviation in Kerry’s favor. The exit pollsters are speculating that non-response bias may be the reason.
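For readers unfamiliar with the Gallup approach mentioned in #2: it scores each respondent on a battery of screening questions (attention to the campaign, past voting, stated intention to vote, and so on) and counts as “likely voters” only the top scorers, with the cutoff set so the share retained matches expected turnout. Here is a schematic sketch of that cutoff logic; the item names and the 55% turnout target are illustrative assumptions of mine, not Gallup’s actual instrument:

```python
# Schematic sketch of a Gallup-style "cutoff" likely voter model.
# Each screening item is assumed pre-coded 0 or 1; the item names and
# turnout target are illustrative, not Gallup's actual battery.

SCREEN_ITEMS = ["thought_given", "knows_polling_place", "voted_in_past",
                "follows_campaign", "intends_to_vote", "votes_regularly",
                "high_likelihood_scale"]

def likely_voters(respondents: list, turnout: float) -> list:
    """Rank respondents by screening score, then keep the top-scoring
    share of the sample equal to expected turnout."""
    ranked = sorted(respondents,
                    key=lambda r: sum(r.get(item, 0) for item in SCREEN_ITEMS),
                    reverse=True)
    cutoff = round(turnout * len(ranked))
    return ranked[:cutoff]

# Usage: with expected turnout of 55% of the adult sample,
#   lv = likely_voters(sample, turnout=0.55)
# the horse-race margin is then tabulated among `lv` only.
```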

Which explanation is most convincing? The truth is probably some combination of #2-4 above, and each topic is worthy of further investigation.

I am wary of picking individual “winners” among the national polls, as any one survey can hit or miss the precise final result by chance alone. Having said that, the most obvious lesson in all of this is that the Gallup likely voter model as employed collectively by Gallup, Pew, Newsweek and others came closer to predicting the outcome in the final week than the average of the other surveys. The notable exceptions, TIPP and Battleground, are also worthy of further discussion, since both weighted their surveys by party ID. More to come on that…
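Since party ID weighting is where that follow-up is headed, here is a minimal sketch of the idea. The 37/37/26 Democrat/Republican/independent target below is a hypothetical of mine; TIPP’s and Battleground’s actual targets and procedures may well differ:

```python
from collections import Counter

def party_weights(respondents: list, targets: dict) -> list:
    """Per-respondent weights that force the sample's party-ID mix to
    match a target distribution (weight = target share / sample share)."""
    counts = Counter(r["party"] for r in respondents)
    n = len(respondents)
    return [targets[r["party"]] / (counts[r["party"]] / n)
            for r in respondents]

# A sample that interviews 40% Democrats, 34% Republicans, 26% independents...
sample = [{"party": "D"}] * 40 + [{"party": "R"}] * 34 + [{"party": "I"}] * 26
weights = party_weights(sample, {"D": 0.37, "R": 0.37, "I": 0.26})
# ...gets each Democrat down-weighted (0.925) and each Republican
# up-weighted (~1.09), so the weighted mix matches the 37/37/26 target.
```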

(My apologies to those who viewed an initially garbled version of this post between 8:30 and 9:00 a.m. EST. My weblog host, Typepad, has chosen to implement new “features” without much of a beta test. Very frustrating.)

Mark Blumenthal

Mark Blumenthal is a political pollster with deep and varied experience across survey research, campaigns, and media. The original “Mystery Pollster” and co-creator of Pollster.com, he explains complex survey concepts to a wide range of audiences and shows how data informs politics and decision-making. As a researcher and consultant, he crafts effective questions and identifies innovative solutions that deliver results. An award-winning political journalist, he draws insights and compelling narratives from chaotic data.