About Those Tracking Surveys

So admit it. At least once a day, possibly more often, you’ve been checking in on the various rolling-average tracking surveys. Most of us are.

If you have been looking at more than one, the results over the last few days may have seemed a bit confusing. Consider:

  • As of last night, the ABC News/Washington Post poll agreed with itself and reported John Kerry moving one point ahead of George Bush (49% to 48%), a three-point increase for Kerry since Friday.
  • Meanwhile, the Reuters/Zogby poll had President Bush “expanding his lead to three points nationwide” (48% to 45%), up from a one-point lead four days earlier.
  • But wait! The Rasmussen poll reported yesterday that John Kerry now leads by two points (48% to 46%), “the first time Senator Kerry has held the lead since August 23.”
  • But there’s more! The TIPP poll had Bush moving ahead of Kerry by eight points (50% to 42%) after having led by only a single point (47% to 46%) four days earlier.

What in the world is going on here?

Two words: Sampling Error.

Try this experiment. Copy all of the results from any one of these surveys for the last two weeks into a spreadsheet. Calculate each candidate’s average result over that period. Now check whether any single day’s result for either candidate falls outside the survey’s reported margin of error around that average. I have, for all four surveys, and I see no result beyond the margin of error.
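If a spreadsheet isn’t handy, the same check takes a few lines of Python. The daily numbers and the three-point margin of error below are placeholders of my own, not any survey’s actual figures:

```python
# Check whether any daily result strays outside the survey's margin of
# error around its own two-week average. All numbers are illustrative.

MARGIN_OF_ERROR = 3.0  # reported MoE in percentage points (assumed)

# Hypothetical two weeks of daily results for one tracking survey.
daily_results = {
    "Bush":  [48, 49, 47, 48, 48, 49, 50, 48, 47, 48, 49, 48, 48, 47],
    "Kerry": [46, 45, 47, 46, 47, 45, 45, 46, 47, 46, 45, 46, 47, 46],
}

for candidate, results in daily_results.items():
    average = sum(results) / len(results)
    outliers = [r for r in results if abs(r - average) > MARGIN_OF_ERROR]
    print(f"{candidate}: average {average:.1f}, "
          f"{len(outliers)} day(s) outside +/-{MARGIN_OF_ERROR} points")
```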

What about the differences between two surveys, or between two releases of the same survey? Two results at the extreme ends of the error range can still differ significantly from each other. A different statistical test, a z-test for the difference between two independent proportions, checks for such a difference.

So what about the difference between the 46% that the TIPP survey reported four days ago and the 42% it reported yesterday? It is close, but still not significant at the 95% confidence level.

What about the three-point increase for John Kerry (from 46% to 49%) over the last three days in the ABC/Washington Post survey? Nope, not significant either. And keep in mind, they had Kerry at 48% just seven days earlier.
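For the curious, here is a minimal sketch of that z-test applied to the TIPP example. The per-release sample size of 900 is my assumption for illustration, not the survey’s published n; a different n would shift the p-value somewhat.

```python
from math import sqrt, erfc

def two_proportion_z_test(p1, n1, p2, n2):
    """Z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))          # two-tailed p-value
    return z, p_value

# The TIPP example: 46% four days ago vs. 42% yesterday.
# n = 900 per release is an assumed, illustrative sample size.
z, p = two_proportion_z_test(0.46, 900, 0.42, 900)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")  # roughly z = 1.71, p = 0.09
```

With those assumed sample sizes the p-value lands just above 0.05: close, but not significant at the conventional 95% level.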

Might some of these differences look “significant” if we relaxed the level of confidence to 90% or lower? Yes, but that means we would likely find more differences by chance alone. If you hunt through all the daily results for all the tracking surveys, you will probably find a significant difference in there somewhere (some call that “data mining”). But first consider how many potential pairs of differences we could check, given four surveys and 13 releases each (two fewer for ABC/Washington Post) over the last 14 days (I won’t do it here, but it’s a big number). At a 95% confidence level, you should expect one apparently “significant” difference by chance alone for every 20 pairs you test. Relax the level to 90%, and you will see a “significant” yet meaningless difference in one comparison of every ten.
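If you do want the arithmetic, here is a quick count. It crudely treats every pair of releases as a potential comparison, ignoring that overlapping rolling averages are not independent:

```python
from math import comb

# Releases over the last 14 days: 13 each for three surveys,
# two fewer (11) for ABC/Washington Post.
releases = [13, 13, 13, 11]
total = sum(releases)   # 50 releases in all
pairs = comb(total, 2)  # every possible pairwise comparison

print(f"potential comparisons: {pairs}")                # 1225
print(f"expected false alarms at 95%: ~{pairs // 20}")  # ~61
print(f"expected false alarms at 90%: ~{pairs // 10}")  # ~122
```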

If all of this is just too confusing, consider the following. I averaged the daily results released by all four surveys for the last two weeks (excluding October 15 and 22, when ABC/Washington Post did not release data). The “trend” I get is essentially a flat line.
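For the spreadsheet-averse, here is a sketch of that averaging, again with placeholder numbers of my own standing in for the actual releases; None marks a day on which a survey released nothing:

```python
# Average Kerry's daily share across the four tracking surveys,
# skipping days on which a survey released no data. All numbers
# are placeholders, not the surveys' actual results.

surveys = {
    "ABC/WaPo":      [46, 47, None, 46, 48, 47, None, 49],
    "Reuters/Zogby": [47, 46, 45,   46, 45, 46, 45,   45],
    "Rasmussen":     [47, 47, 46,   47, 46, 47, 48,   48],
    "TIPP":          [46, 45, 44,   45, 43, 44, 42,   42],
}

num_days = len(next(iter(surveys.values())))
for day in range(num_days):
    values = [s[day] for s in surveys.values() if s[day] is not None]
    print(f"day {day + 1}: {sum(values) / len(values):.1f}")
```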

If all four surveys — or even three of the four — were showing small, non-significant changes in the same direction, I would be less skeptical. We could be more confident that the effectively larger sample of four combined surveys might make a small change significant. However, when the individual surveys show non-significant changes that zig and zag (or Zog?) in opposite directions, the odds are high that all of this is just random variation.

Disclaimer (since someone always asks): Yes, these are four different surveys involving different questions, sample sizes and sampling methodologies (although each survey’s methods are constant from night to night). One (Rasmussen) asks questions using an automated recording rather than live interviewers. All four weight by party identification, though to different targets and some more consistently than others. So the rule book says we should not simply combine them. Also, averaging the four does not yield a more accurate model of the likely electorate than any one survey alone. My point is simply that averaging the four surveys demonstrates that much of the apparent variation over the last two weeks is random noise.

Also, as with investing, past performance is no guarantee of future gain (or loss). Just because things look stable over the last two weeks doesn’t mean they won’t change tomorrow.

So we’ll just have to keep checking obsessively…and remembering sampling error.

UPDATE: Prof. Alan Abramowitz makes a very similar point over at Ruy Teixeira’s Donkey Rising.

Correction: The original version of this post wrongly identified the TIPP survey as “IBD/TIPP.” While TIPP did surveys in partnership with Investor’s Business Daily earlier in the year, the current tracking poll is just done by TIPP. My bad – sorry for the error.

Mark Blumenthal

Mark Blumenthal is a political pollster with deep and varied experience across survey research, campaigns, and media. The original "Mystery Pollster" and co-creator of Pollster.com, he explains complex survey concepts to a wide range of audiences and shows how data informs politics and decision-making. As a researcher and consultant, he crafts effective questions and identifies innovative solutions to deliver results. As an award-winning political journalist, he draws insights and compelling narratives from chaotic data.