More Divergent Than They Should Be?

Picking up where I left off in the last post, let’s start with the basic theory of random sampling. If we draw a series of perfect random samples, the results for any question will vary from survey to survey in a very predictable way. The pattern of variation for any given result should resemble a “normal” or bell-shaped curve: Some percentages will be higher, some lower, but most will cluster near the true value. The “margin of error” is a translation of the normal curve into probabilities. The shape of the curve means there are many different margins of error, depending on how certain we want to be. If we drew repeated samples of 1,000 interviews, for example, 95% of them (19 of 20) would get a result falling within +/- 3.1% of the value for the entire population (for a result near 50%); 80% (16 of 20) would fall within +/- 2%, and half (10 of 20) would fall within +/- 1.1%.
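For readers who want to check those figures, here is a minimal sketch of the arithmetic. It assumes a simple random sample, a true value of 50% (the worst case for sampling error), and the normal approximation; the function name `margin_of_error` is just an illustration, not anything a pollster publishes.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence, p=0.5):
    """Half-width of the interval that should contain `confidence`
    of all results from simple random samples of size n."""
    # Two-sided critical value, e.g. confidence=0.95 gives z of about 1.96
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of a proportion p estimated from n interviews
    return z * sqrt(p * (1 - p) / n)

for confidence in (0.95, 0.80, 0.50):
    moe = margin_of_error(1000, confidence)
    print(f"{confidence:.0%} of samples: +/- {moe:.1%}")
```

With 1,000 interviews this prints margins of 3.1%, 2.0% and 1.1%. Real surveys are weighted and never perfectly random, so published margins are often a bit wider than this formula gives.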

Apples to Apples

Now let’s look at some real data. The table below shows results of 11 national surveys of self-described “registered voters” conducted since the Republican convention. For the sake of argument, let’s assume that voter preferences have not changed one iota over the last two weeks (unlikely), that all of the surveys used identical methodologies and question wordings (far from it), that each is a perfect random sample of registered voters (hardly) and that each poll surveyed 1,000 registered voters (some were lower). Let’s also make the leap that the average of all surveys represents the “true” preferences of all registered voters.

Registered Voters
Poll	Date	N	+/-	Bush	Kerry	Nader	Bush margin
IBD/TIPP	9/14-18	894	3.3%	43%	42%	2%	1%
CBS/NYT	9/12-16	1,088	3.0%	50%	41%	3%	9%
Gallup/CNN/USAT	9/12-15	935	4.0%	50%	42%	4%	8%
Pew	9/11-14	1,002	3.5%	46%	46%	1%	0%
ICR	9/8-12	868	3.3%	48%	44%	3%	4%
Pew	9/8-10	970	3.5%	52%	40%	1%	12%
Newsweek	9/9-10	1,003	4.0%	49%	43%	2%	6%
Time	9/7-9	1,013	3.0%	50%	39%	4%	11%
AP-IPSOS	9/7-9	1,286	2.5%	51%	43%	2%	8%
ABC/WashPost	9/6-8	952	3.0%	50%	44%	2%	6%
CBS	9/6-8	909	3.0%	49%	42%	1%	7%

Averages
Average – All		993		49%	42%	2%	7%
Sept 11-18		957		47%	43%	3%	5%
Results from PollingReport.com and Rasmussenreports.com

We have 11 polls, and therefore 22 estimates for either Kerry or Bush. Based on chance alone, we would expect 95% of the estimates (roughly 21 of 22) to fall within a range of +/-3%; that’s a range of 46% to 52% for Bush and 39% to 45% for Kerry. As the table shows, two estimates (IBD/TIPP’s 43% for Bush and the 9/11-14 Pew survey’s 46% for Kerry) fall outside those ranges; by chance alone we should have seen only about one.
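The 95% check can be reproduced with a few lines of code. This is only a sketch: the percentages are the rounded figures from the table above, the centers are the rounded overall averages (49% and 42%), and `count_outside` is a hypothetical helper name.

```python
# Bush and Kerry percentages copied from the registered-voter table, top to bottom
bush = [43, 50, 50, 46, 48, 52, 49, 50, 51, 50, 49]
kerry = [42, 41, 42, 46, 44, 40, 43, 39, 43, 44, 42]

def count_outside(estimates, center, half_width):
    """Count estimates farther than half_width points from center."""
    return sum(1 for x in estimates if abs(x - center) > half_width)

# Centers are the rounded averages of all 11 polls
outside = count_outside(bush, 49, 3) + count_outside(kerry, 42, 3)
# At the 95% level, 5% of the 22 estimates should miss by chance
expected = 0.05 * (len(bush) + len(kerry))
print(f"outside +/-3: {outside} observed, {expected:.1f} expected")
```

Because the table shows whole percentages, counts at narrower bands can shift by one depending on rounding.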

Further, we would expect 80% of these estimates (18 of 22) to fall within a range of +/-2%; that’s between 47% and 51% for Bush and 40% and 44% for Kerry. On the table, 4 estimates (highlighted) fall outside that range, exactly what we would expect by chance alone.

It’s also worth noting that the key outliers in this exercise, the latest Pew and IBD studies, are also the most recent surveys on the list, and they narrow the average of Bush’s lead slightly (to 5%). If the race has really gotten a few points tighter, then both surveys would fall within the expected range for the narrower result.

Thus, given all the various differences in timing, methodology, question wording, and so on, the variance of survey results falls remarkably close to what we would expect by chance alone. For registered voters at least, a population that is essentially comparable across surveys, the disparity has been mostly about sampling error.

Likely Voters – Apples to Bananas

Of course, the polls of “likely voters” are the ones getting the blame for showing divergent results. Pollsters have good reason for trying to identify likely voters. In the year 2000, the U.S. Census estimated 203 million Americans of voting age, 130 million of whom were registered to vote. Of these, 105 million (80% of registered voters and 52% of adults) cast a ballot. Thus, if we want an accurate forecast, we theoretically want to interview only those 50-60% of adults who will actually vote in November.

The problem, as noted by much of the recent coverage, is that we lack an obvious way to identify truly likely voters, especially since respondents tend to exaggerate their likelihood to vote. Here, pollsters use widely varying methods to identify likely voters. No two likely voter screens are created equal.

Consider the data. The table below, which includes results from 16 recent surveys, shows that while polls of likely voters do show a bit more of a spread than we would expect by chance alone, they do not deviate wildly. Again, if we assume hypothetically that all of these surveys are comparable, that they involve perfect random samples, and that the “true” result equals the overall average of all of these surveys, then we would expect 95% of the estimates (30 of 32) of Bush’s or Kerry’s vote to fall within a margin of +/-4%. That amounts to a range of 45% to 53% for Bush and 39% to 47% for Kerry. The actual polls do slightly worse, with 4 of 32 falling outside these limits: Bush’s 54% in the Gallup and 9/8-10 Pew surveys, Kerry’s 38% in that same Pew survey, and Kerry’s 48% in the Harris survey.

Likely Voters
Poll	Date	N (likely voters)	+/-	Bush	Kerry	Nader	Bush margin
Zogby	9/17-19	1,066	3.1%	46%	43%	1%	3%
Rasmussen	9/17-19	3,000	2.0%	49%	45%	2%	4%
IBD/Tipp	9/14-18	650	4.0%	45%	42%	2%	3%
Gallup/CNN/USAT	9/12-15	767	4.0%	54%	40%	3%	14%
Pew	9/11-14	725		47%	46%	1%	1%
Democracy Corps	9/12-14	1,003	3.1%	47%	45%	3%	2%
Harris	9/9-13	803	4.0%	47%	48%	2%	-1%
NDN/Penn Schoen	9/9-12	800	3.5%	49%	44%	3%	5%
ICR	9/8-12	758	3.6%	51%	44%	3%	7%
Pew	9/8-10	745		54%	38%	2%	16%
Time	9/7-9	857	4.0%	52%	41%	3%	11%
AP-IPSOS	9/7-9	899	3.5%	51%	46%	1%	5%
Zogby	9/8-9	1,018	3.1%	46%	42%	2%	4%
Democracy Corps	9/6-9	1,004	3.1%	48%	45%	4%	3%
FOX	9/7-8	1,000	3.0%	47%	43%	3%	4%
ABC/WashPost	9/6-8	700	3.5%	52%	43%	2%	9%

Averages
All		1,007	3.4%	49%	43%	2%	6%
Sept 11-18		1,032	3.4%	48%	44%	2%	4%
Results from PollingReport.com and Rasmussenreports.com

Among those falling within the 95% range, the spread is still wider than expected by chance. At an 80% confidence level, for example, we would expect roughly 6 of 32 estimates to fall outside the range of +/-2%; that’s between 47% and 51% for Bush and 41% and 45% for Kerry. On the table, 11 estimates (highlighted) fall outside that range.
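One rough way to judge “wider than expected by chance” is to ask how often chance alone would produce at least the number of misses counted in this post. The sketch below treats every estimate as an independent draw, which is an assumption of mine and not strictly true (Bush’s and Kerry’s shares in the same poll move together), so read the results as ballpark figures only. `prob_at_least` is a hypothetical helper.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) when X counts misses among n independent estimates,
    each missing the band with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# (label, misses counted in the post, number of estimates, chance of a miss)
checks = [
    ("registered voters, 95% band", 2, 22, 0.05),
    ("likely voters, 95% band", 4, 32, 0.05),
    ("likely voters, 80% band", 11, 32, 0.20),
]
for label, k, n, p in checks:
    prob = prob_at_least(k, n, p)
    print(f"{label}: P(at least {k} of {n} miss) = {prob:.2f}")
```

The smaller the probability, the harder it is to attribute that many misses to sampling error alone.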

So the main point: The differences between polls, especially the methods of identifying likely voters, do create differences beyond sampling error, but those differences are small, perhaps just a few percentage points. Much of the variation still comes from random chance.

Note one more telling point: Contrary to the conventional wisdom, the overall averages since early September show Bush with essentially the same margin among all the polls of likely voters (+6) as among the polls of registered voters (+7). The same pattern holds, with slightly narrowed margins, over the last 10 days.

[Continue this series with, “So What Should a Junkie Do?”]

3 thoughts on “More Divergent Than They Should Be?”

ron says:

September 24, 2004 at 12:01 pm

“we lack an obvious way to identify truly likely voters, especially since respondents tend to exaggerate their likelihood to vote.”
No kidding. You might also have added that respondents’ likelihood of voting is probably influenced by a number of factors that can change over time, making today’s “likely” tomorrow’s “possible”.
Moreover, I’m not sure that anyone can get a handle on true likelihood of voting without a screener that is more extensive than most telephone survey respondents will put up with.

K. says:

September 25, 2004 at 10:23 pm

I’m obviously not a pollster, or even a statistician, so this may explain this lame question … but why the heck is it “telling” that Bush has similar percentages from registered and likely voters? What does this mean? Does it foretell more voters this election cycle, as likely voters morph into registered? Does it mean that they are sampling poorly (can’t tell the difference between them)?
I’m just lost. Please, someone, help me out.
K.

Mark Blumenthal says:

September 26, 2004 at 11:59 pm

Sorry K. I’m finding my blog style, and I assumed I’d have time to come right back to that point.
In short: I believe the most important number to follow in any election involving an incumbent is the incumbent’s support, specifically whether it is over or under 50%. The fact that Bush is hovering just at or below 50% nationally and in most of the key battleground states suggests that we’re probably heading for another very close race. This conclusion comes more from art than science. I’ll try to explain in more depth in the next day or so.
