One more puzzle I can help demystify: the ever so slightly different numbers from the Washington Post and ABC News, even though both are based on the same data. I’ll quote the Post’s methodology page:
The Post and ABC News collect data jointly but are responsible for developing their own methods to identify likely voters. This may produce slightly different estimates of candidate support.
A lot of observers have noticed that the numbers reported by the Post and ABC vary slightly, though rarely by more than a single percentage point on any given day. The variation also occurs in both the likely voter and registered voter samples, yet the explanation above (and the similar text on the ABC methodology page) seems to imply that ABC and the Post differ only in the way they treat likely voters. This odd anomaly has left many of us scratching our heads. Tom Silver, publisher and editor of The Polling Report, saw the small differences and decided to report both sets of numbers separately. Before that, some emailers angrily asked him why he had chosen to endorse one “version” over the other.
Then last night, after getting a tip from a very well informed reader, I noticed that the Washington Post methodology page had recently changed. Before, the page described a likely voter model vaguely similar to the more restrictive Harris model I described yesterday: Registered voters who say they are likely to vote this year and either report having voted in 2000 or were not old enough in 2000. Now, the page describes a more complex model using a large number of variables (similar to the model described on the ABC methodology page):
The Post uses seven variables to define likely voters, including whether the respondent states they are registered to vote, their intention to vote, past voting history, interest in the presidential campaign, age, whether the respondent is voting for the first time in 2004 and whether the voter knows the location of his or her polling place. These variables produce a sample of likely voters that is largely composed of individuals who regularly vote in presidential elections but does include newly registered as well as other first time voters. In a typical sample, about one in 10 likely voters are self-described first-time voters and one in six are between the ages of 18-29.
Feeling thoroughly mystified, I decided to abandon email and do something bloggers are not supposed to do: I used the telephone. Ultimately, Richard Morin, the Post’s polling director, graciously took five minutes out of his busy day to answer a few questions. (Give him credit for this: I am technically a partisan, and many in his position simply refuse to answer calls from a party pollster to avoid even the appearance of favoritism or collusion.)
So here’s the story: The Post and ABC, like most major public polling organizations, tinker with their likely voter models slightly as the election draws near. A few weeks ago, according to Morin, the Post made its likely voter screen a bit tougher, to bring it into line with a model representing roughly 56-58% of the voting age population. As is often the case, the change had little effect on the overall numbers, though the Post did not get around to updating the methodology page until yesterday. The Post also started weighting its daily likely voter sample by party identification, as described on its web page:
The Post adjusts, or “weights,” each day’s randomly selected samples of adults to match the voting-age population percentages by age, sex, race, and education, as reported by the Census Bureau’s Current Population Survey. The Post also adjusts the percentages of self-identified Democrats and Republicans by partially weighting to bring the percentages of those groups to within three percentage points of their proportion of the electorate, as measured by national exit polls of voters in the last three presidential elections.
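Morin's description of a "tougher" screen can be illustrated with a small sketch. The Post has not published how it combines its seven variables, so the scoring here is purely hypothetical: each respondent gets a 0-7 score, and the screen keeps everyone at or above a cutoff. Tightening the screen means picking the cutoff whose qualifying share comes closest to a lower target turnout share, such as the 56-58% range Morin mentioned.

```python
def choose_cutoff(scores, target_share):
    """Return (cutoff, share): the score cutoff whose qualifying share of
    respondents (score >= cutoff) is closest to target_share.
    Hypothetical scoring; not the Post's actual likely voter model."""
    best = None
    for cutoff in sorted(set(scores)):
        share = sum(s >= cutoff for s in scores) / len(scores)
        gap = abs(share - target_share)
        if best is None or gap < best[2]:
            best = (cutoff, share, gap)
    return best[0], best[1]


# Hypothetical 0-7 likely voter scores for 20 respondents (not real data).
scores = [7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0, 0]

print(choose_cutoff(scores, 0.70))  # looser screen: (3, 0.7)
print(choose_cutoff(scores, 0.57))  # tougher screen: (4, 0.6)
```

With real samples of several hundred interviews, the achievable shares are much finer-grained than in this 20-person toy, so the screen can land closer to the target.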
To cut through the technical discussion: The ABC and Washington Post likely voter models and weighting procedures are slightly different at the margins, but are both performing the same task more or less the same way. But what about the differences among registered voters?
Here is the answer: ABC weights its likely voter sample by party identification, but tabulates results from the full sample of registered voters separately, without weighting by party. The Post weights its likely voters by party, then rolls those weighted interviews together with the remaining registered voters (those who did not pass the likely voter screen and are not weighted by party) to produce its registered voter sample.
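To make the two tabulation orders concrete, here is a toy sketch. Everything in it is a simplifying assumption, not ABC's or the Post's actual procedure: every respondent starts with a demographic weight of 1.0, the party targets are invented, and likely voters are weighted fully to those targets rather than partially. The only thing that differs between the two functions is whether the party weights carried by likely voters flow into the registered voter tabulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    likely: bool              # passed the likely voter screen
    party: str                # self-identified party: "D", "R" or "I"
    vote: str                 # "Bush" or "Kerry"
    base_weight: float = 1.0  # stand-in for the demographic (CPS) weight


def party_factors(likely_voters, targets):
    """Full (not partial) party weighting: target share / observed weighted share."""
    total = sum(r.base_weight for r in likely_voters)
    factors = {}
    for party, target in targets.items():
        observed = sum(r.base_weight for r in likely_voters if r.party == party) / total
        factors[party] = target / observed
    return factors


def candidate_share(weighted, candidate):
    """Percent of the weighted (respondent, weight) pairs that back `candidate`."""
    total = sum(w for _, w in weighted)
    return 100 * sum(w for r, w in weighted if r.vote == candidate) / total


def abc_style_rv(sample):
    # ABC-style: all registered voters tabulated on their own, no party weighting.
    return [(r, r.base_weight) for r in sample]


def post_style_rv(sample, targets):
    # Post-style: party-weighted likely voters pooled with the remaining,
    # unweighted-by-party registered voters.
    factors = party_factors([r for r in sample if r.likely], targets)
    return [(r, r.base_weight * factors[r.party] if r.likely else r.base_weight)
            for r in sample]


# Invented toy sample of 12 registered voters, 8 of them likely voters.
sample = (
    [Respondent(True, "D", "Kerry")] * 2
    + [Respondent(True, "R", "Bush")] * 4
    + [Respondent(True, "I", "Bush"), Respondent(True, "I", "Kerry")]
    + [Respondent(False, "D", "Kerry")] * 2
    + [Respondent(False, "R", "Bush"), Respondent(False, "I", "Kerry")]
)
targets = {"D": 0.375, "R": 0.375, "I": 0.25}  # hypothetical party targets

post_pairs = post_style_rv(sample, targets)
lv = candidate_share([(r, w) for r, w in post_pairs if r.likely], "Bush")
abc_rv = candidate_share(abc_style_rv(sample), "Bush")
post_rv = candidate_share(post_pairs, "Bush")
print(f"Bush among LVs (both): {lv:.1f}%")
print(f"Bush among RVs: ABC-style {abc_rv:.1f}%, Post-style {post_rv:.1f}%")
```

In this toy sample the party-weighted likely voter tabulation, common to both approaches, splits 50-50, while the two registered voter figures differ by more than 8 points only because the invented party adjustments are huge. With real sample sizes and the Post's much smaller, partial adjustments, the effect would be far smaller, which fits the one-point gaps in the published numbers.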
Whew. If you follow all that, you probably scored quite high on your math SATs.
If you didn’t, know this: Average the ABC and Washington Post vote numbers since October 3, and you see virtually no difference resulting from their dueling weighting strategies. Among likely voters since early October, both have Bush ahead of Kerry by the exact same 50% to 47% margin. Among registered voters, the numbers are off by a single percentage point each way: Bush leads by three points (49% to 46%) on ABC’s registered voter tabulation and by one point (48% to 47%) on the Washington Post’s. Rounding may explain much of the difference. There may be an issue with how the ABC/Washington Post models differ from those of other organizations, but their internal differences are trivial.
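Here is a small illustration of how rounding alone can turn nearly identical results into a 3-point lead and a 1-point lead. The unrounded averages below are hypothetical, chosen only to reproduce the published rounded figures; the actual unrounded ABC and Post averages are not available here.

```python
def rounded_margin(bush, kerry):
    """Margin as a topline reports it: computed from the rounded percentages."""
    return round(bush) - round(kerry)


# Hypothetical unrounded registered voter averages (not real ABC/Post data).
abc = (48.6, 46.4)
post = (48.4, 46.6)

for name, (bush, kerry) in [("ABC", abc), ("Post", post)]:
    # name, rounded Bush, rounded Kerry, rounded margin, unrounded margin
    print(name, round(bush), round(kerry), rounded_margin(bush, kerry), round(bush - kerry, 1))
# ABC 49 46 3 2.2
# Post 48 47 1 1.8
```

Under these assumed inputs, the unrounded margins differ by only 0.4 points, while the rounded margins differ by 2.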
Nonetheless, there is a lesson here in the “art” of political polling. Talk to 20 pollsters and you’ll get 20 different ways of selecting likely voters. A reader-pollster (let’s call him “X”) looked at it this way:
You think the polls are confusing to you? Washington Post can’t even agree with its partner, ABC News, on how to weight. The NYT can’t agree with its partner CBS News about what to put in the lede. And when it comes to defining LV’s, Harris can’t even agree with….Harris. No wonder poll consumers are confused.
Exactly.
The Washington Post and ABC’s methods appear to closely resemble Gallup’s, in terms of the questions used to determine likely voter status. It should thus not surprise us that, like Gallup, the Post and ABC polls show a fairly substantial Bush lead (as opposed to the dead heats in Zogby, NBC/WSJ, Pew, and Marist, and the Kerry lead in AP-Ipsos).
The description above says the Post “partially” weights on party ID to bring the Republican and Democratic compositions of its samples to within 3 percentage points of the exit polls from recent presidential elections. I take that to mean that if the proportion of R’s or of D’s in a sample is 4 or more points away from what past exit polls would suggest, the Post adjusts the sample to limit the discrepancy to no more than 3 points.
That can still lead to serious discrepancies. In both 1996 and 2000, Democrats made up 39% of the electorate (based on the exit polls), whereas the GOP’s share of the electorate has been either 34% or 35% in the last three presidential elections. Presumably, then, the Post will allow Democratic representation to fall as low as 36% and GOP representation to rise as high as 38%, which would turn a 4-point Democratic edge into a 2-point Republican one.
For further elaboration, please visit my sample-weighting website at:
http://www.hs.ttu.edu/hdfs3390/weighting.htm
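The commenter's reading of "partial" weighting amounts to clamping each party's share to a band around its exit-poll share. The sketch below assumes that reading (the Post's page does not spell out the mechanics) and uses the 39% Democratic and 35% Republican exit-poll shares cited in the comment; the observed sample shares are hypothetical.

```python
def partial_target(observed, exit_poll, window=3.0):
    """Clamp a party's observed share to within `window` points of its exit-poll
    share; shares already inside the band are left alone. This is one reading
    of "partial" weighting, not a documented Post procedure."""
    return min(max(observed, exit_poll - window), exit_poll + window)


exit_poll = {"D": 39.0, "R": 35.0}  # exit-poll shares cited in the comment

for party, share in exit_poll.items():
    print(party, "allowed range:", share - 3, "to", share + 3)  # D 36-42, R 32-38

# Hypothetical observed sample shares (not real Post data).
for party, observed in [("D", 33.0), ("D", 37.0), ("R", 40.0)]:
    target = partial_target(observed, exit_poll[party])
    print(party, observed, "->", target, "weight factor", round(target / observed, 3))

# Worst case the band allows: Democrats pushed to 36, Republicans to 38.
worst_gap = partial_target(30.0, 39.0) - partial_target(45.0, 35.0)
print("D minus R at the extremes:", worst_gap)  # -2.0, versus +4.0 in the exit polls
```

The weight factor (target share divided by observed share) is what would be applied to each respondent in that party; a share already inside the band gets a factor of 1.0.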
With a +/-3% window, the WaPo allows a range of 36-42% for Democrats and 32-38% for Republicans in the party ID weightings.
What would be interesting is what their numbers have shown in aggregate so far. Are they consistently showing a Dem result on the lower end of their range? A result on the higher end of their range for Republicans? (This is the presumed position for Democrats who want to soothe their troubled minds.)
The most recent Detroit News tracking poll has a sample that is 40.9% Republican and 40.6% Democratic. Since this is a blue state, one would expect Democrats to be more prevalent. Is this a sampling error in favor of Republicans, or is it a real move toward Republicans?
Poll by party identification
There is a very obvious conclusion that can be drawn: the tracking poll is clearly under-representing Democrats.
A mitigating factor may be involved though.
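Whether the Detroit News party split could be sampling noise depends on the sample size, which the comment does not give. The sketch below assumes a hypothetical n of 600 and simple random sampling (no weighting or design effect), and uses the standard variance of the difference between two shares estimated from the same sample, [p1 + p2 - (p1 - p2)^2] / n, which accounts for the two shares being negatively correlated.

```python
import math


def gap_moe(p_first, p_second, n, z=1.96):
    """Approximate 95% margin of error, in percentage points, for the difference
    between two shares estimated from the same simple random sample of size n."""
    diff = p_first - p_second
    variance = (p_first + p_second - diff ** 2) / n
    return 100 * z * math.sqrt(variance)


# Shares from the Detroit News sample quoted above; n = 600 is a hypothetical
# sample size, since the comment does not give one.
moe = gap_moe(0.406, 0.409, n=600)
print(f"D minus R gap: {100 * (0.406 - 0.409):+.1f} points, +/- {moe:.1f}")
```

Under these assumptions, the margin of error on the Democratic-minus-Republican gap comes out at roughly 7 points. A real answer would need the poll's actual sample size, its weighting, and a defensible expected party split for the state.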