So on to likely voter models. Finally.
I am in the process of gathering information on the likely voter screens and models used by most of the major national polls. Over the next few days, I will be posting more than you ever wanted to know about how pollsters pick likely voters. And on the eve of this effort, one major polling organization released a survey that illuminates one of the most important issues in the likely voter debate.
But first, a little background. Over the last few days, two polls released subgroup results among voters who say they will cast their first presidential ballots in this election. The Newsweek survey, released over the weekend, found that 15% of registered voters and 9% of likely voters say they will cast their first presidential ballot in 2004. Notably, these first-time likely voters prefer John Kerry to George Bush by a 21-point margin (57% to 36%).
On Tuesday, the ABC Polling Unit found a similar number of likely voters (10%) reporting that 2004 would be their first presidential vote. They also preferred Kerry, but by a narrower margin (54% to 43%). The ABC analysis includes other helpful details: The overwhelming majority (80%) of first-time voters are under 30 years of age. They tend to be lower income, less well-educated and more often minority than repeat voters.
The ABC release also noted that Kerry’s 11-point margin among new voters is “about the same as Al Gore’s margin among first-timers, nine points, in 2000” (the 2000 result comes from exit polls). They also report that the percentage of first-time voters in their sample of likely voters (10%) is roughly comparable to the percentage of first-timers in the 2000 exit polls (9%).
As I write about likely voter models over the next few days, one issue will be most important: Will turnout among first time voters be significantly higher this year and will the likely voter models catch such an increase if it occurs? The share of first-time voters projected by the ABC and Newsweek likely voter models is no different than in 2000, even though ABC notes in its release that “turnout overall is looking to be up: Sixty-two percent of likely voters are following the race very closely, up 20 points from this time in 2000, and Americans are four points more apt to say they’re registered to vote.”
In gathering information about likely voter models over the last few days, I have heard rumblings that some organizations are experimenting internally with tweaks to their models to accommodate the higher expected turnout. Unfortunately, the details of their efforts are mostly hidden from public view.
Then yesterday, Harris Interactive did something truly remarkable. They released data from their most recent telephone survey that included results for two different likely voter models. Collectively, these two models represent the two competing philosophies for selecting likely voters that I will be discussing over the next few days. I’ll let the Harris release speak for itself:
Using one definition of likely voters, those who are registered to vote and are “absolutely certain” to vote, the poll shows President Bush with a modest two-point lead (48% to 46%). Using this definition but excluding all those who were old enough to vote in 2000 but did not do so, President Bush has a commanding eight-point lead (51% to 43%). This second definition has proved more accurate in the past, but there are some indications that in this election many people who did not vote in 2000 will turn out to vote, in which case it would be wrong to exclude them.
Harris has just provided a big clue to why some of the polls (those using the more traditional likely voter models) may be showing President Bush with a 4-6 point lead, while others show a much closer race. When their definition of a likely voter includes only those who report voting in 2000, Bush’s lead is much wider than when they include the 2000 non-voters.
What Harris did today is highly unusual, because it breaks with a longstanding tradition among media pollsters of reporting a single, unshakable projection of “likely voters.” Harris is doing what internal campaign pollsters have done for years — conceding (perhaps just for the moment) that we cannot project turnout with certainty and reporting a range of potential results. Bravo to Harris!
[10/21: Typo corrected above – twice! – thanks to alert readers JB and bravenewworld. Link repaired – thanks David]
Likely Voter Light
This poll is very interesting in that it shows Bush swamping Kerry, if the traditional likely voter model was applied
First off. Thank you for the inside baseball of polling.
2nd, thank you for the information on prior 1st time voters. Having a baseline figure to compare is absolutely vital.
Assuming your numbers are representative (a reach given that 10% of likely voters in a typical poll is about 80 with an MoE greater than 7%), then I posit that the extra massaging of the likely voters models is unwarranted.
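The margin of error for a subgroup that small can be checked with the standard formula for a proportion. A minimal Python sketch, assuming a simple random sample, a 95% confidence level and the worst case of a 50/50 split; the function name is hypothetical and the 80-respondent figure is the rough one given above:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

subgroup_n = 80  # rough size of the first-time voter subgroup in a typical poll
moe = margin_of_error(subgroup_n)
print(f"n = {subgroup_n}: +/- {moe * 100:.1f} points")
```

Under those assumptions the figure comes out near 11 points, comfortably above 7.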
Harris came clean and I know the WaPo is considering first time POTUS voters aged 18-21 (redundant, I know) as “likely”. Given the marginal differences between polling firms, I’d be stunned if the other ones aren’t doing the same thing.
My concern is why disregard conventional polling practices based on the say-so of one political party.
If polling info isn’t self-fulfilling, then there will be a lot of reputationally-besmirched polling firms.
If you choose to revise your comments on likely voter models, please discuss why many pollsters are content to publish implausibly high voter participation rates. For example, SurveyUSA’s October 19 Ohio poll found 698 “likely” voters out of 900 adults interviewed. This implies a 78% voter participation rate in a state where the actual rate has beaten 60% only twice since 1968. SurveyUSA’s October 15 Florida poll indicated a voter participation rate of 75%. No state has beaten a 75% participation rate in almost 30 years.
If this finding is wrong, whom does this polling mistake favor?
I don’t mean to pick on SurveyUSA. I praise them for releasing their number of adults interviewed, and wish it were more common.
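The implied participation rate is simply likely voters divided by adults interviewed. A small Python sketch using the Ohio counts quoted above (the helper name is hypothetical; the Florida counts are not given, so only Ohio is computed):

```python
def implied_turnout(likely_voters, adults_interviewed):
    """Share of interviewed adults who passed the likely voter screen."""
    return likely_voters / adults_interviewed

ohio = implied_turnout(698, 900)  # counts from the October 19 Ohio poll cited above
print(f"Ohio implied turnout: {ohio:.1%}")
```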
I don’t mean to be cynical, but we can sit here analysing poll numbers, margin points and other data that will turn out to be completely wrong on election day. I know, and most people feel this deep down but just do not want to admit it, that only one name is going to win this election: Diebold Systems.
Dana Milbank reported that the WaPo tracking poll likely voter screen screened out all persons over 21 who did not vote in 2000. He then remarked that Bush got 5 extra points because of this single question. I think this question is the deal breaker for the polls showing Bush with the big lead.
Companies tweak likely voter models all the time, right? In 2000, most companies underestimated Dem turnout, especially in Ohio, where the Dem turnout was far higher than expected. In 2002, most companies underestimated Republican turnout.
The cynic in me thinks that most pollsters do a great job predicting the last election with their LV models.
As for SUSA: they use robotic dialing. Isn’t it far more likely that someone who is not intending to vote would hang up because they’re less politically inclined?
Mark – I think you want to correct this:
When their definition of a likely voter excludes those who voted in 2000, the Bush lead is mch wider than when they do not.
to….”When their definition of a likely voter excludes those who DID NOT vote in 2000….”
Also note the typo in “mch.”
Very insightful stuff, BTW.
Is this year different?
There are two issues that are confounding poll watchers this year: [a] what are the late breakers going to do? and [b] which voters are going to turn out on election day? I’ll leave the detailed dissection of polling methodology to the
Is there any data on whether people who don’t vote in the first election in which they are eligible to vote are more or less likely to vote in the next election in which they are eligible to vote?
It seems right to me that people at some point settle into a pattern of voting or not voting. But it doesn’t seem right that that pattern is determined right away from the time they are eligible to vote. So I wouldn’t be surprised if some who were 18-21 in 2000 and didn’t vote in 2000 do vote in 2004.
But that’s just a guess. Is there any data about this?
In discussing likely voter models you may want to comment on the difference, if any, between excluding unlikely voters and including likelies.
It seems to me, for example, that screening for knowing the location of one’s polling place does the former but not necessarily the latter.
The +8 model is flawed. Why?
Because Harris controls for demographics in the first model to create a sample set that “looks like voting America”, along race, religion, age, income, etc.
In the second model, they simply remove people who happened to have had one answer to a question “voted in 2000”. If, as I suspect, this wasn’t demographically representative (and the odds of it being are low), then the second sample is demographically controlled to something completely irrelevant…a more upper class, white male world that, even keeping 2000 turnout the same, does not represent America one bit.
TYPO
2:41pm EDT: Your link to the Harris poll is incorrect — it links to the Newsweek poll.
Justin: Quick comment. Harris, like most others, weights the sample of all adults to match Census demographics for that population. THEN they select either model #1 or #2, and do not reweight either.
After publishing Jim Rutenberg’s article on polls, the NYT for some reason today published an Op-Ed by Andrew Kohut of the Pew Research Center which reinforces the misperception that variation among polls is due to “unstable voter sentiment” rather than sampling error, differing likely voter models, and different phrasing of polling questions.
Sheesh.
http://www.nytimes.com/2004/10/21/opinion/21kohut.html
I seem to have posted that last bit just as Mark was posting his new entry on the same article. Sorry.
Too much of the discussion about “likely voters” misses two important points.
1) As brought up in the Wall Street Journal recently, even the pollsters who sell “likely voters” results say “It’s an art”. Sounds like “I’m a salesman, trust me.” Too little work has been done on correlating survey answers with actual voting behavior, unfortunately.
2) More importantly: the right way to deal with differential turnout probabilities is to assign each survey respondent a probability of voting, not make a God-like “he will/he won’t” selection from the dataset. (Assigning people a 100% or 0% probability of voting based on a one-point difference in a single answer, as some pollsters do, is clearly asinine.) This weighting is easy to do with computers (if I can program it, it’s easy :-)), and it’s been done since at least 1976. There’s no excuse for eliminating “unlikely voters” from a sample of registered voters.
By the way, this also helps explain why a lot of us rely more on the “registered voters” data.
Advantages: certainly reflects reality more accurately (even a group whose members deserve a 30% likelihood is going to have lots of people who vote); also has a smaller margin of error.
Most commercial pollsters don’t understand this latter point. They also don’t calculate the confidence intervals for weighted data correctly – which should tell you something about how statistically careful and knowledgeable they are.
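To make the contrast concrete, here is a minimal Python sketch comparing a hard likely voter cutoff with probability weighting. Everything in it is an assumption for illustration: the respondents, their turnout probabilities and the 0.5 cutoff are made up, and the effective sample size uses the standard Kish approximation rather than any particular pollster's method.

```python
# Hypothetical respondents: (candidate preference, assumed probability of voting).
respondents = [
    ("Bush", 0.9), ("Bush", 0.9), ("Bush", 0.6),
    ("Kerry", 0.9), ("Kerry", 0.6), ("Kerry", 0.4), ("Kerry", 0.4),
]

def cutoff_shares(data, threshold):
    """Hard screen: keep only respondents at or above the threshold, count each once."""
    kept = [cand for cand, prob in data if prob >= threshold]
    return {cand: kept.count(cand) / len(kept) for cand in set(kept)}

def weighted_shares(data):
    """Probability weighting: every respondent counts in proportion to turnout probability."""
    total = sum(prob for _, prob in data)
    shares = {}
    for cand, prob in data:
        shares[cand] = shares.get(cand, 0.0) + prob / total
    return shares

def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

hard = cutoff_shares(respondents, threshold=0.5)
soft = weighted_shares(respondents)
n_eff = effective_sample_size([prob for _, prob in respondents])

print("Hard cutoff:", hard)
print("Probability weighted:", soft)
print(f"Effective sample size: {n_eff:.2f} of {len(respondents)}")
```

In this toy data the hard screen throws out two Kerry respondents entirely, while the weighted version keeps them at reduced weight, so the two approaches give noticeably different margins from the same interviews.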
I’ve often thought of diving into the science of pre-Election polls, but the water always feels a little too cold for me, so I rarely get past dipping in my toes.
Therefore, I choose to not even bat an eye at the pre-Election polls. They mean absolutely nothing to me. One of the main reasons is the tiny size of the samples. I mean, 900 people representing the millions of voters in this country? How about we predict the outcome of this year’s World Series on the quality of the teams’ janitorial staff? The analogy might not totally fit, but to me, it is no less stupid.
This Harris poll shows two sets of numbers: “likely voters” by the old model, those who voted in 2000 and plan to vote, and “likely voters” by a new model, which also counts those who did not vote in 2000 but plan to vote this year.
There were 755 “old” likely voters and 820 “new” likely voters. It was not clear to me whether they were from the same population.
Bush was favored by 51% of “old” likely voters. Allowing for rounding, this would fall within a range of 50.5% to 51.5%, or within a range of 381 to 389.
Bush was favored by 48% of “new” likely voters. This would fall within a range of 47.5% to 48.5%, or within a range of 390 to 398.
Thus if they are the same population, Bush gained the support of between 1 and 17 of the 65 additional voters in the “new” model.
In contrast,
Kerry was favored by 43% of “old” likely voters, or within a range of 42.5% to 43.5%, or a range of 321 to 328.
Kerry was favored by 46% of “new” likely voters, or within a range of 45.5% to 46.5%, or a range of 373 to 381.
Thus if they are the same population, Kerry gained the support of between 45 and 60 of the 65 additional voters.
Does this make any sense? Is the interpretation that they were the same population a rational one? If so, could this possibly be accurate, where Kerry is gaining most of the support from new voters?
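One way to check ranges like these is to recompute them from the stated sample sizes (755 and 820) and the rounded percentages. A minimal Python sketch, assuming each published percentage was rounded to the nearest whole point; `count_range` is a hypothetical helper, not anything Harris uses:

```python
def count_range(pct, n, half_width=0.5):
    """Respondent counts consistent with a percentage rounded to the nearest point."""
    low = (pct - half_width) / 100 * n
    high = (pct + half_width) / 100 * n
    return low, high

old_n, new_n = 755, 820  # sample sizes quoted in the comment above
results = {}
for name, old_pct, new_pct in [("Bush", 51, 48), ("Kerry", 43, 46)]:
    old_lo, old_hi = count_range(old_pct, old_n)
    new_lo, new_hi = count_range(new_pct, new_n)
    results[name] = {
        "old": (old_lo, old_hi),
        "new": (new_lo, new_hi),
        # Smallest and largest gain consistent with the two ranges.
        "gain": (new_lo - old_hi, new_hi - old_lo),
    }
    print(f"{name}: old {old_lo:.0f}-{old_hi:.0f}, new {new_lo:.0f}-{new_hi:.0f}, "
          f"gain {new_lo - old_hi:.0f} to {new_hi - old_lo:.0f}")
```

The gains only mean something if the two models are drawn from the same respondents, which is the open question here.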
Some of you may remember our experience in Minnesota in 1998: a 3 way race for governor, polls showing Jesse Ventura 3rd, then waking up to him winning with a wave of unpolled, pissed-off, first time voters.
In all this we are lacking a touchstone to the past. In 2000 the average of the polls put Bush ahead of Gore by 3%. Yet, Gore won the popular vote by 1%.
If the last possible polls on the last possible day using the ‘best model’ available can’t get it right, and consistently over-touted one side, I’m becoming skeptical.
Then I look at the self-identifieds in the current LV polls to see if they’ve corrected themselves to better reflect the self-identifieds in the last election, and I note little remedy. There is a consistent bias toward Republicans being included in current LV polls and against Democrats being included in LV polls.
How can we have confidence in polls where, when the underlying demographics are exposed, they are obviously skewed?
I’ll be honest, I think the LV model is seriously flawed and while they are statistically accurate for the population measured, they’re not measuring the actual population. Thus, they’re pretty much meaningless as predictors at this point in time, except to ‘possibly’ measure the maximum support Bush will get.