In the last post I covered how the Gallup likely voter model works. In this post, I want to review criticisms of the model.
An Imperfect Predictor? One complaint about the Gallup model and its progeny is that they do not perfectly predict likely turnout. Some real voters get classified as “unlikely,” and some non-voters are deemed “likely.” The creators of the original Gallup model never promised 100% accurate classification, only that selecting a subgroup of likely voters sized to match the expected turnout level provides the most accurate read on the outcome. Keep in mind that although the mechanics of the model have been in use for more than 40 years, the methodologists who apply it review how well it worked after every election. Since they typically ask vote questions of all registered voters, they can look back after each election and check whether alternative models would have predicted the outcome more accurately. If Gallup continues to stick with the model, it is because they believe it continues to work as well as or better than any alternative.
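To make the mechanics concrete, here is a minimal sketch of the cutoff idea in Python. Everything below (the index scores, the respondent IDs, and the 55% turnout estimate) is invented for illustration; it is not Gallup's actual data or code.

```python
# Hypothetical sketch of a Gallup-style cutoff: score each respondent on a
# likely-voter index, then keep just enough of the highest scorers to match
# the expected turnout rate. All scores and the turnout figure are made up.

def select_likely_voters(respondents, expected_turnout):
    """respondents: list of (id, index_score); expected_turnout: 0..1 fraction."""
    ranked = sorted(respondents, key=lambda r: r[1], reverse=True)
    n_likely = round(len(ranked) * expected_turnout)
    return ranked[:n_likely]  # the "likely voter" subsample

sample = [("r1", 7), ("r2", 6), ("r3", 6), ("r4", 4), ("r5", 3), ("r6", 1)]
likely = select_likely_voters(sample, expected_turnout=0.55)
print([rid for rid, score in likely])  # top ~55% by score: ['r1', 'r2', 'r3']
```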
As described in the last post, the original Gallup models were based on validation studies that obtained the actual vote history for respondents. This process was relatively easy when pollsters interviewed respondents in person, as names and addresses were easily obtained. Conducting a validation study on a random digit dial (RDD) telephone interview requires that the respondent provide a name and address to the pollster, and the records are dispersed in thousands of clerks' offices and databases across the country. So such studies are now rare and difficult.
In 1999, the Pew Research Center conducted such a validation study of the Gallup likely voter model, using polls taken during the Philadelphia mayor’s race. They were able to obtain actual voting records for 70% of their respondents. The Pew report is worth reading in full, but here are the two key findings. On the one hand, the likely voter model was far from perfect in predicting individual-level voting:
Using [the Gallup likely voter] index, the Center correctly predicted the voting behavior of 73% of registered voters…. The 73% accuracy rate means that 27% of respondents were wrongly classified — those who were determined as unlikely to vote but cast ballots (17%), or non-voters who were misclassified as likely to vote (10%).
On the other hand, the Pew report showed that the results of the likely voter sample came closer to predicting the very close outcome, as well as the preferences of those in the sample who actually voted, than the broader sample of registered voters.
A longer academic paper based on the study summed up the conventional wisdom accepted by most public pollsters: “Though it is impossible to accurately predict the behavior of all survey respondents, it is possible to accurately estimate the preferences of voters by identifying those most likely to vote.”
One important limitation: The Pew study involved a low-turnout, off-year mayoral election, where the difference between the size of the electorate and the pool of registered voters was large. In high-turnout presidential elections, in which 80% or more of registered voters cast ballots, there is typically less difference between registered and likely voters.
Too Much Volatility? The most common complaint directed at Gallup’s likely voter model is that it seems to yield more volatile results than other polls. In 2000, Gallup’s daily tracking surveys showed dramatic swings. On October 2, for example, they reported a dead heat between George Bush and Al Gore among likely voters (45% to 45%). Four days later, following the first debate, they had Gore suddenly ahead by 11 points (51% to 40%). Four days after that, Bush was ahead by eight (50% to 42% — see the chart prepared by Gallup). Other polls taken over the same period showed nowhere near as much change, and Gallup’s own registered voter samples were more stable.
While Gallup dropped the daily tracking program this year, they have continued to show more volatility than other surveys. For example, they had Bush ahead by fourteen points (54% to 40%) in mid-September, had Kerry ahead by a single point after the debates (49% to 48%), and now have Bush leading again by five (51% to 46%).
An article in the current issue of Public Opinion Quarterly presents evidence that the volatility resulted mostly from changes in the composition of the Gallup likely electorate. In other words, the volatility resulted less from changing opinions than from changes in the people Gallup defined as likely voters. Authors Robert Erikson, Costas Panagopoulos and Christopher Wlezien analyzed the raw Gallup data from 2000 available in the Roper Center archives. They compared likely voters with non-likely voters and found that the trend lines moved in opposite directions over the course of the campaign. They concluded that “most of the change (certainly not all) recorded in the 2000 CNN/USA Today/Gallup tracking poll is an artifact of classification,” and that the shifts resulted from:
The frequent short-term changes in relative partisan excitement… At one time, Democratic voters may be excited and therefore appear more likely to vote than usual. The next period the Republicans may appear more excited and eager to vote. As Gallup’s likely voter screen absorbs these signals of partisan energy, the party with the surging interest gains in the likely voter vote. As compensation, the party with sagging interest must decline in the likely voter totals. [The full text is available here]
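The mechanism Erikson and his co-authors describe is easy to demonstrate in a simulation. In the sketch below (all numbers invented, not drawn from the Gallup data), every respondent's candidate preference is held perfectly fixed; only a party-wide "excitement" bonus to the likely-voter index moves, and yet the likely voter horse race swings from week to week.

```python
import random

random.seed(0)

# A fixed electorate: half Gore supporters, half Bush supporters, each with a
# stable base likely-voter index score. Preferences never change.
voters = ([{"vote": "Gore", "base": random.gauss(5, 2)} for _ in range(5000)] +
          [{"vote": "Bush", "base": random.gauss(5, 2)} for _ in range(5000)])

def gore_share_among_likely(voters, gore_bonus, bush_bonus, turnout=0.5):
    """Add a party-wide excitement bonus to each index score, keep the top
    `turnout` share of scorers, and report Gore's share of that subsample."""
    scored = sorted(
        ((v["base"] + (gore_bonus if v["vote"] == "Gore" else bush_bonus),
          v["vote"]) for v in voters),
        reverse=True)
    cutoff = scored[: int(len(scored) * turnout)]
    return 100.0 * sum(1 for _, vote in cutoff if vote == "Gore") / len(cutoff)

# Identical preferences in all three "weeks"; only relative excitement moves.
print(gore_share_among_likely(voters, 0.0, 0.0))  # roughly 50: dead heat
print(gore_share_among_likely(voters, 1.0, 0.0))  # Gore "surges" among likelies
print(gore_share_among_likely(voters, 0.0, 1.0))  # Bush "surges" right back
```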
Although Gallup has not formally responded to the Erikson et al. study, the methodologists at Gallup do not quarrel with the basic finding. Gallup’s Jeff Jones recently told the New York Times:
“We’re basically trying to get a read on the electorate as of the day that we’re polling,” said Jeffrey Jones, managing editor of the Gallup Poll, “not necessarily trying to predict what’s going to happen on Election Day itself.”
Jones frames the key question perfectly. Most pollsters agree that a pre-election survey is no more than a snapshot of opinions of the moment, but what about the people in the sample? As Erikson put it in an email to me last week, do we want surveys to identify those who are likely to vote on Election Day or those who are likely to vote “if the election were held today?”
Gallup’s answer is to let the composition vary. My view, and the view of most of my colleagues who poll for political candidates, is that we need to impose controls to keep the composition of the likely voters as constant as possible. Those controls, however, require making subjective decisions about what the likely electorate will look like on Election Day. Some weight by party (like Zogby and others). Others stratify their samples regionally to match past vote returns (like Greenberg/Democracy Corps and Fox/Opinion Dynamics), an approach I prefer. Supporters of the Gallup model argue that both alternatives pose a greater risk of imposing past assumptions on an unknown future.
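For readers unfamiliar with the regional approach, here is a minimal sketch of what stratifying (or post-stratifying) a sample to past vote returns can look like. The region names and percentages are invented placeholders, not any pollster's actual targets.

```python
# Illustrative post-stratification by region: reweight respondents so each
# region's share of the sample matches its share of a past election's vote.
# All regions and shares below are hypothetical.

past_vote_share = {"Northeast": 0.23, "Midwest": 0.26, "South": 0.31, "West": 0.20}
sample_share    = {"Northeast": 0.28, "Midwest": 0.24, "South": 0.27, "West": 0.21}

weights = {region: past_vote_share[region] / sample_share[region]
           for region in past_vote_share}

# A Northeast respondent now counts as 0.23/0.28 ≈ 0.82 of a respondent and a
# South respondent as 0.31/0.27 ≈ 1.15, holding the regional mix of the likely
# electorate fixed at its historical level no matter who answers the phone.
for region, w in weights.items():
    print(f"{region}: {w:.2f}")
```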
I think those compromises are worthwhile, but I am a producer. You are consumers. How would you answer Erikson’s question? If you have a strong feeling, enter a comment below.
I’ll take up one last complaint about the Gallup model in the next post.
[Misspelling of Erikson corrected]
It strikes me that the actual vote is determined by two things: the preference of the population and the turnout.
The Gallup model uses poll questions to determine both. Other organizations seem to keep their assumptions about turnout relatively fixed and only use the poll to determine shifts in overall preference.
Unless you believe that turnout doesn’t change, it strikes me as silly to assume that turnout is fixed and model only shifts in preference. It actually makes sense that as partisans for one side or the other get enthused, they become more or less likely to turn out.
I know that makes for more dramatic swings in the numbers, but isn’t that just reality?
If at the beginning of the cycle you knew what turnout would be then it would make sense to fix that element and only poll on preference. Given that turnout shifts, I don’t know how accurate you can be if you fix that element…
Thanks for the great discussion on this issue. I do think that using the LV poll questions increases the predictive value of the poll, but calling them likely voters makes people think that Gallup is trying to predict exactly which voters are going to vote on Election Day. In reality they are simply polling a subset of registered voters who historically have been shown to be more likely to vote than the other subset. I think simply shifting from calling them likely voters to “most likely voters” would make the difference clearer in people’s minds. Then they would understand that the most-likely-voters pool is simply like polling strong support vs. weak support. They would understand that it isn’t a prediction so much as a generality about which side’s supporters have traditionally followed through on their intentions to vote. I just think Gallup would come under less criticism, because people would understand that their most-likely-voter pool will miss people who traditionally haven’t voted in the past.
As a consumer, I would like to think I can get better information from following Gallup or another pollster than I can get from following Tradesports or the Iowa Electronic Market.
I think a regional calibration is probably best, because it will most likely capture across-the-board feelings of voters. I don’t care about the race being held today. I know the race is going to be held on the first Tuesday after the first Monday in November. That’s what I want to see reflected in the polling.
That said, at least Gallup is willing to take the criticism and open itself up more than the other pollsters. That gives Gallup, in my mind, a certain degree of credibility the others do not have.
I am just so glad that people are talking intelligently about differences in survey methodology, esp re: post-stratification weighting or the peculiar data set censoring done by Gallup — I mean, who else deliberately throws away data? Jeff Jones’ quote is spot on, essentially asking “what is the research question.” But then you’ve got a second stage meta-analysis question — how can one incorporate Gallup results into a predictive model? Since it’s capturing local peaks and valleys with a one week delay, if there’s a surge at the very end then Gallup will be less biased (in a statistical, not ideological, sense) as a predictor; if there’s no surge, or one in a countervailing direction, then the bias is greater. So you’d have to model likelihoods of late breaking surprises and thus, to what extent is Nov 2 more like July and to what extent more like early September? Reminds me of the Bayesian critique of the interpretation of confidence intervals; or this quote from an FAO study on fisheries: “It is worth noting that scientific research has generally underestimated uncertainty, even in the relatively well-understood physical sciences. Henrion and Fischhoff (1986) and Freudenberg (1988) have examined the history of parameter estimates in several fields, including measurements such as the speed of light, and found that confidence intervals were frequently too narrow and that subsequent estimates often fell outside of previously published confidence intervals.” (html not enabled? My name now links to that study)
As a consumer who clicks on RCP for new poll results like a crack-addicted lab rat pushing his little lever, my answer is that I want it both ways.
I want a “stable likely” number to compare with a “current likely” so I can look at the trend in the current newscycle measured against the longer term number.
Of course, when the race is within the margin of error for so long, what good are polls anyway? A move of ±0.5% is lost in the sampling error….
I agree that pollsters should take into account respondents’ likelihood of voting. But I find it very hard to see the advantage of Gallup’s “cutoff” method as opposed to weighting respondents smoothly by their estimated voting probability, as some other polls apparently do. The cutoff method will in general incur a statistical bias in favor of more energized bases. For example, let the nation be 50/50 red/blue and give Reds a 55% but Blues a 45% likelihood of voting; then the election will end 55/45 – but assuming that Gallup manages to correctly pick the most likely voters, their cutoff method will predict 100/0. This easily generalizes into statistical statements for less radical examples. A smooth weighting would furthermore mean that you effectively work with a larger sample. See also the current policy brief on the homepage of Charles Manski of Northwestern University (disclaimer: he’s my thesis advisor).
Joerg
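Joerg's stylized example is easy to check numerically. The sketch below reproduces his setup (the 50/50 split and the 0.55/0.45 vote probabilities are his hypothetical numbers) and compares the cutoff estimate with a probability-weighted estimate:

```python
# Joerg's stylized electorate: 50% Red with vote probability 0.55, 50% Blue
# with vote probability 0.45. The true expected outcome is 55/45 Red.
n = 10000
people = [("Red", 0.55)] * (n // 2) + [("Blue", 0.45)] * (n // 2)

total_votes = sum(p for _, p in people)
red_votes = sum(p for party, p in people if party == "Red")
print(f"true Red share:     {100 * red_votes / total_votes:.0f}%")  # 55%

# Cutoff method: expected turnout is 50%, so keep the 50% most likely voters.
# Every Red (0.55) outranks every Blue (0.45), so the cutoff keeps only Reds.
cutoff = sorted(people, key=lambda x: x[1], reverse=True)[: n // 2]
red_in_cutoff = sum(1 for party, _ in cutoff if party == "Red")
print(f"cutoff Red share:   {100 * red_in_cutoff / len(cutoff):.0f}%")  # 100%

# Smooth weighting: count each respondent in proportion to his or her
# estimated vote probability. This recovers the true 55/45 split exactly.
print(f"weighted Red share: {100 * red_votes / total_votes:.0f}%")  # 55%
```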
How about a compromise?
Take the DCorps, Fox/OD, et al. model or something similar as your base, since, while the future won’t necessarily mirror the past, it will most likely resemble it.
Look at past voting trends and determine reasonable levels of turnout fluctuation by group. Say, if a particular group is totally jazzed about the election, turnout might rise by a maximum of 5% (arbitrary numbers). Say another group is a little excited; they get an extra 2%. The third group, though, is pretty unenthused, hating both candidates; they drop 3%. So then, assuming that all groups were originally considered equally likely to vote at 55% turnout, the first group would be weighted for a 60% turnout, the second for 57%, and the third for 52%. Most groups, naturally, would generally be at a fairly neutral level of enthusiasm.
Seems to me, this would give you the best of both worlds, allowing for enthusiasm to show through, while forcing the model to restrict itself to a reasonable snapshot of the electorate.
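Using Devin's own arbitrary numbers, the scheme reduces to a capped adjustment around a fixed baseline; a minimal sketch (group names and caps invented, per his example):

```python
# Devin's hypothetical compromise: start every group at a fixed 55% baseline
# turnout, then let measured enthusiasm shift it within a capped band.
BASE_TURNOUT = 0.55

# Enthusiasm adjustments in turnout points, following his arbitrary example.
adjustments = {"totally jazzed": +0.05, "a little excited": +0.02,
               "unenthused": -0.03, "neutral": 0.00}

assumed_turnout = {group: round(BASE_TURNOUT + adj, 2)
                   for group, adj in adjustments.items()}
print(assumed_turnout)
# {'totally jazzed': 0.6, 'a little excited': 0.57, 'unenthused': 0.52,
#  'neutral': 0.55}

# Each group's respondents would then be weighted in proportion to its assumed
# turnout, so enthusiasm shows through but only within the preset limits.
```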
I’m with Joerg/Manski. Smooth weighting seems to solve both the arbitrary cutoff problem, and the volatility problem.
Devin: how do you know that the excitement captured in an October 10 Gallup Poll translates into an actual increase in voter turnout on Election Day?
Second, re: the regional weighting scheme: it seems like this would miss significant electoral shifts over a presidential cycle — such as a surge in new registrations, population movements (Las Vegas, etc.), or “coattail” effects of having other candidates or issues on the ballot that affect that particular election’s turnout. Maybe when you average out all the micro-errors it actually has good predictive value?
If the margin of error can go from 2 to 3 to even 5%, is it more than just the qualifying aspects of the model? How about the survey sample sizes – aren’t they too small? Even with the different weighting of respondents? I don’t see that in any of the discussions I’ve been reading on this blog or others. (By the way, this is an excellent blog, thank you for it.)
I’m not a pollster but do use statistics in direct marketing planning and analysis. As a consumer, it’s just hard to look at a Gallup poll, with its fluctuations, and feel informed.
Is it too difficult to do weekly surveys with 5-10 thousand respondents? Too costly?
“not necessarily trying to predict what’s going to happen on Election Day” is a cop out. not to say there aren’t immediate motives for the data, but suggesting it’s all about a snapshot is a half-truth.
the idea that as one party gets excited, their voter-likeliness increases, makes sense, but i don’t know if i accept the notion that people would in fact turn out in those ratios if the election were held that day. i’d suppose that on election day excitement evens out, or at least settles to a level more generally representative of partisan interest in that particular election. the same way that undecideds finally make up their minds, the truly likely voters vote. striving to get a clearer idea of who they are all along the process makes the most sense to me. party id and regional stratifying both seem flawed but reasonable, and a better compromise. are there other methods?
Come on now. We really do know what the question is we are trying to answer. That is: Who is going to win? Tell me now, tell me now, because I don’t want to wait to find out.
Having said that, a poll is ALWAYS a snapshot of the present because that is when it is happening.
As for methodological differences, isn’t the proof always in the performance? There is no a priori reason to prefer your weighting scheme over Gallup’s except that YOU think it is more reasonable. So, what you are saying is that YOUR JUDGEMENT is an ingredient in deciding what the “correct” model is. As an economist, I know that this is always the case – pretending that you “know” the correct model is something that is true only in textbooks and coin tosses. So, though it may be impure, I am happy enough to say (since there is no such thing as true purity) that Gallup is nonsense when they predict a fall in both Democratic and black turnout compared to the Republicans in November 2004. Why is this so? Because all sentient humans in the reality based universe know it to be true. So, I adjust my opinions accordingly. The pollster who was most willing to do that last time around was also the one closest to the truth in the end – Zogby is “impure” if anyone is, but he nailed it last time. However, one lucky guess (if that is what it was) doesn’t make me a true believer forever, so I will continue to look at all of the results and draw my own conclusions. Isn’t that exactly what you are going to do?
“As Erikson put it in an email to me last week, do we want surveys to identify those who are likely to vote on Election Day or those who are likely to vote ‘if the election were held today?’”
That is easy.
I want both, each clearly identified as such. Every picture tells a story (don’t it?).
Steve Kyle wrote:
There is no a priori reason to prefer your weighting scheme over Gallup’s except that YOU think it is more reasonable. So, what you are saying is that YOUR JUDGEMENT is an ingredient in deciding what the “correct” model is.
Steve: Are you saying that there is no basis on which to make analytical, informed judgements as to the soundness of a particular polling method or methodology? That everything is subjective and each opinion is as sound as any other? This sounds really anti-science and anti-knowledge. Surely you don’t believe this.
One could combine smooth probability or propensity weights with the Gallup method via simulation: use the entire Gallup sample, and run repeated draws using the predicted probability of voting for each case. This creates a stochastic “n” for each “election,” yielding a bootstrapped average result and variance estimate. You could also vary the participation model to estimate the impact of model assumptions on the results and uncertainty of the estimate.
Just a thought.
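A minimal sketch of the simulation described above, with invented vote probabilities and preferences standing in for a real participation model:

```python
import random

random.seed(42)

# Each respondent: (candidate preference, estimated probability of voting).
# Both are invented here; in practice the probability would come from a
# participation model fit to the likely voter questions.
sample = ([("Bush", random.uniform(0.2, 0.9)) for _ in range(500)] +
          [("Kerry", random.uniform(0.2, 0.9)) for _ in range(500)])

def simulate_election(sample):
    """One stochastic 'election': each respondent casts a ballot with his or
    her estimated probability, so the effective n varies from draw to draw."""
    ballots = [cand for cand, p in sample if random.random() < p]
    return 100.0 * ballots.count("Bush") / len(ballots)

draws = [simulate_election(sample) for _ in range(2000)]
mean = sum(draws) / len(draws)
sd = (sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)) ** 0.5
print(f"bootstrapped Bush share: {mean:.1f}% (sd {sd:.1f})")
```

Rerunning the whole exercise under a different participation model, as the comment suggests, would then show how much of the reported uncertainty comes from model assumptions rather than from sampling.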
As a consumer, the single piece of methodology that I would most appreciate would be an accounting, with demographics, of who did and didn’t participate, through either contact or cooperation bias (though, with technology to screen incoming calls, it’s not clear it makes sense anymore to try to differentiate the two). I say that because, if we concede that the answers subjects give to the questions that score them out as likely, vs. merely registered, voters change with their mood, shouldn’t we also worry that their propensity to cooperate (and willingness to let themselves be contacted) will also change with their mood? That partisans of one side or another have a differential propensity to cooperate (and be contacted)?
To take the concern about a Hawthorne effect past self-selection of subjects to the next level: how sure are we that subjects are not systematically spoofing the demographics to influence the poll results? As public awareness has grown that pollsters adjust their sample of likely voters to normalize it to some theory of what the electorate will look like, might they not take it into their heads to pretend to belong to different demographics than they truly do? If I were a rich, white Republican who wanted to puff up Bush’s numbers, wouldn’t the best way to do that, if asked by a pollster, involve not just truthfully saying I intend to vote for Bush, but also untruthfully claiming to be poor, black, and a hitherto lifelong Democrat? Any checks against this?
anthony stevens:
Sure there are ways to make analytical informed judgements about methodology. Anyone who works with statistics does this all the time. I had two points:
1. In this particular case there is no a priori reason to favor one over the other – there are legitimate arguments on both sides, and the ultimate choice must be based on which performs better.
2. In spite of point 1, we should not pretend that subjective judgement has no role either. It ALWAYS does. As informed humans, we know that there are a myriad of factors that in reality affect the result we are looking for, but our models always abstract from this complex reality to focus on only a few of these factors. However, anyone who has spent a lot of time constructing statistical models knows that sometimes you can construct a model with a great logical basis, using only the soundest statistical methods, and then get a result which makes you wince and say “Bullshit.” That is the point where you go back to the drawing board and either try another method (which, when you describe it in the methods section of your paper, sounds methodologically pure, but really isn’t, because it is a response to your cry of bullshit) or engage in some ad hoc fix which gets the result closer to something you can believe.
Hence my example of Gallup. There is nothing in the abstract that makes their methodology wrong. What is wrong about it is that when I see them predicting that Democratic turnout will FALL (relatively) in a year when every Dem I know is frothing at the mouth, I say “Bullshit.” Anything we do after that point to either switch methodology or engage in adhockery is an exercise in injecting subjectivity into the analysis.
But I am all for science. Just not mindless science. After all, the result we are looking for is the right answer, isn’t it? (And we will know who got closest in a week.)
Anthony Stevens:
You could and probably should adjust the regional turnout expectations somewhat for shifting populations. If in 2000 30 million people lived in a region and 18 million voted, but in 2004 35 million people live there, we should probably not expect only 18 million of them to vote.
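In other words, hold the turnout rate fixed rather than the raw count; with the commenter's numbers:

```python
# 18M of 30M residents voted in 2000, a 60% rate. If the region grows to 35M
# and the rate holds, expect 0.60 * 35 = 21 million voters, not 18 million.
rate_2000 = 18 / 30
expected_2004_voters = rate_2000 * 35
print(f"expected 2004 voters: {expected_2004_voters:.0f} million")  # 21
```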
You don’t know that the people who are jazzed on 10/10 will still be jazzed on 11/2. There’s no way of knowing this, sans the acquisition of a crystal ball … and not just any old crystal ball, the kind that tells the future. All any poll can tell you is what would happen if the election were held today. I think something along these lines would do the best job of it. In your analysis, you can then look behind the numbers to see how much of a candidate’s support comes from the “base 55%” and how much is based on excitement. You could even report both numbers if you wanted.
Strange how Gallup results always seem to benefit the GOP and almost always differ significantly from other polls. Maybe Gallup is using its “polls” to justify fixed voting machine results?
Steve Kyle was proved wrong, as Bush’s vote did increase in 2004 vs. 2000 by more than Kerry’s vote increased vs. Gore’s. Despite anecdotal evidence that Democrats were frothing, the GOP still turned more people out. This doesn’t validate the Zogby weighting approach (in fact his final 2004 polls were heavily and incorrectly biased toward the Democrats).
I do agree, though, that Gallup’s 100%/0% approach seems less sensible than Joerg’s weighting approach. Gallup’s approach excludes some data entirely, which is never a good idea.
More on that USA Today/Gallup Poll
Our update to the Slate Election Scorecard yesterday tries to put the results for the generic congressional vote from the USA Today/Gallup survey into some perspective. It also reintroduces the controversy over likely voter models in general with…