This week, the hardworking folks at The Hotline, the daily online political news summary published by the National Journal, did a remarkable survey of pollsters on the question of how they check their samples for accuracy. They asked virtually every political pollster to answer these questions: "What’s the first thing you look at in a survey to make sure it’s a good sample? In other words — what validates a poll sample for you?" They got answers from six news media pollsters and thirteen campaign pollsters (including MP and his partners).
Now, MP readers are probably not aware of it, since few can afford a subscription to Washington’s premier political news summary, but The Hotline has been closely following MP’s series on disclosure of Party ID. In fact, they have reproduced much of the series almost in full, with all the requisite links (for which we are appropriately grateful). Indeed, we believe it was Washington Post polling director Richard Morin’s reference to party identification as a "diagnostic question" in his answer to MP that inspired The Hotline’s poll of pollsters about how they validate polls. Thus, we asked the powers that be at the National Journal to grant permission to reproduce their feature, and they have kindly agreed.
The responses of the various pollsters are reproduced on the jump page. Thank you, National Journal!
(Under the circumstances, MP doesn’t mind shameless shilling for The Hotline, especially since he is a regular reader: Their "Poll Track" feature is one of the most comprehensive and useful archives of political polling results available. It’s just a shame they don’t offer a more affordable subscription rate to a wider audience.)
As the Hotline editors note, most pollsters listed a series of demographic and attitudinal questions that they tend to look at in evaluating a poll (particularly gender, age, race and party ID). However, a few themes deserve some emphasis:
- A point worth amplifying: Several pollsters – especially Pew’s Andrew Kohut, ABC’s Gary Langer and Democrat Alan Secrest – stressed that the procedures used to draw the sample and conduct the survey are more important to judgments about quality than the demographic results.
- Ironically, there was a difference in the way the pollsters heard and answered the Hotline’s questions (next time, pre-test!): Some described how they evaluate polls done by other organizations; some (including MP and his partners) described how we evaluate our own samples.
- Although most described what factors they look at, few went on to describe what they do when a poll (either theirs or someone else’s) fails their quality check. Do they weight or adjust the results to correct the problem? Leave it as is, but consider the errant measure in their analysis? Ignore the survey altogether? (A rough sketch of what the simplest sort of adjustment can look like follows this list.)
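For readers curious what the simplest version of such an adjustment looks like, here is a rough, purely illustrative sketch in Python. It compares a sample’s unweighted party ID shares to an external benchmark and, when a category falls outside a tolerance, computes a simple cell weight (target share divided by observed share). The benchmark figures, the tolerance, the category names, and the function itself are all invented for the example; this is not any particular firm’s procedure, and real pollsters typically weight on several variables at once (raking, or iterative proportional fitting, is the usual generalization when only marginal targets are known).

    import random

    def check_and_weight(sample, targets, tolerance=0.04):
        """Compare unweighted sample shares to (hypothetical) benchmark targets.
        Flag categories outside the tolerance and return simple cell weights."""
        n = len(sample)
        weights = {}
        for category, target in targets.items():
            observed = sum(1 for r in sample if r["party_id"] == category) / n
            flag = "  <-- outside tolerance" if abs(observed - target) > tolerance else ""
            print(f"{category}: sample {observed:.0%}, target {target:.0%}{flag}")
            # Cell weight = target share / observed share (guard against an empty cell).
            weights[category] = target / observed if observed else 0.0
        return weights

    # Invented example: 1,000 simulated respondents and made-up benchmark targets.
    random.seed(1)
    respondents = [
        {"party_id": random.choices(["Dem", "Rep", "Ind"], [0.40, 0.28, 0.32])[0]}
        for _ in range(1000)
    ]
    cell_weights = check_and_weight(respondents, {"Dem": 0.36, "Rep": 0.34, "Ind": 0.30})

Whether one should make such an adjustment at all, and to what target, is of course exactly the question MP’s party ID series wrestles with.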
There is much food for thought here. Like the Hotline editors, MP would like to know what you think. After reading over the responses, please take a moment to leave a comment: Of all the suggestions made, what information do you want to know about a poll? More broadly, what other questions would you like to ask the pollster consultants about how we do our work?
See the complete responses from pollsters as published in yesterday’s Hotline on the jump page.
GOPers
- MWR Strategies (R) pres. Michael McKenna: "From a technical
perspective, we look at demographic breakouts (age, gender, region, etc.) and
make sure they are in the ballpark. Then we look at ideological/partisan breaks
… Then finally, and perhaps most importantly, we look at the responses
themselves and sort of give them the real-world test — does that set of answers
conform to the other things I know about the world? … It seems to me that the
trick to samples is not being excessively concerned about one set of survey
results, but rather to look at the results of all surveys on a given topic. It
is pretty rare for a single deficient sample to twist the understanding of an
issue or event, in part because everyone (I think) in the business looks at each
other’s results."
- Hill Research (R) principal David Hill: "I always look at the joint
distributions by geography, party, age, gender, and race/ethnicity. Based on
prior election databases, we know the correct percentage of the sample that
should be in each combination of these categories. … We try to achieve the
proper sample distribution through stratified sampling and the imposition of
quotas during the interviewing process, but sometimes it still isn’t right
because of quirks in cooperation rates, forcing us to impose weights on the
final results."
- Ayres, McHenry & Associates (R) VP Jon McHenry: "When we get data
from the calling center, the first thing I check is the racial balance and then
party ID. Variation in either of those two numbers can and will affect the other
numbers throughout the survey. Looking at someone else’s survey, … I’ll also
see how long the survey has been in the field. You can do a good survey in two
days, but it’s tricky. It’s pretty tough in one day, which is part of the reason
tracking nights can bounce around. … But … knowing you’re in the ballpark
with party id is a pretty good proxy for seeing that you have a balanced
sample."
- Public Opinion Strategies (R) partner Glen Bolger: "If it’s a party
registration state/district, I check party registration from the survey against
actual registration. I also look closely at ethnicity to ensure proper
representation of minorities. We double check region and gender quotas to make
sure those were followed. We check age to ensure seniors are not
overrepresented."
- Probolsky Research (R) pres. Adam Probolsky: "If the poll is about a
specific election, I look at whether the respondents are likely voters. If not,
it hard to take the results seriously. If it is a broader public policy or
general interest poll, I look to see if the universe of respondents matches the
universe of interested parties, stated more plainly, that the population that is
being suggested as having a certain opinion is well accounted for in the
universe of respondents."
- Moore Information (R) pres. Bob Moore: "Name of pollster and
partisanship of sample."
Media
- Gallup editor-in-chief Frank Newport: "Technically there are a wide
variety of factors which combine to make a "good sample." As an outside observer
… I focus first and foremost on the known integrity and track record of the
researcher/s involved. If it’s a research organization unknown to me, the "good
sample" question becomes harder to answer without more depth investigation —
even with the sample size, target population, dates of interviewing information
usually provided. Parenthetically, question wording issues are often more
important in evaluating poll results than the sample per se."
- ABC News dir. of polling Gary Langer: "A good sample is determined
not by what comes out of a survey but what goes into it: Rigorous methodology
including carefully designed probability sampling, field work and tabulation
procedures. If you’ve started worrying about a "good sample" at the end of the
process, it’s probably too late for you to have one. When data are completed, we
do check that unweighted sample balancing demos (age, race, sex and education)
have fallen within expected norms. But this is diagnostic, not validative."
- Schulman, Ronca & Bucuvalas, Inc. pres. Mark Schulman: "When
you’ve been in the polling business as long as I have, you’ve learned all the
dirty tricks and develop an instinct for the putrid polls. ‘Blink blink,’ as
Gladwell calls it. Blink — who sponsored the poll, partisans or straights?
Blink — are the questions and question order leading the respondent down a
predictable path? Blink — does the major finding just by chance (!) happen to
support the sponsor’s position? This all really does happen in a blink or two.
The dirty-work usually does not involve the sample itself."
- Pew Research Center pres. Andrew Kohut: "There is no one way to judge
a public opinion poll’s sample. First thing we look for was whether the sample
of potential respondents was drawn so that everyone in the population had an
equal, or at least known chance of inclusion. Secondly, what efforts were made
to reach potential respondents — Were there call backs — how many — over what
period of time? And what measures were used to convince refusals to participate?
… How does the distribution of obtained sample compare to Census data? We will
also look at how the results of the survey line up on trend measures that tend
to be stable. If a poll has a number of outlier findings on such questions it
can set off a warning bell for us. I want to add that the major source of error
in today’s polls is more often measurement error than sampling error or bias.
When I see a finding that doesn’t look right, I first look to the wording of the
question, and where the question was placed in the questionnaire. The context in
which a question is asked often makes as much difference as how the question is
worded."
- Zogby Int’l pres. & CEO John Zogby: "I go right to the harder to
reach demographics. Generally, that means younger voters, Hispanics, African
Americans. They are usually under-represented in a typical survey sample, but if
their numbers are far too low, then the sample is not usable. I also look at
such things as union membership, Born-Agains, and education. If any of these are
seriously out of whack then there is a problem."
- Research 2000 pres. Del Ali: "There are two things right off the top:
the firm that conducted the survey and for whom it was conducted for. If it is a
partisan firm conducted for a candidate or a special interest group, the
parameters and methodology become critical to examine. However, regardless of
the firm or the organization who commissioned the poll, the most important
components to look for are: Who was sampled (registered voters, likely voters,
adults, etc.), Sample size/margin for error (at least 5% margin for error),
Where was poll conducted (state wide, city, county, etc.), What was asked in the
poll (closed ended/open ended questions), When was a horse race question asked
in the poll. Bottom line, I take all candidate and policy group polls with a
grain of salt. The independent firms who poll for media outlets are without
question unbiased and scientifically conducted."
Dems
- Greenberg Quinlan Rosner Research (D) VP Anna Greenberg: "It is hard
to tell because you never really know how people develop their samples."
Mentioning sample size, partisan breakdown and field dates, Greenberg looks "for
… how accurately it represents the population it purports to represent (e.g.,
polls on primaries should be of primary voters). … You can also look at the
demographic and political (e.g., partisanship) characteristics to make sure the
sample accurately represents the population. It is rarely reported, but it would
be helpful to know how the sample frame is generated (e.g., random digit dial,
listed sample) so you can get a sense of the biases in the data. But none of
these measures really help you understand response rates, completion rates or
response bias, which arguably have as big an impact on sample quality as any of
the items listed above. It is important to note that ALL samples have bias, it’s
just a matter of trying to reduce it and understand it."
- Anzalone-Liszt Research (D) partner Jeff Liszt: In addition to
mentioning the importance of sample size, Liszt looks at "over how long (or
short) a period the interviews were conducted. Very large samples taken in one
or two nights sometimes raise a red flag because of the implications for the
poll’s call-back procedures. The challenge is that public polling is only very
selectively public. Sampling procedure and weighting are critical, yet opaque
processes, about which very few public polls provide any information. … This
often leaves you with little more than a smell test. … Often, the best you can
do is consider whether a poll is showing movement relative to its last reported
results, whether other public polls are showing the same movement, and whether
there is any apparent shift in demographics and party identification from
previous results."
- Global Strategy Group (D) pres. Jefrey Pollock: "The first question
we ask ourselves is ‘what is driving voter preference?’ In an urban race like
NYC or LA, race is frequently the primary determinant, and therefore the most
important element of ensuring a valid sample. In addition, there is a high
frequency of undersampling of minorities in many surveys. In an election where
race is not a leading determinant, we look first to ensure that the survey
matches up to probable regional turnout."
- Cooper & Secrest Associates (D) partner Alan Secrest: "Proper
sampling is the absolute bedrock of accurate and actionable polling. … A
correctly-drawn poll sample — in concert with properly focused screen questions
(the two cannot be divorced…especially in primary polling) — should yield a
representative look at a given electorate. Sadly, such methodological rigor too
often is not the case." Noting that "there is no ‘single’ criterion" to use in
judging a poll, and pointing to the importance of testing dates and
demographics, Secrest lists his key criteria: "Does the pollster have a track
record for accurate turnout projection, winning, and being willing to report
unvarnished poll results to the client?; what turnout model was used to
distribute the interviews?; is the firm using an adequate sample size for the
venue, especially if subgroup data is being released?; ‘consider the
source’…some voter list firms are perennially sloppy or lazy in the
maintenance of their product; were live interviewers used? centrally located?
appropriate accents? Obviously, question design and sequence matter as well."
- A joint response from all the partners at Bennett, Petts & Blumenthal
(D), Anna Bennett, David Petts and Mark Blumenthal: "The
most valid data we have on "likely voters" involves their geographic
distribution in previous comparable elections. Like most political pollsters, we
spend a lot of time modeling geographic turnout patterns, and stratify our
samples by geography to match those patterns. We also look at whatever
comparable data is available, including past exit polls, Census estimates of the
voting population, other surveys and voter file counts. We examine how the
demographic and partisan profile of our survey compares to the other data
available, but because there are often big differences in the methodologies and
sources, we would use these to weight data in rare instances and with extreme
caution."
- Hamilton Beattie & Staff (D) pres. David Beattie: "There is not
‘one thing’ that validates a poll — the following are the first things we
always look at: 1)what is the sample size, 2)what were the screening questions,
3)what is the racial composition compared to the electorate and were appropriate
languages other than English used, 4)what is the gender, 5)what is the age
breakdown (looking especially to make sure it is not too old) 6)what is the
party registration or identification."
- Decision Research (D) pres. Bob Meadow: "The first thing we do is
compare the sample with the universe from which it is drawn. For samples from a
voter file, we compare on party, gender and geography. For random digit samples,
we compare on geography first."
© 2005 by National Journal Group Inc., 600 New Hampshire Avenue, NW, Washington
DC 20037. Any reproduction or retransmission, in whole or in part, is a
violation of federal law and is strictly prohibited without the consent of
National Journal. All rights reserved.
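One arithmetic footnote to the responses above: Del Ali’s margin-of-error criterion maps directly onto sample size. Under the usual 95% confidence convention for a simple random sample, the maximum margin of error is roughly 0.98 divided by the square root of the number of interviews, so about 385 interviews gets you to plus or minus 5 points and about 1,070 to plus or minus 3. The short Python snippet below simply works out that textbook arithmetic; it ignores design effects and weighting, and it says nothing about the non-sampling problems (coverage, nonresponse, question wording and order) that Kohut, Langer and others stress above.

    from math import sqrt

    def margin_of_error(n, p=0.5, z=1.96):
        """Textbook 95% margin of error for a simple random sample at the
        worst case p = 0.5 (no design effect or weighting adjustment)."""
        return z * sqrt(p * (1 - p) / n)

    for n in (300, 385, 600, 800, 1067):
        print(f"n = {n:5d}  ->  +/- {margin_of_error(n):.1%}")
    # n = 385 works out to roughly +/-5.0%; n = 1067 to roughly +/-3.0%.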
I can speak for a lot of conservatives when I say that party ID is the #1 factor when judging polls. NY Times and LA Times tend to have polls that skew toward heavier Democrat representation (e.g., the Schwarzenegger poll). When we see 40% Democrats and 28% Republicans in their samples, we generally think their polls are flawed (only very rarely do we see the ratio showing more Republicans).
Secondly, when we discuss various polls at conservative websites, we tend to look for weekend, Thursday night football, and other similar effects that cause more Democrats to be present in the sample. Fox/Opinion Dynamics really hurt themselves by polling over the weekend on the eve of the 2004 election.
Lastly, most people bring up the past track record of the pollsters. Zogby obviously gets negative remarks, while Battleground, SurveyUSA, and Rasmussen get high ratings. Gallup in 2004 also took a tumble after their decision to allocate undecided voters to Kerry and with their various flawed state poll results.
Cableguy, why would you think a state that is heavily Democratic would produce a polling distribution anything other than a Democratic majority? You are looking for partisan bias when that is only one of many potential sampling problems, and, depending on the topic, not even the most important one.
The problem with national polls that had Republicans as a solid majority was that there was no historical evidence that R’s were a self-identified majority party. Thus the question of biased results. In Cali, D’s are the majority, unless I’m badly mistaken.
Dr. C, when I mentioned the LA Times’ poll as an example of polling bias, I was specifically talking about Schwarzenegger’s win over Bustamante. Just go over what LA Times predicted vs. other pollsters vs. actual results. LA Times had the Recall failing as well as Bustamante winning, and they were wrong on both counts by a large margin. Most other pollsters, like the Field Poll and SurveyUSA, were close to the actual results. And to refresh your memory, LA Times was vehemently opposed to Schwarzenegger and went after him with idiotic charges.
LA Times’ polling of Bush vs. Kerry also showed, in a couple of instances, hugely Democrat-heavy samples. If you read ABC News’ The Note, you would have seen the discussion there between LA Times’ pollster and Bush campaign pollster Matt Dowd over this issue. When you have more Democrats by something like 14%, there is something wacky about your sample and/or your method.
Now, with respect to your second paragraph, I never said Republicans had a “solid majority.” What I implied is that polls should have them roughly equal, within a reasonable margin of error. The elections in 2000 and 2002 showed that the country is closely divided. Other polls by Pew and Gallup during 2004 showed that registrations by Democrats and Republicans were roughly equal as well. So I don’t buy your argument that polls should show a lot more Democrats in the sample than Republicans as a norm. This type of fallacy is what caused Zogby (who fixed the ratio to 2004 election exit polls) to lose his reputation and credibility during 2004.
Cableguy, I’m thinking about these issues as well. You seem to be advocating for the weighting of samples by party ID in an effort to make them more “representative,” right? If so, I recommend Mark’s series on weighting by party ID.
In that series I found interesting data demonstrating that responses to the “Party ID” question are notoriously unstable; certainly more unstable than real party ID as measured by changes in party registration among registered voters. If that is the case, why would anyone weight their sample to such an unstable characteristic?
Just because a respondent tells you that s/he is a Democrat or a Republican doesn’t mean that s/he is telling the truth. That is, I suspect that responses to party affiliation questions ebb and flow with respondent attitudes toward the party or the leaders of that party. In other words, the expressed opinion (say, job approval, or who they would vote for if the election were held today) probably drives responses to the party ID question.
That said, I do think that many polling firms introduce bias into their surveys and output. Biased wording and ordering effects of survey instruments can inadvertently “push” respondents into answers (I’m *not* saying they are Push Polls). Also, for pre-election polls, I’m a bit skeptical about some of the methodology used to select likely voters from the samples and am especially skeptical of how some firms allocate undecided voters.
Weighting by party ID is a tough issue, and I still haven’t given it enough thought to come down firmly on either side. I’m leaning against the practice, though…
Rick, IMO I don’t think weighting by party ID makes sense. I am not an advocate for that. As you saw in 2004, Zogby got his reputation tarnished as a result. Some smarter pollsters like the Wash Post did weight, but I believe that they used the party ID breakdown from the last few days instead of the 2000 exit poll. And the Battleground pollsters used equal weighting before allocating undecideds.
The point I was making is that party ID numbers tell us whether that particular poll was done well or not. As you know, most polls have 95% confidence levels. So if you do a “random” poll and get results that are out of line, then you can throw it out, or publish it while flagging those observations.
For example, if Gallup does a poll and their sample contains 20% more Democrats, it is a good bet that either the sample was bad or there is something wrong with their methodology (e.g. weekend polling).
It is my contention (and that of many conservatives with whom I discuss these issues) that NY Times and LA Times polls have many more Democrats in their samples most of the time. I read about this in one of Dick Morris’ books, and have been keeping an eye on it ever since. Morris points out that in 1999-2000 almost every one of NYT’s polls was biased except the final one, which had an equal number of Republicans and Democrats, and that last poll produced a result in line with the election results. Either it is a pure coincidence or NYT is manipulating the results. Dick Morris had other similar examples of NYT playing with the numbers.
If NYT and LAT were truly balanced, they should on average have had roughly equal numbers of each party since 1999 or so. Go look up NYT’s polls and the historical breakdown by party ID (usually provided on their website), and you will see a strong bias toward Democrats. If you are truly interested, email me and I can send you the link.
Very interesting, Cableguy. For the record, I am a Republican ;-). Why not drop the link in the comments so followers of this thread can read it as well?
How do you know that they oversample Democrats? Because their random sample drew a disproportionate number of those who say they affiliate with the Dem party? Again, I suggest that you read MP’s series on weighting by Party ID. There is a lot in there about the party ID question that is useful to this discussion.
Pollsters: What Validates a Poll?
Mystery Pollster (hat tip Paul Brewer) has a wonderful post up about a Hotline survey of pollsters over what they look for in evaluating the quality of a sample. This is an absolute must-read for poll junkies.
Mark,
Excellent reporting, as always. Informative and useful. Kudos.
cableguy,
you are a sneaky one.
1. You specifically referred ONLY to the California poll. California is more D than R in terms of self-identification. Thus, one expects to find that in the sample, and it was probably not biased, at least not in terms of the expected vs. actual PID distribution.
2. If you want to subsequently introduce national polling by the LA Times that was not mentioned first, and then use that to critique my critique, well, that’s underhanded. I was not speaking about national polls and neither were you. Besides, other than your obviously biased perception that the LA Times was out to get Arnold, do you have any data about whether LA Times polls were further off than others? If you don’t have the data to show LAT was worse (more pro-Kerry) than others, on average, then there isn’t really much left to say. Obviously, greater sampling accuracy should result in greater predictive accuracy, but to suggest prediction is impossible without a perfect match on what you want from PID is not valid. Even if LAT was off, pro-Kerry, it’s not 100% clear that it’s because of skewed PID, although that is most likely. To suggest intentional distortion of polling samples to produce desired results is pretty far out there. I suppose the long-built reputation of the polling outfits means nothing for a few percentage points Kerry’s way? Right.
3. Since I was talking about California, I said D’s were more prevalent. You then switched that to imply I said nationally. In fact, I said nothing like that, and D’s are indeed more prevalent in California.
4. Nationally, there are a number of benchmarks you can use for PID: self-identified PID trends, voting behavior, turnout, etc. Based on past turnout and self-identified data, there are probably an equal number of reasons to say D’s or R’s should be slightly more numerous in the sample. In the past, D’s were the majority, so expecting or demanding a 50/50 split in every poll is absurd.
But it was not the LA Times that was consistently off yet plastered all over USA Today and CNN; try Gallup, which oversampled R’s repeatedly. They had Bush up by double digits with their registered-voter data, even though this was historically their most unreliable data to use. So why no mention of Gallup in your post? I know, you listen only to Dowd. Here’s a tip: don’t listen to partisans. It isn’t clear why Gallup hyped their less reliable data, but they did. And I haven’t claimed they did it on purpose.
5. Weighting by PID is problematic. Measuring the validity of polls via PID distributions is also problematic. This ain’t a perfect science. But it isn’t any easier when your ideological/partisan leanings color and distort your analysis. You are only seeing part of the picture.