Yesterday, the National Journal’s Hotline (subscription required) took up a topic of great interest to MP: "Whether the polling community will admit it publicly or not," The Hotline’s editors wrote on their front page, "there’s a crisis in their industry. From media pollsters to partisan pollsters, more and more consumers of these polls are expressing skepticism over the results, no matter how scientifically they are designed."
They went on to debut a new series of debates they are "hoping to spark" in the political community, the first on the topic of "Interactive Voice Response" (IVR) polling. They kicked it off with a long interview with Jay Leve, editor of SurveyUSA.
Now for those who are not familiar with it, The Hotline is a daily news summary that provides comprehensive coverage of politics at the national, state, and congressional district levels. Unfortunately, it is only available through a pricey subscription that is out of reach for most individual readers, so I cannot link to it directly.
However…the folks at the National Journal have kindly granted MP permission to reproduce the interview in full, as long as I cite it properly and include their copyright (and did I mention that this interview came from The Hotline, National Journal’s Daily Briefing on Politics, a bargain at any price? Good…didn’t want to forget that).
Seriously, thanks to the Hotline for providing a forum for this important debate. Whether you believe that political survey research is "in crisis" or not, there is no question that the challenges already facing random sample telephone surveys will increase significantly over the next decade. Those challenges and the responses to them are worthy of debate, and not just among those who produce surveys but among our consumers as well.
I’ll chime in with some thoughts on Leve’s interview later today or tomorrow. Until then, here is the full interview. The comments section is open as always.
The following is an interview with SurveyUSA Editor Jay Leve. SurveyUSA
has come under a great deal of criticism over the years for not using real
people to conduct their polls, using instead the recorded voice of a
professional announcer. However, in the ’04 election cycle, SurveyUSA had an
impressive track record which can be viewed on their web site. In the wake of these results, we decided
to ask the questions everyone wants to know the answers to and allow Leve to
address his critics head on. We plan to invite many pollsters to respond to this
interview; any pollsters who would like to respond before we ask may email here.
You have
enjoyed a great deal of success in this past election cycle. How can you explain
your accuracy to your critics?
There are two ways to measure an
election pollster’s performance: "absolute" accuracy and "relative" accuracy.
SurveyUSA keeps track of both, maintains up-to-date scorecards and, alone among
pollsters, publishes the scorecards to our website for public inspection. By any
measure, SurveyUSA is at the top or near the top of election pollsters – not
just for 2004, but ever since SurveyUSA started polling in 1992.
Pollsters talk and write a lot about reducing "Total Survey Error," but most
obsess over the mathematical sources of error. I focus more on how the
questions get written, who asks them, how the questions sound to the respondent
and how the questions get answered. SurveyUSA has re-thought from scratch
exactly how polls can best be conducted, given what professional voices make
possible. If each additional interview is expensive, which it is for
others, you think about TSE one way. If each additional interview is
relatively inexpensive, which it is for SurveyUSA, you can make other choices.
The amount of intellectual horsepower that gets applied to exactly how a
question is asked, and exactly what the respondent hears, as a percentage of
total expended intellectual energy, is greater at SurveyUSA than at other firms.
With this accuracy, what prevents any "Homer Simpson" from
purchasing an auto-dialer and conducting polls from home?
SurveyUSA has spent a fortune writing software and building hardware. But even
if I gave our technology away for free, to Homer or Pythagoras, they
would not know what to do with it. SurveyUSA’s technology is neutral. It’s just
a tool, neither good nor evil.
On your web site, you state that
"Many media polls are ordered and completed same day. Many market research
projects are ordered one day and delivered the next." How can this leave time to
accurately develop the questionnaire, ensuring that the questions are
unbiased?
SurveyUSA researchers do not start from scratch when a
poll is commissioned. Like most pollsters, SurveyUSA asks the same questions
over and again. SurveyUSA’s library has thousands of poll questions. Every so
often, something truly new comes up, and our writers must wrestle with
constructs, language, phrasing and the range of possible answer choices. In such
cases we may test multiple ways of asking the same question. Ultimately a
"keeper" goes into our library. Your question implies that questionnaires must
be long and complex. Not true. For others, who go into the field infrequently,
questionnaires take weeks to prepare, because both pollster and client know they
may not get another chance for 3 months. SurveyUSA goes into the field every
day. Our questionnaires are short, by design. Some see this as a limitation. We
see it as an advantage. The more questions you put in a questionnaire, the more
those questions interact with each other, and the more the early questions color
the answers to later questions. Others ask (ballpark) 100 questions, which take
20 minutes to answer. SurveyUSA asks (ballpark) 10 questions, which take 2
minutes to answer.
Do you feel the rapid turnaround time has
any negative effects (other than those mentioned above)?
Every piece of
research has a proper field period. Some SurveyUSA polls are in the field for
minutes, some for weeks. SurveyUSA election polls are typically conducted over 3
consecutive days. Minutes after the first presidential debate last fall, ABC
News completed one poll of 531 debate watchers. CNN completed one poll of 615
debate watchers. CBS News completed one poll of 655 debate watchers. NBC News
did nothing. SurveyUSA completed 35 separate polls in 35 separate geographies,
of 14,872 debate watchers. NBC affiliates in Seattle, Salt Lake and Denver had
scientific SurveyUSA reaction in-hand minutes after the debate, while Tim
Russert and Chris Matthews pondered how many DailyKos bloggers had stuffed the
ballot box at the MSNBC website.
On the day before DIA opened in
Denver in 1995, SurveyUSA took a poll for KUSA-TV. We asked whether building the
new airport was good or bad. The next day, after the airport opened, we re-took
the same poll. Approval for the airport went up 20 points overnight. Should we
have held the story for 3 days so we could do more callbacks? We had news. Our
client led with it. The other stations had nothing. We owned the story.
How do you formulate a representative sample? Do you use random
digit dialing or voter lists? Which do you feel has the more accurate results?
Why?
SurveyUSA purchases RDD sample from Survey Sampling of
Fairfield CT. We have conducted side-by-side testing using RBS (Registration
Based Sample). In the testing we have done, RBS did not outperform RDD.
On your website, you state that in order to end up with an accurate
sample you use demographic breakdown to ensure you are portraying the
population. However, since the questionnaire is being asked of the first person
who answers the phone, how can you accurately establish a sample that is
appropriate for a poll? Even with screening questions to establish the
likelihood of a voter, how can you assure that a caller is actually over the age
of 18? Or for that matter, how can you assure that they are citizens, or
registered to vote? Is there any systematic way you can verify the accuracy of
this once a poll has been completed?
You’ve asked 5 questions
here, the first of which contains a false premise. SurveyUSA can choose to talk
to the person who answers the phone, or we can ask to speak to someone else.
There is nothing hard about that. By your question, you create the impression
that, a) SurveyUSA doesn’t understand the importance of selecting a respondent
from within a household and, b) even if we did understand it, our technology
prevents us from doing it. Both are false. SurveyUSA has read all of the
literature on intra-household selection, and SurveyUSA has done side-by-side
testing on the different ways that one might do intra-household selection. We
have tested the methods that are mathematically defensible in theory, such as
asking for the respondent with the most recent birthday (which has problems in
practice), and methods that are mathematically indefensible, such as asking for
the youngest male over the age of 18. Intra-household selection, in practice,
does not make the kind of polls that SurveyUSA conducts more accurate.
2.4 percent of those who take a SurveyUSA poll tell us they are under the age of
18. We exclude them. There is no evidence that people lie to us more often than
they lie to a headset operator. There is evidence to the contrary.
Some SurveyUSA competitors want you to think SurveyUSA gets an occasional
election right, the way Miss Cleo occasionally gets a psychic prediction
right. The facts are published and available for inspection. The odds that
chance alone can explain SurveyUSA’s success relative to other pollsters are
1,000,000,000:1, by many measures. To those who would like me at this point to
disclose that SurveyUSA got the Newark mayor’s election wrong in 2002, the San
Francisco mayor’s runoff wrong in 2003, and that SurveyUSA overstated
Dean in the 2004 Iowa caucuses, we did. When you have as many at-bats as
SurveyUSA, you are going to strike out from time to time. The question is: how
does our entire body of work stand up? By multiple objective Mosteller measures,
SurveyUSA’s data need take a back seat to no one’s.
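For readers who want to see concretely what the Mosteller measures Leve cites actually compute, here is a minimal sketch in Python. This is my own illustration with made-up numbers, not SurveyUSA's code or data; it shows two commonly cited measures, the average absolute error on the two leading candidates (often labeled Measure 3) and the absolute error on the margin between them (Measure 5).

```python
# Hypothetical illustration of two commonly cited Mosteller accuracy measures.
# poll and result are dicts mapping candidate -> percentage of the vote.

def mosteller_3(poll, result):
    """Average absolute error on the two leading candidates (Measure 3)."""
    top_two = sorted(result, key=result.get, reverse=True)[:2]
    return sum(abs(poll[c] - result[c]) for c in top_two) / 2

def mosteller_5(poll, result):
    """Absolute error on the margin between the two leading candidates (Measure 5)."""
    a, b = sorted(result, key=result.get, reverse=True)[:2]
    return abs((poll[a] - poll[b]) - (result[a] - result[b]))

# Made-up example: a final poll versus the certified result.
poll = {"Smith": 51, "Jones": 45}
result = {"Smith": 53, "Jones": 46}
print(mosteller_3(poll, result))  # (2 + 1) / 2 = 1.5 points
print(mosteller_5(poll, result))  # |6 - 7| = 1 point
```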
In 1999, a
subsidiary of the research firm IPSOS wanted to see if interactive voice was a
viable alternative to CATI. Senior IPSOS scientists put together a side-by-side
test with 93,000 interviews. The test was deliberately designed to isolate and
identify biases in interactive voice. As such, respondents were asked as diverse
a collection of questions as possible. The testing was designed, carried out and
paid for by the IPSOS subsidiary. After the 93,000 parallel interviews were
conducted, IPSOS wrote a white paper, summarizing the research-on-research.
Findings:
- "IVR produces samples that more closely mirror US demographics than does
CATI … Three demographics stand out as being the reason for these differences:
education, income and ethnicity. In all three cases, IVR was much closer to the
census than CATI."- "IVR interviewing generally succeeds on all three fronts: sample
projectability, accuracy and production rates. These findings suggest that IVR
is a valid method for administering short questionnaires to RDD samples."- "In the few cases where differences are noted in the data, some can be
resolved by the way we ask questions and some, we believe, are already more
accurate in IVR."After this white paper was written, this IPSOS
subsidiary began using SurveyUSA for data collection.
Due to the
manner in which you obtain your sample, is there a differential in accuracy in
general vs. primary elections?
SurveyUSA has polled on 310 general
candidate elections. Our average error on the candidate is 2.33 points.
SurveyUSA has polled on 167 primary elections. Our average error is 4.13 points
(1.8 times greater). We do not believe we are less accurate on primary elections
because of the way we obtain sample. Because no pollster has ever been asked
for, nor publicly made, this kind of disclosure before, I don’t know whether a
1.8 factor deterioration on primary polls is above average or below average.
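To make the disclosure Leve describes concrete, here is a minimal sketch of computing average candidate error separately for general and primary contests, and the deterioration factor between them. The records below are made-up numbers for illustration only, not SurveyUSA's data.

```python
# Hypothetical sketch: average candidate error by election type and the
# ratio ("deterioration factor") between primary and general contests.
# Each record is (election_type, polled_pct, actual_pct) for one candidate.
records = [
    ("general", 48, 50), ("general", 52, 49),   # made-up numbers
    ("primary", 30, 36), ("primary", 45, 42),
]

def avg_error(rows, kind):
    errs = [abs(polled - actual) for t, polled, actual in rows if t == kind]
    return sum(errs) / len(errs)

general = avg_error(records, "general")   # 2.5 points in this toy example
primary = avg_error(records, "primary")   # 4.5 points
print(general, primary, round(primary / general, 1))  # factor of 1.8
```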
Do you include "traps" in your screening process? If so, such as?
Do they prove to be effective?
We have experimented with as few as
3 and as many as 8 screens for likely voters over the years. In addition to
asking the obvious question, "Are you registered?", we have experimented with
many different variations on the direct, "How likely are you to vote" question,
including running side-by-side testing for many of our 2004 polls comparing a
4-point likely scale to a 5-point scale. We have, in past years, but not in
2004, asked people where they vote. In 2004 we asked respondents whether and how
they voted in 2000. We ask people their interest on a 1-to-10 scale. In 2004, we
used fewer screening questions than in past years. Our results were superior. We
find no simple relationship between the number of screening questions and the
accuracy of our results. When SurveyUSA consistently produces a candidate error
of 0.0 on pre-election polls, we’ll assume we have solved this riddle, and will
stop experimenting. Until then, it’s a work in progress.
Under what
circumstances are your polls more beneficial than traditional telephone polls as
conducted by Gallup? What makes automated polls more accurate?
Have you been to Gallup’s website lately? Have you watched Frank Newport
deliver the Daily Briefing? Have you been to the Gallup Brain? Have you read
Gallup’s blog? Do you receive the occasional introspective from David
Moore? What a tour de force. No other pollster is a close second to Gallup
in these areas. I aspire to run my company as openly and transparently as does
Gallup, and to provide interactive real-time access to our library of questions
and answers. In this regard, I have the highest respect for Gallup. Further,
Gallup has a 70-year track record on many important questions, which gives Gallup a
60-year head start on SurveyUSA. That said, I would not trade data with Gallup: 42%
of Gallup’s final statewide polls in 2004 produced a wrong winner (5 wrong
winners out of 12 state polls), compared to 3.4% of SurveyUSA’s final statewide
polls (2 wrong winners out of 58 state polls).
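The "wrong winner" percentages quoted above are simple arithmetic on the counts given; here is a quick check, taking the figures as stated in the interview rather than independently verified.

```python
# Sanity check of the wrong-winner rates quoted above
# (counts as stated in the interview, not independently verified).
gallup_wrong, gallup_total = 5, 12
surveyusa_wrong, surveyusa_total = 2, 58
print(f"Gallup: {gallup_wrong / gallup_total:.1%}")           # 41.7%, rounded to 42%
print(f"SurveyUSA: {surveyusa_wrong / surveyusa_total:.1%}")  # 3.4%
```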
Professionally-voiced
polls are not inherently superior to headset-operator polls, and I do not make
that claim. I just rebut the assertion that professionally-voiced polls are
inherently inferior. Used properly, SurveyUSA methodology can have advantages.
In 1994, SurveyUSA polled California on Proposition 187 for TV stations in Los
Angeles, San Francisco and Sacramento. Prop 187 was a plan to deny benefits to
illegal immigrants. When others polled, some respondents heard the 187 question
this way, "Are you a bigot?" They answered in the politically correct way. "No,
I would never vote to deny benefits to illegal immigrants" (before going out and
doing just that). It did not matter how much confidentiality Field or LA
Times interviewers promised the respondent, or how well trained those
interviewers were. Both pollsters understated support for this measure. When
SurveyUSA polled 187, respondents did not have to confess anything, but rather,
had only to press a button on their phone, paralleling the experience the
respondent would later have in the voting booth, where no one speaks his/her
choice aloud. SurveyUSA said Prop 187 would pass 60% to 40%. It passed 59% to
41%.
If your only access to polling data is Hotline, you may
think Arnold Schwarzenegger scored a remarkable come-from-behind win in
the 2003 Gray Davis recall. The only polls Hotline reported showed
Cruz Bustamante ahead early in the campaign. What SurveyUSA knows is that
Cruz Bustamante never led in California. Californians may have been reluctant at
first to tell other pollsters that they planned to vote for the body builder,
but they had no problem telling KABC’s Marc Brown this every time
SurveyUSA was in the field, which was on 38 of the 59 nights of that campaign.
Publications, such as Hotline, which abide by the Gentleman’s Agreement
not to publish SurveyUSA polls, do a terrible disservice to their subscribers on
occasions such as this.
In 1998, I received a call at my house from a
well known Washington DC polling firm. The interviewer eventually zeroed-in on
questions about Bill Parcells, then the coach of the New York Jets, and a
Cadillac spokesman. I listened carefully. Why would the interviewer want to know
if I thought Bill Parcells was honest? Then I connected the dots. This was not a
poll about Bill Parcells, this was a poll about Bill Pascrell, who is my
Congressman, and who was running for re-election in New Jersey’s 8th District. The
interviewer was reading the name wrong. I said to the interviewer, "Ma’am,
excuse me. Stop. You are mispronouncing the gentleman’s last name. It is
Pas-crell. Not Par-cells." "No," she said. "It says right here, ‘Bill
Parcells’." How many times a day do you think something like that happens with
headset operators? How many different ways can you think of for an $8/hour
employee doing monotonous work to make a mistake? Does it matter how many PhDs
worked to draw the sample for that survey? Does it matter how many PhDs pored
over the data to write the analysis that the candidate ultimately was handed? It
doesn’t. The data was worthless. And this – importantly – was one of the best
outfits, an outfit that actually runs its own call center. Imagine how much
worse it gets at firms that just outsource their calls to a 3rd party, and who
have no direct control over who asks the questions.
Now, about the
word "automated." Almost all polling firms use purchased auto-dialers. The
dialer automatically dials the phone, detects a connection and, once the dialer
believes a human is on the line, automatically passes the call to an
interviewer. In some cases, that interviewer is well-trained and articulate,
sensitive without being intrusive, and in all things neutral. Perfect. But in
other cases, that interviewer is an unpaid, untrained college student hoping to
get a credit, or the interviewer is a convicted criminal, calling from a call
center located within a Canadian prison. The people who staff call centers know
the dirty little secrets, and they know the kind of people they can attract to
do this work. They can tell you about interviewers who come to work drunk,
stoned, or hacking phlegm. They can tell you about interviewers who flirt with
the respondents, deliberately, to coax answers, interviewers who coach
respondents, leading them to the "right" answers, and interviewers who don’t ask
the questions at all, but who just make up the answers to save time. Not every
headset operator is horrible, to be sure, and the majority are well-meaning, but
every call center has horror stories.
In SurveyUSA’s case, when our
proprietary dialer detects a human, the respondent immediately hears the voice
of a TV news anchor. News anchors are not paid $8/hour. In some cases they are
paid $800 an hour. No one is more acutely aware of the limitations of SurveyUSA
methodology than I. But the choice is not between SurveyUSA and perfection. The
choice is between a news anchor, who has been on the air 30 years in some cases,
and a headset operator, who, if he/she lasts a year in the job, is exceptional.
I’ll take the news anchor. Were Winston Churchill alive, he might say:
"Many forms of data collection have been tried, and will be tried in this world
of sin and woe. No one pretends that using TV news anchors to ask the questions
is perfect or all-wise. Indeed, it has been said that using TV news anchors is
the worst form of data collection … except all those others that have been
tried from time to time."
Because of the nature of your polling
system, the types of questions that can be asked are limited. Without giving the
respondent an opportunity to choose "other" and then specify what that is on
more than one question, doesn’t this prevent the client from being privy to the
wants/needs of the sample?
"Other" can be included in any question
we ask. Structured probing can be done to whatever level is appropriate.
Unstructured, open-ended, iterative probing cannot be done, but if you want
unstructured, open-ended iterative probing, you need a focus group. In 1992,
SurveyUSA identified an opportunity to serve TV newsrooms that were not being
served by Gallup and Harris. We built a better mousetrap; the world beat a path
to our door. Just as the TVA brought water to small-town America, and the REA
brought electricity to small-town America, SurveyUSA brought true,
random-sample, extrapolatable opinion research to Wichita, Roanoke and Spokane.
Our clients are delighted with the work we do. Some have been customers for 12
years now. A number are under contract through 2008.
How do you
compare to Rasmussen? Do you feel you are more/less accurate when it comes to
competitive races?
SurveyUSA has competed with Scott William
Rasmussen on 68 occasions. We have outperformed Rasmussen using any of 8
academic measures. Our mean error and standard deviation on those 68 contests,
and Rasmussen’s, are posted to SurveyUSA’s website.
What is your
response to critics who state that while automated polls are fast, rendering
them headline worthy for TV stations, they are not accurate enough to use within
a campaign to determine strategy based on the reaction of the electorate to
issues or events? In addition to The Hotline, a number of other news
organizations have a policy of not running automated dialing polls, stating that
it would be a disservice to readers to portray the results as accurate — Roll
Call and the AP to name a couple. How would you convince us of otherwise?
Campaign managers scour SurveyUSA’s data, then make media-buy
decisions and change strategy. I know because campaign managers call me. They
tell me how eerily similar SurveyUSA’s data is to their own internal polling. By
any objective criteria or honest measure, SurveyUSA years ago earned the right
to be included in Hotline’s "Poll Track." Yet we’re still blacklisted.
Evil triumphs when good men do nothing. Here’s a chance to do something.

© 2004 by National Journal Group Inc., 600 New Hampshire Avenue, NW, Washington
DC 20037. Any reproduction or retransmission, in whole or in part, is a
violation of federal law and is strictly prohibited without the consent of
National Journal. All rights reserved.
Good post. While I never trust samples of one, especially when that one is me, I have been polled both by headset people and computer. I am much more comfortable giving my answers to a computer.
Fran, I’ve never been polled – but then again, I’m only 28, and from 18 to 22 I imbibed a tad too much and didn’t answer the phone (hey, that’s what an answering machine is for, right?).
Dang it Mark, you made me spend 15 minutes of homework time absorbing every word of this post. Thanks for linking it for the poor slobs who can’t feed their kids (me of course, although my kids eat, I just go into debt).
One question though about the new questions that they develop. One of my professors has a small local polling company. He said that they always pre-test their survey instruments with focus groups before ever conducting the actual survey. Is that too cautious? Mark, does bpbresearch do that?
Excellent post. Thanks for doing it. After 2004, I have more confidence in these guys than anyone else except Mason-Dixon.
Polls add clarity, confusion to Mayor’s race
With less than two weeks remaining until voters in Los Angeles go to the polls, the campaigns and media are seeking clarity to write their story lines. The internal and external polls appear sometimes vastly different, but the two latest…
Mano a mano
Now that the race to finish second in the mayoral primary seems to be a fight to the death between Hahn and Hertzberg, the latter’s manuevers to inoculate himself against negative hits are interesting to watch. You’ll remember, Hertzberg accused Hahn o…
Jay Leve is at best a snake oil salesman, and in the least a quack.