A few days ago John Kesich, a commenter on this site, complained about the way the term "exit polls" has been used to describe both projections and raw data. He had a point. Confusion over the term "exit poll" runs far deeper, as I have seen the modifiers "early" and "final" applied inconsistently to a wide variety of exit poll tabulations. Much of the confusion stems from the reluctance of the National Election Pool (NEP) to discuss the various tabulations they generated on Election Day. Since they will not comment, confusion about the terminology is inevitable.
I have a backlog of questions to cover about exit polls, but I want to start by reviewing NEP’s various tabulations and projections and suggesting some terms to identify them clearly.
First a review of the process: On November 2, 2004, the NEP conducted separate exit polls in all 50 states and the District of Columbia plus a separate, stand-alone "national" sampling. NEP instructed its interviewers to call in to report their results three times on Election Day (all times are local): At about noon, at about 3:00 p.m. and roughly an hour before the polls closed. NEP started releasing tabulations of the vote preference question for the national survey and for most of the battleground states on an hourly basis beginning at 1:00 p.m. Eastern Time. Here is what I know about the various tabulations and projections:
1) Early-afternoon unweighted tabulations – NEP released tabulations for many states between 1:00 and 3:00 p.m. Eastern Time on Election Day based on partial data that were completely raw and unweighted.
2) Late afternoon tabulations (weighted by turnout) – At about 3:00 p.m. (local time) interviewers obtain a hard count of actual turnout from election officials for their covered precincts. NEP officials use these data to weight their late afternoon exit poll tabulations to match the actual turnout in the sampled precincts. Presumably, NEP started to deliver weighted data for eastern states at about 4:00 p.m. EST, but weighted data for western states may not have been available until 6:00 or 7:00 p.m. EST.
3) Just-before-poll-closing tabulations (weighted by turnout) – Interviewers call in roughly an hour before polls close with their final tabulations and another hard count of actual turnout. NEP uses these final reports to prepare exit poll tabulations for each state a few minutes before the polls close. The data are weighted by actual turnout. The weighting procedure ensures that the mix of precincts within each state (urban vs. rural, Democratic vs. Republican, etc.) matches the actual turnout patterns recorded that day. The network "decision desks" use these tabulations to "call" winners, but only in states where the leader’s margin far exceeds statistical significance (the tests of statistical significance assume "confidence levels" of at least 99%, not 95%).
NEP also generates cross-tabular tables (weighted by turnout) for each state just before the polls close. These are tables similar in format to those now available on CNN.com (though the results are now different) that show how respondents answered each question and the vote preference calculated across answers to each question. The cross-tab tables play no role in projecting winners. Rather, networks and newspapers use them to prepare "analytical" stories about the election.
4) Projections after the polls close – Once the polls close, NEP gathers actual results for the precincts sampled in the exit polls and also for another larger sample of precincts (typically referred to as a sample of "key precincts"). Since not all precinct data is available at once, NEP gradually combines the exit poll results and the actual vote counts into an evolving hybrid of projections and estimates that gradually improves over the course of the evening. Although the projection models and tabulations are reportedly quite elaborate, NEP and its forerunners have disclosed very little about them.
5) "Corrected" Exit Poll Tabulations – Once the actual results have been counted in the wee hours of election night, NEP re-weights the results of each exit poll so that the vote preference on the poll matches the actual count. They then release new cross-tabular tables for each state to the general public. In theory, weighting to match the vote preference to actual results makes the complete exit poll more accurate.
6) "Final" Tabulations? – The tabulations put out the day after the election may not be truly "final." I have heard rumors of additional revisions that either have occurred or are about to be released. For example, I heard about a week ago that NEP was about to "revise" its estimates for Hispanic voters. The national survey posted on CNN.com estimates that President Bush received 44% of the Hispanic vote, yet a story in Sunday’s Washington Post puts his Hispanic support at 42%. Does this difference reflect the rumored revision? I have no idea, but will report if I learn more.
7) Raw data deposited with the Roper Center – Consistent with past practices, NEP has promised to make a copy of its raw data available to scholars at the Roper Center archives, only on a more accelerated timetable than usual (about three months).
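To make the difference between the raw tabulations (#1) and the turnout-weighted tabulations (#2 and #3) concrete, here is a minimal sketch in Python. It is not NEP's procedure, which has not been disclosed in detail; it assumes only that each precinct's interviews are scaled to that precinct's reported turnout. The precinct names and counts are invented for illustration, and real weighting would also have to account for how precincts were selected.

```python
from dataclasses import dataclass


@dataclass
class Precinct:
    name: str      # hypothetical label, for illustration only
    bush: int      # completed interviews reporting a Bush vote
    kerry: int     # completed interviews reporting a Kerry vote
    turnout: int   # hard count of voters obtained from election officials


def unweighted_share(precincts):
    """Bush share of all interviews, ignoring turnout (like tabulation #1)."""
    bush = sum(p.bush for p in precincts)
    total = sum(p.bush + p.kerry for p in precincts)
    return bush / total


def turnout_weighted_share(precincts):
    """Bush share after scaling each precinct to its actual turnout."""
    weighted_bush = 0.0
    total_turnout = 0
    for p in precincts:
        interviews = p.bush + p.kerry
        # Each precinct contributes its interview share times its turnout.
        weighted_bush += p.turnout * p.bush / interviews
        total_turnout += p.turnout
    return weighted_bush / total_turnout


# Invented numbers: a small pro-Kerry precinct and a large pro-Bush one.
sample = [
    Precinct("A", bush=10, kerry=30, turnout=400),
    Precinct("B", bush=45, kerry=15, turnout=1200),
]

print(f"unweighted Bush share: {unweighted_share(sample):.3f}")
print(f"turnout-weighted Bush share: {turnout_weighted_share(sample):.3f}")
```

In this invented example the raw tabulation understates Bush because the larger precinct supplied proportionally fewer interviews. The same mechanism could just as easily push the raw numbers in the other direction.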
Those who are concerned about the discrepancy between the exit polls and the actual vote count should focus only on #3, the tabulations prepared just before poll closing for each state. Unfortunately, these were not officially released. The early releases (#1 & #2) were widely leaked and posted on the Internet, but had bigger discrepancies resulting from incomplete samples or the use of completely unweighted data. The later releases now available through official channels (#5) are not helpful for analysis of the discrepancy since they were "corrected" to conform to actual vote results. The only source of "just-before-poll-closing" results appears to be the data in the paper by Steven Freeman (more on this in the next post).
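Item #3 says the decision desks call a state only when the leader's margin clears a test at the 99% confidence level. The sketch below shows what such a test might look like under textbook assumptions. It is not NEP's actual rule, which has not been published. The function names and the `design_effect` parameter are my own; a value of 1.0 assumes a simple random sample, while exit polls are cluster samples of precincts, so the true sampling error is larger.

```python
from math import sqrt
from statistics import NormalDist


def margin_z(p_lead, p_trail, n, design_effect=1.0):
    """z-score for the lead between two candidates in one sample.

    Uses the multinomial variance of a difference of two proportions.
    design_effect is a hypothetical inflation factor for cluster sampling.
    """
    margin = p_lead - p_trail
    variance = (p_lead + p_trail - margin ** 2) / n
    return margin / sqrt(variance * design_effect)


def clears_threshold(p_lead, p_trail, n, confidence=0.99, design_effect=1.0):
    """True if the lead exceeds the two-sided critical value."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return margin_z(p_lead, p_trail, n, design_effect) > critical


# Invented example: a 55-45 lead among 1,000 interviews.
print(clears_threshold(0.55, 0.45, 1000))                     # simple random sample
print(clears_threshold(0.55, 0.45, 1000, design_effect=2.0))  # assumed clustering penalty
```

With these invented inputs the lead passes the 99% test only if clustering is ignored. That is one reason the raw size of a leaked margin says little on its own about whether a state was "callable."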
Let me anticipate a few pertinent questions:
Why so many different tabulations? As discussed here before, the exit polls serve at least three functions: (a) They help give producers and reporters a head start in preparing their election night broadcasts and stories, (b) they assist the networks in "calling" winners and (c) they provide a resource to help reporters and the general public interpret the results of the election. The mid-day tabulations and before-poll-closing cross tabs help provide a "head start" that gradually improves during the day. The before-poll-closing tabulations (#3) and the later estimates that incorporate actual votes (#4) facilitate official "projections." The various "corrected" releases (#5 to #7 above) – the only ones meant for wide release – serve the third function, providing an analytical tool for reporters, scholars and the general public.
Are "uncorrected" mid-day and before-poll-closing tabulations available from prior years? Generally, no. Again, these tabulations were never intended for public release. Leaked mid-day numbers are available only to the extent that the "leakees" saved them. Smatterings of before-poll-closing tabulations have appeared in journal articles but are otherwise unavailable through official sources. Moreover, I assume that researchers cannot easily replicate the "before-poll-closing" tabulations using the raw data available through the Roper Center – if they could, analysts like Ruy Teixeira would have run them rather than relying on raw, unweighted tabulations from past years.
Why will it take at least three months to release the raw data? The process of opening the raw data to scholars is slow because (presumably) the archivists need to format the data and prepare documentation so researchers can use it appropriately. Remember, we are talking about raw data from 150,000 interviews, 50 separate state exit polls, plus D.C. and the separate national survey, each with a slightly different questionnaire. The task of preparing it all is huge.
However, as John Kesich’s complaint implied, nothing prevents the immediate release of the various Election Day exit poll tabulations except NEP’s reluctance to do so. These results were disseminated to NEP members and subscribers on November 2 in electronic documents that were presumably saved somewhere. I am not sure what purpose continuing secrecy serves except to provide greater fuel to those spinning conspiracy theories.
Mark wrote:
“First a review of the process: On November 2, 2004, the NEP conducted separate exit polls in all 50 states and the District of Columbia plus a separate, stand-alone “national” sampling.”
“Why will it take at least three months to release the raw data? … Remember, we are talking about raw data from 150,000 interviews, 50 separate state exit polls, plus D.C. and the separate national survey, each with a slightly different questionnaire…”
I think you may have forgotten the “key precincts” survey in the list of surveys that are conducted and would need to be sorted out within the raw data. You mention “key precincts” in #4 but not elsewhere, so it is not clear if they are a separate group of precincts or are culled from within the national and state exit poll precincts.
Also, do key precincts play a role in projections for #3? You imply they do not, but there might be some confusion on this point.
Does the statement below mean pollsters do not interview voters during the last hour of voting? It would seem that this could affect early voter vs late voter issue a little.
“NEP instructed its interviewers to call in to report their results three times on Election Day (all times are local): At about noon, at about 3:00 p.m. and roughly an hour before the polls closed.”
Alex asked two questions…
1) “I think you may have forgotten the “key precincts” survey in the list of surveys that are conducted and would need to be sorted out within the raw data. You mention “key precincts” in #4 but not elsewhere, so it is not clear if they are a separate group of precincts or are culled from within the national and state exit poll precincts.”
Sorry for the confusion; this is, unfortunately, a complicated subject. The sample of key precincts is separate and distinct from the sampled precincts used for the exit polls. The reason for the extra sample is that it’s cheaper and easier to collect actual results on election night than to poll voters.
I do not have access to the raw data from past years, but I do not believe the “key precincts” data are included. Anyone with access to the Roper Center archives could check and report back.
2) “Also, do key precincts play a role in projections for #3? You imply they do not, but there might be some confusion on this point.”
Nope, none at all. Good question. They gather ONLY actual vote results for the key precincts, and these are not available until after the polls close.
Ho-Yon noticed that my description implied that exit pollsters stop interviewing voters during the last hour of voting: “It would seem that this could affect early voter vs late voter issue a little.”
Ho-Yon read me correctly. If voters in that last hour were more likely to favor one candidate or the other, it certainly could introduce some error. However, the final hour was not the only period left uncovered.
I heard via email from 3-4 interviewers in one “battleground” state who also sent along some of their training materials. In that state, at least, interviewers were told to stop their work after their final call, roughly an hour before the polls closed. So voters who exited the polls during the last hour would have been missed.
Interviewers also would have missed voters who exited while the interviewer was tabulating and calling in results at (roughly) noon and 3:00 p.m., and during any bathroom breaks.
In other posts you explain non-response weighting by gender, race, and age. Could you include how that weighting fits in with this 7 step process?
Thank you for clarifying that no non-random “key precincts” were exit polled. My concern was that non-random “key precincts” were exit polled, thus polluting the data from randomly selected national and state precincts.
BTW, I blame Ruy Teixeira as the source of my confusion on key precincts:
http://www.tcf.org/publications/pow/nov17_2004.pdf
Ruy wrote on page 4:
“1. Samples are weighted to correct for oversampling of precincts (for example, exit polls have historically selected minority precincts in some states at higher rates than other precincts) and for nonresponse bias (exit poll interviewers try to keep track of refusers by sex, race, and age).”
If precincts are selected randomly, I don’t think Ruy’s point holds.
I’d say that not talking to any voters starting an hour before the polls close SEVERELY degrades the value of exit polls WRT finding vote fraud. I can understand why the NEP people don’t care, but I think the “vote fraud!” screamers need to keep it in mind.
Mark:
If the NEP folks stopped polling an hour before the polls closed, then what proof do we have, if any, that the early exit polls missed late-voting Republicans?
An article came out tonight that the Texas Hispanic vote estimate has been revised from 59/41 Bush to 50/49 Kerry, which could explain the drop from 44 to 42 nationally for Bush.
http://story.news.yahoo.com/news?tmpl=story&cid=694&e=2&u=/ap/20041129/ap_on_el_pr/eln_texas_glance_corrective
Mark,
I’m very intrigued by how the final weighting is done to bring the exit poll results in line with the actual numbers. Is it nothing more than weighting by turnout in the sampled precincts compared to the “rest-of-the-world”, and then hoping that the results then line up? If that’s the case, then I’ll be much less skeptical about exit polls. But I suspect that the weighting schemes are more complex, and that the actual results must be factored into the weighting, too. If that’s the case I think the entire validity of the exit poll data is somewhat suspect. Here’s a toy problem example to illustrate the potential problem. Suppose the exit poll shows a 50/50 result, with a 50/50 M/F split for turnout, but a 60M/40F Bush gender split and a 40M/60F Kerry split. Now suppose the actual election result was 60/40 Bush over Kerry. If you weight the exit poll to recover the final result, you cannot at the same time preserve all of the “internals.” If you preserve the M/F ratio (which I would think you ought to since there’s no chance for mis-registration error and also because it has better statistics than the other splits), you can’t also preserve both the B/K splits for each gender and the M/F splits for both Kerry and Bush. So your final weighting seems to account for systematic errors by skewing the internals (in this case, the gender-gap). So what good are such results to people doing post-election analysis?
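The toy problem in this comment can be worked through numerically. The sketch below assumes the simplest possible "correction": every respondent's weight is scaled by a single factor per candidate so that the weighted vote matches the actual count. Whether NEP's re-weighting works this way has not been disclosed, so treat the function names and the method itself as illustrative assumptions.

```python
# Exit poll cells from the toy problem, as shares of all respondents:
# a 50/50 vote, with Bush voters 60M/40F and Kerry voters 40M/60F.
cells = {
    ("Bush", "M"): 0.30,
    ("Bush", "F"): 0.20,
    ("Kerry", "M"): 0.20,
    ("Kerry", "F"): 0.30,
}
actual = {"Bush": 0.60, "Kerry": 0.40}  # the hypothetical actual result


def reweight_to_vote(cells, actual):
    """Scale each candidate's respondents so the poll matches the count."""
    poll_share = {}
    for (cand, _sex), share in cells.items():
        poll_share[cand] = poll_share.get(cand, 0.0) + share
    return {
        (cand, sex): share * actual[cand] / poll_share[cand]
        for (cand, sex), share in cells.items()
    }


def share_by_sex(cells):
    """Share of all weighted respondents who are men or women."""
    totals = {}
    for (_cand, sex), share in cells.items():
        totals[sex] = totals.get(sex, 0.0) + share
    return totals


def bush_share_within(cells, sex):
    """Bush's share of the vote among respondents of one sex."""
    return cells[("Bush", sex)] / share_by_sex(cells)[sex]


corrected = reweight_to_vote(cells, actual)
for label, table in (("before", cells), ("after", corrected)):
    print(
        label,
        f"men={share_by_sex(table)['M']:.3f}",
        f"Bush among men={bush_share_within(table, 'M'):.3f}",
        f"Bush among women={bush_share_within(table, 'F'):.3f}",
    )
```

Under this assumed method the gender mix within each candidate's voters is preserved (Bush voters stay 60M/40F), but the electorate shifts from 50/50 to 52% men, and Bush's share moves from 60% to about 69% among men and from 40% to 50% among women. Holding the M/F split fixed at 50/50 as well would require adjusting to two sets of targets at once, for example by raking (iterative proportional fitting), and that would change the within-candidate gender splits instead. Either way, some internal has to move, which is the commenter's point.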