NEP Data Available Online


Unfortunately, my blogging time is short today, but I want to quickly pass on one bit of news (thanks to Rick Brady of Stones Cry Out for the tip):  The so-called "raw" data from the National Election Pool exit polls are now available on-line through the Inter-University Consortium for Political and Social Research (ICPSR), based at the University of Michigan.  (The same data are also due to be released by the Roper Center Archives, based at the University of Connecticut, within the next few weeks.)

The files have been made available through ICPSR’s "Fast Track" service, which they describe as follows:

Studies on FastTrack are public use files that have not yet been fully processed by our staff. This system provides quick access to the study while the files undergo full processing. An announcement will be made as soon as the fully processed files are available.

The Fast Track link will lead to this FTP file directory, which includes documentation from Edison-Mitofsky and sub-directories containing cross-tabulations and data files for the surveys in all 50 states, the District of Columbia and the national survey.  Data files are in both ASCII and SPSS formats and include documentation to help identify and use included variables.

I have not had time to do more than skim some of the documentation, but at first glance, the files appear to be consistent with the previous releases of respondent level exit poll data.  The files include only one "weight" variable: the one that includes a "correction" to match results with the actual count.  I also see no precinct level data nor any other means of replicating the "within precinct error (WPE)" analysis from the Edison-Mitofsky report.  If that turns out to be true, those who have been demanding the release of "raw data" are going to be disappointed, to put it mildly.

Of course, I may have simply overlooked something obvious, or perhaps the "Fast Track" release is incomplete.  So I would urge those who are interested and have more time on their hands today to post comments below on what is and is not included. 

UPDATE:  Rick Brady posts in the comments section an email response from Edison-Mitofsky: 

"The Roper Center told us it would be about two weeks before everything is posted. They received the data over a week ago now, so it shouldn’t be too long. But they haven’t received anything different from what Michigan received, or from what they’ve received in the past from VNS."

UPDATE II (2/8):  The NEP data are also now available from the Roper Center Archives.  The Roper Center has prepared an exit poll CD available for free to its members and for $79 to the general public.  In addition to the data available online from ICPSR, the Roper Center CD also includes comparable exit poll data from 2000 and crosstabs of the national exit polls from 1976 to 2000.

UPDATE III (2/8): I have been able to clarify what is and what is not included in the "raw data." Those who dive into the data files will find a field for "precinct," as Basil Valentine noted in the comments section.  Although the data for each sampled precinct are designated by a code number, the precinct numbers in the data file do not correspond to the actual precinct number in any state.  The data do not disclose the actual precincts sampled.

In response to an email query, an Edison Mitofsky spokesperson referred me to the following passage from the Code of Professional Ethics and Practices of the American Association for Public Opinion Research (AAPOR):

"Unless the respondent waives confidentiality for specified uses, we shall hold as privileged and confidential all information that might identify a respondent with his or her responses."

They feel that if they identify the polling locations it might be possible for a computer match to identify a small portion of actual individuals in the data.  Some precincts are small enough that it would be possible to identify actual voters from their demographic data.  They also feel that any effort to provide a precinct level estimate of actual vote or "within precinct error" would allow a user to identify the actual precinct and, theoretically at least, identify actual voters.
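
To make the worry concrete, here is a minimal Python sketch using entirely made-up respondents. The field names (`precinct`, `sex`, `age`, `race`) and the helper `unique_profiles` are my own illustrative assumptions, not the actual NEP variable names. It simply flags any respondent whose combination of demographic answers appears only once within a precinct, which is the kind of record a computer match could, in theory, tie to a real voter if the precinct were known.

```python
from collections import Counter

def unique_profiles(records, precinct_field, demo_fields):
    """Return records whose demographic profile occurs exactly once in their precinct."""
    def profile(r):
        return (r[precinct_field], tuple(r[f] for f in demo_fields))
    counts = Counter(profile(r) for r in records)
    return [r for r in records if counts[profile(r)] == 1]

# Hypothetical respondents; none of this comes from the real data files.
toy = [
    {"precinct": 17, "sex": "F", "age": "18-29", "race": "white"},
    {"precinct": 17, "sex": "F", "age": "18-29", "race": "white"},
    {"precinct": 17, "sex": "M", "age": "60+", "race": "asian"},
    {"precinct": 42, "sex": "M", "age": "60+", "race": "asian"},
    {"precinct": 42, "sex": "M", "age": "60+", "race": "asian"},
]

at_risk = unique_profiles(toy, "precinct", ["sex", "age", "race"])
print(len(at_risk))  # 1: the lone 60+ man in precinct 17
```

The smaller the precinct and the more demographic questions on the form, the more respondents end up alone in their cell.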

I will leave it to the reader to evaluate this rationale except to say this:  The protection of respondent confidentiality is not some minor technicality.  It is arguably one of survey research’s most important ethical bedrocks.  No pollster or survey researcher should ever consider it a trifling matter.

Something else to consider:  The U.S. Census has struggled with the issue of how to make "micro-data" available to the general public while still protecting respondent confidentiality as required (in the case of the Census) by federal law.  A Census report on the history of confidentiality and privacy issues notes that the potential for disclosure of the identity of individual responses in publicly released data may result in any of the following measures (quoting verbatim from p. 22):

  • Removal or reduction in detail of any variable considered likely to identify an especially small and visible population such as persons with high incomes.
  • Introduction of "noise" (small amounts of variation) into selected data items.
  • Use of data swapping (i.e., locating pairs of matching households in the database and swapping those households across geographic areas to add uncertainty for households with unique characteristics).
  • Replacement of a reported value by an average in which the average associated with a particular group may be assigned to all members of a group, or to the "middle" member (as in a moving average).
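
For readers who want to see what the last three measures do to a data file, here is a rough Python sketch on made-up household records. The function names, field names and numbers are all hypothetical, and this is only my simplified reading of the techniques described in the list, not the Census Bureau's actual procedures.

```python
import random
from statistics import mean

def add_noise(values, scale, rng):
    """Perturb each value by a small uniform amount in [-scale, +scale]."""
    return [v + rng.uniform(-scale, scale) for v in values]

def swap_geography(records, key_fields, geo_field, rng):
    """Pair up records that match on key_fields and exchange their geography codes."""
    out = [dict(r) for r in records]  # copy so the input is left untouched
    groups = {}
    for i, r in enumerate(out):
        groups.setdefault(tuple(r[k] for k in key_fields), []).append(i)
    for idxs in groups.values():
        rng.shuffle(idxs)
        # Take matched records two at a time; an odd one out is left alone.
        for a, b in zip(idxs[0::2], idxs[1::2]):
            out[a][geo_field], out[b][geo_field] = out[b][geo_field], out[a][geo_field]
    return out

def replace_with_group_mean(records, group_field, value_field):
    """Assign every member of a group the group's average value."""
    by_group = {}
    for r in records:
        by_group.setdefault(r[group_field], []).append(r[value_field])
    means = {g: mean(vs) for g, vs in by_group.items()}
    return [{**r, value_field: means[r[group_field]]} for r in records]

# Hypothetical households (income in thousands); not real data.
rng = random.Random(0)
households = [
    {"area": "A", "size": 2, "tenure": "own", "income": 40},
    {"area": "B", "size": 2, "tenure": "own", "income": 60},
    {"area": "A", "size": 5, "tenure": "rent", "income": 30},
    {"area": "C", "size": 5, "tenure": "rent", "income": 50},
]

noisy = add_noise([h["income"] for h in households], 2.0, rng)
swapped = swap_geography(households, ["size", "tenure"], "area", rng)
averaged = replace_with_group_mean(households, "tenure", "income")
```

In each case the totals survive reasonably well, but any individual record no longer says exactly what the respondent said or where they live.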

Yes, there are ways to release more data and still protect respondent confidentiality, but it is hard to imagine that anyone would find the deliberate "introduction of noise" or "data swapping" to be an acceptable strategy for the release of exit poll data.

Like it or not, the released data are all we are likely to see. 

Mark Blumenthal

Mark Blumenthal is a political pollster with deep and varied experience across survey research, campaigns, and media. The original "Mystery Pollster" and co-creator of Pollster.com, he explains complex concepts to a multitude of audiences and shows how data informs politics and decision-making. A researcher and consultant who crafts effective questions and identifies innovative solutions to deliver results. An award-winning political journalist who brings insights and crafts compelling narratives from chaotic data.