Unfortunately, my blogging time is short today, but I want to quickly pass on one bit of news (thanks to Rick Brady of Stones Cry Out for the tip): The so-called "raw" data from the National Election Pool exit polls are now available online through the Inter-University Consortium for Political and Social Research (ICPSR), based at the University of Michigan. (The same data are also due to be released by the Roper Center Archives, based at the University of Connecticut, within the next few weeks.)
The files have been made available through ICPSR’s "Fast Track" service, which they describe as follows:
Studies on FastTrack are public use files that have not yet been fully processed by our staff. This system provides quick access to the study while the files undergo full processing. An announcement will be made as soon as the fully processed files are available.
The FastTrack link leads to an FTP file directory that includes documentation from Edison-Mitofsky and subdirectories containing cross-tabulations and data files for the surveys in all 50 states, the District of Columbia and the national survey. The data files are in both ASCII and SPSS formats and come with documentation to help identify and use the included variables.
I have not had time to do more than skim some of the documentation, but at first glance, the files appear to be consistent with previous releases of respondent-level exit poll data. The files include only one "weight" variable: the one that includes a "correction" to match results with the actual count. I also see no precinct-level data or any other means of replicating the "within precinct error" (WPE) analysis from the Edison-Mitofsky report. If that turns out to be true, those who have been demanding the release of "raw data" are going to be disappointed, to put it mildly.
Of course, I may have simply overlooked something obvious, or perhaps the "Fast Track" release is incomplete. So I would urge those who are interested and have more time on their hands today to post comments below on what is and is not included.
UPDATE: Rick Brady posts in the comments section an email response from Edison-Mitofsky:
"The Roper Center told us it would be about two weeks before everything is posted. They received the data over a week ago now, so it shouldn’t be too long. But they haven’t received anything different from what Michigan received, or from what they’ve received in the past from VNS."
UPDATE II (2/8): The NEP data is also now available from the Roper Center Archives. The Roper Center has prepared an exit poll CD available for free to its members and for $79 to the general public. In addition to the data available online from ICPSR, the Roper Center CD also includes comparable exit poll data from 2000 and crosstabs of the national exit polls from 1976 to 2000.
UPDATE III (2/8): I have been able to clarify what is and what is not included in the "raw data." Those who dive into the data files will find a field for "precinct," as Basil Valentine noted in the comments section. Although the data for each sampled precinct are designated by a code number, those code numbers do not correspond to the actual precinct numbers in any state. The data do not disclose the actual precincts sampled.
In response to an email query, an Edison-Mitofsky spokesperson referred me to the following passage from the Code of Professional Ethics and Practices of the American Association for Public Opinion Research (AAPOR):
"Unless the respondent waives confidentiality for specified uses, we shall hold as privileged and confidential all information that might identify a respondent with his or her responses."
They feel that if they identify the polling locations, it might be possible to use computer matching to identify a small portion of the actual individuals in the data. Some precincts are small enough that actual voters could be identified from their demographic data. They also feel that any effort to provide a precinct-level estimate of the actual vote or of "within precinct error" would allow a user to identify the actual precinct and, theoretically at least, identify actual voters.
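To make that concern concrete: once the precinct is known, any respondent whose combination of demographic answers is shared by no one else in that precinct is effectively singled out. The sketch below is my own illustration of that check (a simple k-anonymity style count), not a procedure Edison-Mitofsky has described; the field names and records are hypothetical and are not drawn from the NEP files.

```python
from collections import Counter

def unique_respondents(records, keys):
    """Return the records whose combination of `keys` values occurs only once
    within their precinct, i.e. groups of size k == 1."""
    def signature(r):
        # Hypothetical fields: the precinct plus the chosen demographic answers.
        return (r["precinct"],) + tuple(r[k] for k in keys)

    counts = Counter(signature(r) for r in records)
    return [r for r in records if counts[signature(r)] == 1]

# Hypothetical records for illustration only.
sample = [
    {"precinct": 101, "sex": "F", "age": "30-44", "race": "White"},
    {"precinct": 101, "sex": "F", "age": "30-44", "race": "White"},
    {"precinct": 101, "sex": "M", "age": "65+", "race": "Asian"},
    {"precinct": 202, "sex": "M", "age": "18-29", "race": "Black"},
]

risky = unique_respondents(sample, ["sex", "age", "race"])
print(len(risky))  # → 2
```

The smaller the precinct, the more respondents fall into groups of one, which is the situation the spokesperson describes.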
I will leave it to the reader to evaluate this rationale except to say this: The protection of respondent confidentiality is not some minor technicality. It is arguably one of survey research’s most important ethical bedrocks. No pollster or survey researcher should ever consider it a trifling matter.
Something else to consider: The U.S. Census has struggled with the issue of how to make "micro-data" available to the general public while still protecting respondent confidentiality as required (in the case of the Census) by federal law. A Census report on the history of confidentiality and privacy issues notes that the potential for disclosure of the identity of individual responses in publicly released data may result in any of the following measures (quoting verbatim from p. 22):
- Removal or reduction in detail of any variable considered likely to identify an especially small and visible population such as persons with high incomes.
- Introduction of "noise" (small amounts of variation) into selected data items.
- Use of data swapping (i.e., locating pairs of matching households in the database and swapping those households across geographic areas to add uncertainty for households with unique characteristics).
- Replacement of a reported value by an average in which the average associated with a particular group may be assigned to all members of a group, or to the "middle" member (as in a moving average).
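For readers unfamiliar with these techniques, three of the measures listed above can be sketched in a few lines. This is a toy illustration of what each one does to a set of records, assuming simple Python dictionaries of hypothetical household data; it is not the Census Bureau's actual procedure, which is considerably more involved.

```python
import random

def add_noise(values, scale, rng):
    """Introduce "noise": shift each value by a random amount in [-scale, scale]."""
    return [v + rng.uniform(-scale, scale) for v in values]

def swap_matching(households, match_key, rng):
    """Data swapping: pair up households that match on `match_key` and, when a
    pair lives in different areas, exchange their area codes."""
    out = [dict(h) for h in households]  # copy so the input is left untouched
    by_key = {}
    for i, h in enumerate(out):
        by_key.setdefault(h[match_key], []).append(i)
    for idxs in by_key.values():
        rng.shuffle(idxs)  # random pairing within each matching group
        for a, b in zip(idxs[::2], idxs[1::2]):
            if out[a]["area"] != out[b]["area"]:
                out[a]["area"], out[b]["area"] = out[b]["area"], out[a]["area"]
    return out

def group_average(values, group_size):
    """Sort values, cut them into consecutive groups, and assign every member
    its group's mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        members = order[start:start + group_size]
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            out[i] = mean
    return out

rng = random.Random(0)
print(add_noise([100.0, 200.0, 300.0], 5.0, rng))
homes = [{"id": 1, "size": 4, "area": "A"}, {"id": 2, "size": 4, "area": "B"}]
print(swap_matching(homes, "size", rng))
print(group_average([10, 50, 20, 40], 2))  # → [15.0, 45.0, 15.0, 45.0]
```

Each technique deliberately makes individual records less accurate, which is exactly why it sits so poorly with a dataset whose whole purpose is to be checked against the actual count.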
Yes, there are ways to release more data and still protect respondent confidentiality, but it is hard to imagine that anyone would find the deliberate "introduction of noise" or "data swapping" to be an acceptable strategy for the release of exit poll data.
Like it or not, the released data are all we are likely to see.
Thanks Mark. I sent a polite e-mail to Edison/Mitofsky asking if what we see at the UMich site is all that the Roper Center will make available. If it is, I suggested that they should REALLY consider being more open and release the data necessary to independently replicate the WPE analysis in the latest report.
The response from EMR:
“The Roper Center told us it would be about two weeks before everything is posted. They received the data over a week ago now, so it shouldn’t be too long. But they haven’t received anything different from what Michigan received, or from what they’ve received in the past from VNS.”
Looks like you were right, Mark… Too bad…
At first glance, it appears the precinct data is there, at least on the column guide for the national data.
from the file “US04Gcolguide.doc”
-------------------
National Election Pool Exit Poll
Conducted by Edison Media Research/Mitofsky International
National Exit Poll
November 2, 2004

COLUMN LOCATIONS   DESCRIPTION                  CODES
1-7                Respondent ID number
8-9                State ID number              See general documentation for state list.
10-12              Precinct number
13                 QTYPE (Questionnaire type)   1 = State questionnaire
                                                2 = National questionnaire
-------------------
And opening the US04GE.dat file, it would appear precinct is in fact located in columns 10-12.
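For anyone who wants to pull these fields out of the ASCII file, here is a minimal sketch that slices each fixed-width line using the column locations in the guide (1-based and inclusive). It covers only the four fields shown in the excerpt, assumes one respondent per line, and the sample record is made up. Per Mark's later update, the precinct value is an arbitrary code, not a real precinct number.

```python
# Column locations from the US04Gcolguide.doc excerpt (1-based, inclusive).
LAYOUT = {
    "respondent_id": (1, 7),
    "state_id": (8, 9),
    "precinct": (10, 12),
    "qtype": (13, 13),
}

QTYPE_LABELS = {1: "State questionnaire", 2: "National questionnaire"}

def parse_record(line):
    """Slice one fixed-width line into the integer fields listed in LAYOUT."""
    fields = {}
    for name, (start, end) in LAYOUT.items():
        raw = line[start - 1:end].strip()  # convert 1-based columns to a slice
        fields[name] = int(raw) if raw else None
    return fields

def read_file(path):
    """Parse every non-blank line of a data file such as US04GE.dat."""
    # latin-1 decodes any byte, so stray non-ASCII characters won't raise.
    with open(path, encoding="latin-1") as fh:
        return [parse_record(line.rstrip("\n")) for line in fh if line.strip()]

record = parse_record("0000123" + "05" + "042" + "2")  # hypothetical record
print(record)  # → {'respondent_id': 123, 'state_id': 5, 'precinct': 42, 'qtype': 2}
print(QTYPE_LABELS[record["qtype"]])  # → National questionnaire
```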
Has anyone noticed that Kerry is first on all of the “Exit Poll” ballots? Isn’t this likely to get him more than his fair share of the votes in the “Exit Poll”?
Also, having looked at the data, I would like to see additional information, including the number of actual voters in the “Exit-Poll” precinct and more information about the exit pollster (not the name, just some classification regarding age and source of recruitment).
Mark, this is a great update.
Is there any way for the EMR folks to at least publish the correlation, regression, or ANY “other” statistical analysis that demonstrates that the WPE table by voting method is not significant? Of course, it would be really nice to reproduce this analysis ourselves, but at least they could provide the p-values and tests applied… something… anything to substantiate their statement on page 40:
“WPE in precincts with any type of automated voting system is higher than the average error in paper ballot precincts. These errors are not necessarily a function of voting equipment. They appear to be a function of the equipment’s location and the voters’ responses to the exit poll at precincts that use this equipment.”
When they say “function” and “appear,” I have to assume that they have run the appropriate tests to support these statements. The two tables, (1) by equipment and (2) by size of “area,” are not good enough, I’m afraid. I have to assume that there are more sophisticated tests available to analyze these data more fully.
Examining the “Unexplained Exit Poll Discrepancy”
A day after Edison/Mitofsky released their much anticipated report on the 2004 Presidential Election exit polls, the University of Pennsylvania issued a press release announcing that an “expert” on the presidential election exit poll errors has access …
Another thought after re-reading this post today: the Census makes the raw data available after 72 years. Maybe we can get EMR to commit to doing the same… I don’t feel like waiting 72 years, though, so how about 50? 40?? 30???
I think their explanation is garbage, and that is unfortunate because I have a lot of respect for their work.
How could releasing a spreadsheet that identifies the precinct (doesn’t have to give you the actual precinct ID – I’ll trust them), WPE, and all the other independent variables that go along with that precinct (vote tab method, characteristics of interviewer, completion rates, refusal rates, misses, etc.) reveal the identity of any voter?
As far as I know, and maybe I’m wrong, the overall Bush/Kerry WPE doesn’t give anyone any information about the race/age/sex of the voter… And, do the exit poll surveys ask these questions? As far as I know, and again – maybe I’m wrong, the only race/age/gender data collected are from the misses or refusals.
It seems that the focus of the controversy is over the WPE by vote method. Forget that the urban v. rural comparison makes the apparent discrepancy virtually vanish – let’s test it! Bring on the ANOVA. I cannot see how testing of this single hypothesis would require data that could allow someone to identify a specific voter. It doesn’t add up.
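For what it's worth, the test being asked for here is a one-way ANOVA of precinct-level WPE across voting-method groups. Below is a minimal sketch, assuming a hypothetical table of WPE values grouped by equipment type; the numbers are invented placeholders, not figures from the Edison-Mitofsky report. Converting F to a p-value needs the F distribution; SciPy's `scipy.stats.f_oneway` does the whole test in one call if you have it installed.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict mapping group -> values."""
    all_values = [v for vals in groups.values() for v in vals]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2
                     for g, vals in groups.items())
    # Within-group variation: scatter of precincts around their own group mean.
    ss_within = sum((v - means[g]) ** 2
                    for g, vals in groups.items() for v in vals)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical precinct WPE values by voting method (illustration only).
wpe_by_method = {
    "paper": [-1.0, -2.0, -3.0],
    "lever": [-4.0, -5.0, -6.0],
    "touchscreen": [-7.0, -8.0, -9.0],
}
print(one_way_anova(wpe_by_method))  # → (27.0, 2, 6)
```

Since the report says the apparent equipment effect seems tied to location, a fairer test would also account for urban versus rural area (a two-way design), which this one-way sketch does not do.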
I’m beginning to wonder if they have been bought by Halliburton or something…(j/k)