Salon.com’s Farhad Manjoo has an excellent article, "More Fun With Exit Polls," out today on the continuing controversy that has been my focus since the election (though the article is part of Salon’s "premium" content, you can get a free "day pass" by watching a brief advertisement).
While Manjoo’s story breaks no new ground that we have not already covered here, it is a very clearly written, well-balanced summary. Those of you who simply cannot get enough of the exit poll issue will want to check it out.
I may also be burying the lead: Manjoo also quotes a certain blogger prominently. So Mom, you’ll definitely want to click here.
Mark Blumenthal is a political pollster with deep and varied experience across survey research, campaigns, and media. The original "Mystery Pollster" and co-creator of Pollster.com, he explains complex concepts, and how data informs politics and decision-making, to a multitude of audiences. He is a researcher and consultant who crafts effective questions and identifies innovative solutions to deliver results, and an award-winning political journalist who brings insight to chaotic data and crafts compelling narratives from it.
40 thoughts on “More Fun with Exit Polls”
Simon and Baiman continued…
There’s a nice write-up on the exit polls in Salon, which includes an interview of Mystery Pollster, Mark Blumenthal. Watch the ad and you can read the article for free.
G’day Mark – I haven’t really been following the exit poll issue closely, but I read Farhad’s article and he said “At midday Eastern time, NEP interviewers had spoken to about 8,000 voters”, which piqued my interest (it seems that he made a mistake – he appears to be referring to the 3.59pm poll, not midday). In any case, I took a closer look and noticed the discrepancy between the 11,027 and 13,660 numbers which you and others have also noted. However, I wonder if you were a bit dismissive when you said “The missing 2,633 interviews, presumably coming mostly from states in the Midwest and West”.
The NEP issued a statement on Nov2 which appears to be their final end-of-day summary of their activities – it is still on http://www.exit-poll.net under the prominent link titled “National Exit Poll Methodology Statement”, which links to http://www.exit-poll.net/election-night/MethodsStatementNationalFinal.pdf (note the word “final”).
This document specifically says “The National exit poll was conducted at a sample of 250 polling places among 11,719 Election Day voters representative of the United States. In addition, 500 absentee and/or early voters in 13 states were interviewed in a pre-election telephone poll.” That is, the NEP’s apparently “final” statement explicitly states that they only did 12,219 interviews.
As we know, by 1pm the following day, the NEP was reporting that they had actually conducted 13,660 interviews – 12% more than they had officially stated – which is starting to test the limits of credulity, given that these people are supposed to count things for a living.
One hypothesis is that there was much interview-stuffing on Nov3 to produce the requisite swing to Mr Bush. To the extent that the NEP’s ‘final’ statement is valid, then it would seem to render irrelevant all of the other discussions about chatty democrats and gender bias and the rest.
Separately, the consensus seems to be that the leaked numbers on Scoop are the legitimate NEP numbers – Mitofsky didn’t claim otherwise in that delightful exchange with Mickey Kaus. However, some of the numbers seem curious – more here if you are interested http://wotisitgood4.blogspot.com/2005/01/exit-stage-left.html
Luke, if I’m not mistaken, that NEP methods statement was circulated either with the first round of data, or prior to the election day. These were their targets based on historical turnout at sampled precincts; election turnout was higher than expected and therefore they had more samples than the statement said.
Also, they actually interviewed way more than 13,000 in that national poll. They probably interviewed about 2X that, but what is reported is only a subsample of the actual interviews. MP has dealt with these issues in previous posts, if I am not mistaken.
Rick,
The lower numbers were targets? Really, that’s a little hard to believe, because it doesn’t make sense to report target numbers to the nearest person..one would expect target numbers to be rounded..but who knows..NEP seems willing to do anything..I don’t trust them one bit.
thanks rick and brian – as i mentioned, i haven’t been following it very closely – so i’m interested in what others have to say. but i do agree with brian – the statement is very specific – to the final person – and their language is in the past tense (and hasn’t been updated)
i don’t know anything about interviewing or exit polls – but i did note that they had 1,500 interviewers – which comes out to about 8 interviews per interviewer, which seems like a low number – perhaps rick is correct in stating that there was 2X, altho that seems kinda odd.
i also posted at ruy’s to see if anyone there had any further insight http://www.emergingdemocraticmajorityweblog.com/cgi/dr/mt-comments.cgi?entry_id=1011
grrrr – i don’t even know how to post properly. neither a pollster nor poster be.
Luke and Brian, first what I know…
Read Mitofsky and Edelman’s chapter in Presidential Polls and the News Media (1995) about the 1992 exits.
“Including the state polls, interviews were conducted with 177,000 voters on election day in 1,310 precincts….There was usually only one interviewer working at a polling place. Interviewers arrived before voting began and worked 50 minutes of each hour. During their 10-minute break, they did a hand tally of the number of respondents who said they voted for each candidate. Interviewers telephoned results of the hand tally and the individual responses to each question to a central site three times during the day. Before reading the individual responses, questionnaires were subsampled. The subsampling was controlled by the operator receiving the call at the central site. The subsample produced 62,024 questionnaires, of which 15,256 were from national questionnaires. The rest were from state exit polls” (pgs. 82-83).
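For readers unfamiliar with the mechanics, here is a minimal sketch of what a systematic 1-in-k subsample like the one Mitofsky and Edelman describe could look like. The rate below is an assumption for illustration (the 1992 figures, 62,024 of roughly 177,000, imply about 1 in 3); the operator-side procedure is not spelled out in the quoted passage.

```python
import random

def subsample_questionnaires(questionnaires, rate=3):
    """Keep roughly 1 in `rate` questionnaires for full question-by-question
    read-in, as the central-site operator might. The hand-tally vote counts
    cover ALL questionnaires; only the detailed responses are subsampled."""
    start = random.randrange(rate)      # random starting point
    return questionnaires[start::rate]  # systematic 1-in-k selection

# Example: 90 questionnaires phoned in; detailed responses are read
# for roughly a third of them.
batch = [f"questionnaire_{i}" for i in range(90)]
detail = subsample_questionnaires(batch)
print(len(batch), len(detail))  # 90 30
```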
I’ve communicated with Dan Merkle of ABC News who participated in the exit poll design. He confirmed to me that the subsample of the precinct interviews was followed again this year.
Now, what I don’t “know” is whether the NEP methods statement was written before, during, or after the election. It was my assumption, based on my understanding of how previous exit poll samples were drawn (from historical turnout estimates), and on the fact that anyone who goes to CNN’s website can see that there were more interviews than were reported in the methods statement.
Isn’t the subsampling a way to save time on the phone on election day by not collecting the full voter profile, but still collecting voting data from the FULL sample?
In other words the subsampling only affects the voter profile data, but the vote tallies are from the full set of interviews conducted. Just want to make sure we all understand that the subsampling does not affect the predictions of Kerry/Bush %’s. Unless of course I’ve misunderstood Mark’s post on this matter.
thanks rick – although i’m less concerned about their methodology per se, and more interested in the fact that the NEP said that they conducted 12,219 interviews (however that is defined) and then the following day apparently reported that they had conducted 12% more than that.
i don’t know when the methods statement was written either – but as Brian said, the level of specificity – to the final person – strongly suggests that it was written after the fact. Given that it was these last interviews which swung the result so dramatically, there appear to be some serious unanswered questions.
If NEP’s statement that it conducted only 12,219 interviews is true, then they apparently/presumably lied and simply created the final 1,441 interviews in the office on Nov3 in order to get the number they wanted. In my book, that is called fraud.
Alex,
If you look at the following .jpg of one of those “leaked” exit poll reports (http://photos1.blogger.com/img/150/1911/640/11-2_7-30pm.jpg), you’ll notice that the crosstab for the overall proportion is based on the subsample.
I do not know if the NEP members saw different data than the subscribers (the graphic was reportedly leaked from a subscriber), or if the projection calls were made on the full tallies that were phoned in. My guess is that the projections were made based on the full tally (why collect the data otherwise?).
From MP’s early post on projections, I understand that projections of winners were hardly a factor here (i.e., battleground states weren’t “called” for a long, long time).
Alex, you wrote:
“In other words the subsampling only affects the voter profile data,…”
The data that everyone (Freeman, Simon/Baiman) has been “analyzing” (I say that loosely) is based on the subsample. The data Edison/Mitofsky are analyzing include all the interviews per precinct.
This is one more reason why I’ve been saying (on my blog) that the data are too “fuzzy” for rigorous statistical analysis, and why probability calcs like 662,000:1 are dubious. These probability calcs are highly sensitive to slight adjustments to the data.
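To illustrate that sensitivity with a toy calculation (not a reanalysis of any actual exit poll data): a normal-approximation odds figure for a given poll-versus-count discrepancy swings by orders of magnitude under modest changes to the assumed discrepancy, sample size, and design effect. All inputs below are invented.

```python
from math import sqrt, erf

def tail_odds(diff_pp, n, deff=1.0):
    """Approximate odds against a margin discrepancy of `diff_pp`
    percentage points arising by chance, for a near-50/50 race with
    sample size n. Toy normal approximation; `deff` inflates the
    variance to reflect a clustered precinct design."""
    se = 2 * sqrt(deff * 0.25 / n)    # SE of the (Kerry% - Bush%) margin
    z = (diff_pp / 100) / se
    p = 0.5 * (1 - erf(z / sqrt(2)))  # upper tail of the standard normal
    return 1 / p

# The same kind of discrepancy under slightly different assumptions:
print(f"{tail_odds(3.0, 13000, deff=1.0):,.0f} to 1")  # thousands to 1
print(f"{tail_odds(2.5, 11000, deff=1.8):,.0f} to 1")  # only tens to 1
```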
Luke, something else to consider. I’m pretty sure that ALL the state sample sizes were off as well (although not all were under-counted). E.g., in CA the statement said they interviewed 2,105, but the CNN website shows 2,390 interviews. For the Alaska poll, the statement showed 1,194 interviews, but the CNN website shows 1,177.
These discrepancies are simply too obvious for the NEP to be trying to pull a fast one on the public. That is why I continue to maintain that these were circulated with the election data. However, I’ll ask around…
Cheers Rick – i’ve only been looking at the ‘leaked’ Scoop data, which is by region. CBS is using the same 13,660 number, and they seem to have pulled their state data – http://election.cbsnews.com/election2004/state/state_us.shtml currently returns a ‘Not Found’.
I agree with you that it seems unlikely that they have pulled a fast one – yet there are obviously some outstanding questions with the data.
Interestingly, the % increases from the 7.33pm numbers to the Nov3 numbers are weighted toward the East and Midwest:
East – 39%
MW – 31%
South – 13%
West – 20%
Again, I haven’t been following this issue closely – so i’m playing catchup and apologise if i ask questions that have been asked/resolved a thousand times. I don’t know the slightest thing about polling – my experience is in industries where 12% mistakes are significant…
On a separate note – Farhad in Salon refers to midday polls – was that simply a factual error?
the nashua advocate points out other discrepancies in the size of the pool (disclaimer: i sent them an email) http://nashuaadvocate.blogspot.com/2005/01/news-election-2004-nep-adds-hundreds.html
They talk about ‘hundreds’ of additions, which is fine – but the reality is that it’s the percentages which are significant here – we are talking about a 12% discrepancy in the number of interviews, in an environment where 3% is apparently considered a mandate… and if the apparent swing to mr bush was already near impossible in the final sample of (apparent) interviews – it is definitely impossible if there were actually only 12,219 interviews.
there’s also something odd in the fact that if you look at the difference between the 7.33pm numbers and the numbers from Nov3, most of the % gain is from the East – how can it be that the East gained 39%, and the West gained only 20% from that point in time – given the time difference? bad cell phone coverage?
Cell phone coverage? How would that play a role in exit polls?
Rick:
Thank you for explaining the subsampling found in Freeman’s data. I think that is actually a revelation, at least to me.
So, MoEs are based on the subsample number of interviews? Would working with the full sample improve the MoE substantially?
This subsampling issue seems to add another significant layer to the matter of figuring out what went wrong with the exit polls. Or am I not understanding the implications correctly?
Thanks for any further clarification you or Mark can provide.
First things first. I got word from Jennifer Agiesta of Edison Media Research. She has been VERY kind to me over the past month in providing me with information. Re the discrepancies:
Rick,
The CNN web site is displaying the number of unweighted cases that are being used in the crosstab for the presidential vote.
The methodology statement includes the actual number of respondents who filled out the questionnaire.
These two numbers can differ for two technical reasons:
The first reason is that some respondents fill out the questionnaire but skip the presidential vote question. For example, in Alabama the CNN site shows 736 respondents. The methodology statement shows 740 respondents. This is because 4 respondents chose not to answer the question on how they voted for president and are not included in those crosstabs, but they are still included in the data file because they may have filled out how they voted in other races that day, such as the Senate race.
The second reason is that respondents from the national absentee/early voter telephone survey received all four versions of the national questionnaire while election day respondents only received one version of the national questionnaire. Thus, these respondents are included 4 times in the unweighted data (once for each version of the questionnaire) but their survey weights are adjusted down so that in the weighted data each national absentee/early voter telephone survey respondent only represents one person.
Again the methodology statements state the correct number of total respondents interviewed.
I hope that this explains the differences in how these numbers were reported.
Jennifer
——
There you have it. The telephone surveys were quadruple-counted in the sample sizes reported on the CNN website. Now, the question to me seems to be: why not weight these surveys before sending them around to their clients? Doesn’t make sense to me, but I don’t think this is some conspiracy here.
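A quick sketch of the double-counting Jennifer describes, with made-up numbers: each phone respondent appears four times in the unweighted file (once per questionnaire version) at weight 0.25, so the unweighted case count is inflated while the weighted count stays correct.

```python
# Hypothetical state file: 736 election day respondents at weight 1.0,
# plus 10 real phone respondents, each entered 4 times at weight 0.25.
election_day = [1.0] * 736
phone = [0.25] * (4 * 10)

weights = election_day + phone
print(len(weights))  # 776   -- unweighted cases (what CNN displayed)
print(sum(weights))  # 746.0 -- actual respondents (what a methods
                     #          statement would report)
```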
Luke, as far as the regional questions go, I think MP might be the best one to answer this (and he may be trying to do so now). I have ordered the 2000 exit poll data from the Roper Center for some analysis and hopefully the data sets will include some explanation of how the geos were defined. Although, I’m not too hopeful there will be much of use here.
Alex, we don’t know much about cell phone coverage error. I’ve read articles that suggest it’s a big issue and other articles that suggest it’s not an issue. Also, we don’t know what rate of “misses” or “refusals” occurred with the phone calls, let alone cell phone calls. Perhaps MP has already, or would be willing to, comment on the cell phone question. I think this is an open debate in the polling world right now.
Also Alex, re the subsampling procedure. I see three sampling stages here: 1) selection of a sample of precincts (cluster); 2) sample of voters; and 3) sample of the sample of the voters.
The way I see it, the exit poll data circulated on election day and made available to subscriber clients (the leaked Scoop PDFs) were based on a sample of a sample of a cluster sample.
What does this do to the MoE? Well, nothing to the MoE if you consider the table that WAS circulated with the data on election day to subscribers (I’ve confirmed this much). If we look at those tables, there is a range of sample sizes, and the clients were supposed to look at the sample size and determine the corresponding MoE. This is a VERY rough guide, but according to Mitofsky (e-mail to me), it was developed based on a calculation of the design effect according to the poll design (Merkle and Edelman participated in this calculation).
Therefore, the data Edison/Mitofsky are analyzing as we speak is the full set of data. In fact, the data that Edison/Mitofsky may have used to “project” winners on election night may have been the full tabs “corrected” for non-responses. (They “corrected” for non-responses in the 2004 primaries anyhow – the question is: what if what you “know” about the non-responses because of age, sex, and race is not “true”?)
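Rick’s point about the sample-size-to-MoE tables can be put in formula form. This is the standard design-effect-adjusted margin of error; the deff value below is an assumed illustration, not a figure published by NEP.

```python
from math import sqrt

def moe(n, deff=1.0, p=0.5, z=1.96):
    """95% margin of error for a proportion, inflated by a design effect
    to reflect a clustered (precinct-based) sample."""
    return z * sqrt(deff * p * (1 - p) / n)

print(f"{moe(13660):.2%}")            # ~0.84% if this were a simple random sample
print(f"{moe(13660, deff=1.7):.2%}")  # ~1.09% with an assumed deff of 1.7
```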
——
Bottom line, the way I see it: the data and methods are too fuzzy for any hard statistical analysis. That means, when folks like Freeman and Simon/Baiman throw around VERY precise calculations of odds and then base conclusions on those odds, they either: 1) are trying to BS everyone; or 2) really don’t know what they are doing. Look: anyone who glances at the data knows it is skewed. It doesn’t take a PhD…I mean…an Excel spreadsheet and a 2000 analysis of the 1996 exit polls to figure that out.
The NEP holds the cards here. They have the data in the form that they need to analyze it correctly. These data will eventually be made public (via Roper Center).
Sure, some will say that these data are not trustworthy and may have been “cleansed” by the NEP (why? I don’t know), but I won’t subscribe to that theory. Edison/Mitofsky owe it to their clients to analyze the data before the likes of Freeman and Simon/Baiman write severely flawed reports.
I think their distinguished careers have earned them at least this courtesy.
One point that should be made for the benefit of foreign readers and those who were asleep during their Civics class is that the U.S. Electoral College system can forgive more sins than the Pope.
There is nothing about “democracy” in it, and so there is no possibility of tainted results propagating past the Electoral College. All the math required to tabulate it can be double-checked by anyone with a second-grade education.
In the last analysis, democracy in America depends not on exit polls or even the popular vote, but on the willingness of alleged democratic states to certify their electoral votes, and the willingness of Congress to seat them. Disputes are not resolved by exit polls or re-votes or recalls or even courts, but in the last analysis by Congress.
In order to have a “failed election” similar to the one alleged in the Ukraine, under the design of the U.S. Constitution, one would have to subvert 26 out of 50 democracies – and if successful, need I add, the country would already be in far greater trouble than a failed election.
Perhaps sites like this would do a public service if, among the endless talk about exit polls and voting, they pointed out that the political design of the U.S. is precisely to allow inherently uncertain processes to gain net certainty, in defiance of statistics, information theory, and electoral democracy–by incorporating just a few people and some things called “Constitutions” and “Laws” in the aggregation stage. This takes out the mathematics–even to the extent of legitimating fraud–on *purpose*.
In summary: the outcome of U.S. national elections is certain and legitimate after the legal and congressional process has run its course, almost no matter *what* happens.
The legitimacy has *nothing* to do with democratic voting or elections, by design, and by excellent design – one that precisely foresees the reasons we cannot agree here, namely that if we take people out of the process and put in mathematics or machines, we no longer have a political process but a noise generator.
That means: if you feel the democratic process has been subverted, your task is to reform all 50 *states* or whichever ones you don’t like. If Ohio is not a democracy, that is a problem for Ohioans. We, as a nation, have a political system that can survive carpetbagging in all the former states of the Confederacy.
Remember this the next time someone wants you to overthrow the U.S. electoral system. We are not a democracy, at the national level, so we can be ruled democratically.
Hi Mark
I still haven’t gotten any response to the question I posted a couple of days ago. But it goes along with this topic, so I’ll point it out again. Can you explain this: http://www.cnn.com/ELECTION/2004/pages/results/states/US/P/00/epolls.0.html
The smoking gun is the final exit polls that CNN and others have up at this time.
These are the numbers (13,660) put up to dispel the “problems” with the early numbers, and the ones they have used to tell us what we think. They are OBVIOUSLY misleading/wrong (see web page above and below). The PROOF is about two-thirds, maybe three-quarters, of the way down: the question is who did you vote for in 2000. The problem is obvious: 37% Gore, 43% Bush in 2000 – that is WRONG. Again, the people that did the polling know that’s not right, and the people that posted it know it’s not right. They didn’t do any fancy weighting to adjust for that, as I’ll show below. It’s that simple. It’s more obvious than the typesetting of the font on an old document. Anyone with a statistical background would know it is not correct. The only thing I wonder about is how that data got in the stats at all. Maybe someone on the inside is trying to say something?? I don’t know?
Do you understand how weighted averages work?? The sum of each fraction times its amount:
Numbers from the data (again, see the web poll data above and go to the question “who did you vote for in 2000”). The 51/49% Bush win is what we are being told. If you plug in the straight numbers, the data come out as they want you to believe. This works with other numbers too, so that shows no other factors are included:
.17*45+.37*10+.43*91+.03*21=51.11% Bush
.17*54+.37*90+.43*9+.03*71=48.48% Kerry
With equal amounts of Gore and Bush voters, AS WE KNOW IT WAS:
.17*54+.40*90+.40*9+.03*71=50.91% Kerry
.17*45+.40*10+.40*91+.03*21=48.68% Bush
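Rolf’s arithmetic can be verified directly. The row shares and column percentages below are the ones he quotes from the CNN crosstab; the 40/40 rows are his hypothetical equalized version.

```python
# Rows: (share of 2004 electorate, Kerry %, Bush %), by 2000 vote.
reported = [            # as published: 37% Gore voters, 43% Bush voters
    (0.17, 54, 45),     # didn't vote in 2000
    (0.37, 90, 10),     # voted Gore
    (0.43,  9, 91),     # voted Bush
    (0.03, 71, 21),     # voted other
]
equalized = [(0.17, 54, 45), (0.40, 90, 10), (0.40, 9, 91), (0.03, 71, 21)]

for name, rows in (("reported", reported), ("equalized", equalized)):
    kerry = sum(w * k for w, k, _ in rows)
    bush = sum(w * b for w, _, b in rows)
    print(f"{name}: Kerry {kerry:.2f}%, Bush {bush:.2f}%")
# reported:  Kerry 48.48%, Bush 51.11%
# equalized: Kerry 50.91%, Bush 48.68%
```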
Again, anyone who puts out a poll that doesn’t have equal amounts of Gore and Bush voters has to know the data collection is faulty, and that any information gleaned from it is also wrong. Also, the 17% of people that didn’t vote last time went 54% Kerry and 45% Bush – and that is in a Bush-favoring sample. The switches from Bush to Kerry and from Gore to Bush are about equal; looking at that, there is really no way to say that Bush won.
But that is not my point. The point is that CNN and others are putting up data that is obviously wrong! Why is not really important to the point. How is not really either. The point is the data is wrong and it is still up. Any explanation?
Rick,
Hi! This is the News Editor of the Nashua Advocate. I’ve been following the discussion above with great interest, and the main problem I have with what I’ve heard — despite not being one, generally speaking, for conspiracy theories — is that we already *know* Mitofsky/Edison Media will artificially add and/or subtract “phantom voters” to their published sample size.
How do we know this?
Because they did it with the Ohio state-level exit poll, and have *admitted* to doing it.
Recall: the third of the three Election Day exit polls from Ohio — valid until the early morning of November 3rd, 2004 — showed 1,963 respondents. Mitofsky admits that the data was then “adjusted” to reflect raw vote totals. How was that done? By *subtracting* 31 Kerry “voters” and *adding* 88 Bush “voters” (the new poll had 2,020 voters as its sample size, but the new percentages for each candidate showed that no “new” voters had been queried — the existing tallies had simply been manipulated or, as Mitofsky says in his infuriating Orwell-speak, “adjusted”).
Now, what does Mitofsky’s willingness to add “phantom voters” to his sample size do to public confidence in his exit-polling?
Right now, The Nashua Advocate is reporting *four separate* allegedly “final” sample sizes for the National Exit Poll of November 2nd. How in the world is the public to distinguish between the justification Edison Media gave you for this discrepancy — which indeed sounds entirely innocent — and the entirely non-innocent fact that, in the Ohio poll, the sample size was changed using “phantom voters” who didn’t really exist and had never been queried as to their preferences?
So, when the N.E.P. releases its data in the coming weeks, how do we know if it was “adjusted” or not? And given that Mitofsky — in an e-mail to MSNBC reported on by Keith Olbermann — has expressed quite clearly that he thinks he can claim his polls are “accurate” based on *adjusted totals*, why should I or anyone else feel confidence that Mitofsky understands the difference to the American people between releasing adjusted and non-adjusted poll data?
What’s to stop him from releasing adjusted data and saying this is the “right” data, and the public has no right to see any other data because it isn’t the “right” (read: tautologically correct, by filtering in “real-world” vote tallies) data?
Thanks for any insights you can provide.
— The News Editor
Rolf,
It is a well-known fact that when polled after an election, people lie about who they voted for. (Don’t ask me why…it’s absolutely bizarre and a cowardly thing to do…but apparently they do.) A few people who voted for the loser will say they voted for the winner…people just like winners. Or maybe people who didn’t vote…lie and say they voted for the winner.
More here… real quick… then back to feeding the kids…
More on the subject of exit poll methodology from Merkle and Edelman (2002), “Nonresponse in Exit Polls: A Comprehensive Analysis,” in Survey Nonresponse, eds. Groves et al., pp. 243-257.
“The VNS exit polls are conducted using a two-stage sampling design. In the first stage, a stratified, systematic sample of precincts is selected in each state, proportionate to the number of votes cast in a previous election. In the second stage, interviewers systematically select voters exiting polling places on Election Day, using a sampling interval based on the expected turnout at that precinct. The interval is computed so that approximately 100 interviews are completed in each precinct…
“Interviewers are instructed to work with a polling place official to determine the best place to stand. Ideally, the interviewers are located inside the polling place where all voters must pass them. Unfortunately, sometimes interviewers must stand outside, some distance from the door…
“Interviewers take a 10-minute break from interviewing each hour to tally the responses to the vote questions and their observations of the nonrespondents. Interviewers call VNS three times during the day to report their data. In local time, the first call is around 9:00AM, the second around 3:00 PM, and the last call shortly before poll closing. During each call, interviewers report their vote and nonresponse tallies and read in the question-by-question responses from a subsample of the questionnaires…”
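The sampling-interval rule in that passage is simple to state in code. This is a sketch under the quoted assumption of roughly 100 completions per precinct; the NEP’s exact formula is not given in the passage.

```python
def sampling_interval(expected_turnout, target_interviews=100):
    """Interval k such that approaching every kth exiting voter yields
    roughly `target_interviews` completed questionnaires."""
    return max(1, round(expected_turnout / target_interviews))

# A precinct expected to cast 1,300 votes -> approach every 13th voter.
print(sampling_interval(1300))  # 13
```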
John – others think you are an enigma (and sometimes I do as well), but welcome back! It’s been a while.
Nashua Advocate News Editor, if you could provide the sources and links to the quotes from Mitofsky, I’d like to check it out.
BTW – I HIGHLY doubt that Edison/Mitofsky will try to rely on sampling error to explain the discrepancy. They will partially fall on the sword (training, etc.) and partially chalk it up to differential non-response.
Remember, there are about 77,000 additional surveys out there that they have access to and we do not. Who knows how this will affect the comparative analysis?
“Adjusting” the data to ensure a valid sample is standard stats and is REQUIRED. The subject of weighting is where pollsters and the public often go awry. Some weighting is absolutely necessary and routine, while other types of weighting involve “educated guesses” (but are also routine).
For example, if they weighted the non-responses according to what they “know” about the demographics (they can guess the age, sex, and race of the nonrespondent), then this is an example of an “educated guess.” They take what they “know” about these demos and they adjust their sample accordingly. If for some reason what they “know” is uniformly different from what “is true”, then the poll will have bias.
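A minimal sketch of the “educated guess” Rick describes, with invented cell counts: completed interviews in each observed demographic cell are weighted up to stand in for that cell’s refusals. The bias he warns about appears when refusers in a cell voted differently from completers in the same cell, which no cell weight can repair.

```python
# cell: (completed interviews, observed refusals) -- numbers invented
cells = {
    ("men", "under 50"):   (120, 80),
    ("women", "under 50"): (150, 50),
    ("men", "50+"):        (100, 60),
    ("women", "50+"):      (130, 40),
}

# Each completed interview also represents its share of the refusals:
weights = {cell: (done + refused) / done
           for cell, (done, refused) in cells.items()}

print(round(weights[("men", "under 50")], 3))  # 1.667
```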
The Nashua Advocate News Editor and Rolf, I have a strange feeling that MP will have more on these subjects… I’ve shared about all I know.
Brian Dudley, when I hear things like “It is a well known fact” without some type of source, I start to twitch. Especially, if I don’t know that fact. 🙂
Nashua editor, Rick:
Do I understand things correctly? Nashua seems to describe Mitofsky adjusting the SUBSAMPLE from 1,963 to 2,020?
If it is the subsample that is being adjusted, that brings up two points:
1. So, is the adjustment Nashua points out what we call weighting to match the election results? So the reweighting involves adjustments to the subsample? That doesn’t sound like reweighting as I understand it. Or is this an additional step, separate from the reweighting to match the election results? Boy, this is starting to get conceptually complicated on top of the statistics. Help.
2. Why wouldn’t Mitofsky incorporate his full data sample and discard the subsample as soon as possible? It looks like his subscribers are still unnecessarily limited to his subsample-based data.
Please note that if my questions indicate my misunderstanding of the issues, I apologize.
Alex, that appears to be the claim. The full sample would be more like 4,000.
re: #1. I don’t know anything more than what was described to me by the NEP. Without more information (like sources of the claims) from Nashua, who knows?
#2. I suspect that by the time the NEP had the full dataset (whenever the interviewers faxed or mailed them in), the election was over and they automatically switch into the “understanding the outcome” mode (i.e., predictions are over, now let’s weight to election result and see what it tells us).
Who knows…Maybe historically the differences between the full sample and the subsample aren’t statistically significant? Therefore, why bother?
Alex – my question re the cell phone coverage was supposed to be partly tongue-in-cheek – i thought that the excuse may have been that ‘we couldn’t call the results through’.
here are the interview numbers (the times are Eastern):

Region   4pm     7:33pm   Nov 3
East     1,746   2,077    2,888
West     1,747   2,203    2,640
MW       2,069   2,804    3,676
South    2,787   3,943    4,456
Somehow, the East gained 39% more interviews *after* 7.30pm, and the West only gained another 20%, despite the time difference.
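Luke’s regional percentages check out against his own table; a few lines of arithmetic reproduce them.

```python
counts = {  # region: (4pm, 7:33pm, Nov 3) reported interviews
    "East":  (1746, 2077, 2888),
    "West":  (1747, 2203, 2640),
    "MW":    (2069, 2804, 3676),
    "South": (2787, 3943, 4456),
}

for region, (_, evening, final) in counts.items():
    print(f"{region}: +{(final - evening) / evening:.0%}")
# East: +39%, West: +20%, MW: +31%, South: +13%
```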
Luke, hang in there… I think I hear MP typing…
Rick,
You can find some of the comments here: http://64.233.161.104/search?q=cache:IvRJMQdJxYwJ:www.slate.com/id/2111831/+Mitofsky+accurate+%22exit+poll%22&hl=en
and some here: http://www.msnbc.msn.com/id/6533008/#041124a
There are others and I’m still looking for them. One is definitely to be found in a Democratic Underground thread, which I’m trying to track down.
As I understand it, what was done in Ohio was *not* “weighting” — because the sub-group percentages (e.g. men/women) did not change between the 1,963-voter sample and the 2,020-voter sample: just the number of “voters” casting a preference for each candidate. As I understand it, this is not weighting. There’s another word for it — filtering in raw-vote information — in polling terminology, and perhaps you know it; I did once but I’ve forgotten it.
More as I find it.
— The News Editor
Rick,
Back with more.
The following is from an interview Mitofsky did with Mayflower Hill (this is the article author speaking):
“I’ll add, though it’s somewhat public knowledge at this point, that Warren agrees with the conventional wisdom explaining how certain bloggers reached the wrong conclusions. The data that was reported on election day had not been ‘weighted’ for turnout yet. Once an accurate projection of overall voter turnout is made, the raw data that the exit pollsters collect is plugged into a complicated methodological system that I won’t begin to pretend to understand. The point is, though, that a sort of ‘correction’ is made to the raw numbers that everyone saw on Wonkette and other sites. The bloggers who ran those numbers either didn’t know about the system of ‘weighting’ the exit polling data, or didn’t bother to point it out.”
Find the whole interview here: http://mayflowerhill.blogspot.com/2004/11/mayflower-hill-exclusive-warren.html
The info the “bloggers” had on Ohio on Election Day was the 1,963-voter sample. That was removed from CNN’s website sometime between 12:30 A.M. and 2:15 A.M. on 11/3/04. It was replaced with a 2,020-voter sample which is now the “final” Ohio exit poll. The first sample shows Kerry winning 51/48, the second shows Bush winning 51/48. Indeed, the second sample matches the final Ohio vote total to within 0.1%, yet is not (using pure statistics) statistically possible — i.e., its result cannot be reached, or even approximated, by adding 57 voters (even were they all Bush voters, a statistical impossibility) to the 1,963-voter sample. In fact, the number is reached by removing 31 Kerry voters and adding 88 Bush voters. However, the screenshots of the two samples do not seem to indicate any substantial alteration in any sub-group breakdowns. Coupling this with Mitofsky’s acknowledged use of actual raw-vote data on Election Day (which I recognize is itself standard practice), the conclusion is clear: if the “raw-vote” data being used/filtered in was not related to the sub-group breakdowns, it could only have been — then — related to the voters’ preference for President. Meaning, voters were added and subtracted from the 1,963-voter sample purely with a design to make the presidential-vote query match (in the 2,020-voter sample) the “raw-vote” totals then seen in Ohio.
Which they did.
Accurate to 0.1%.
Without any evidence that “turnout demographics” changed substantially between the two samples (indeed, my research suggests that if there was any shift, it was in Kerry’s favor — as the final sample had more [I believe it was] non-white women than the 1,963-voter sample).
— The News Editor
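The News Editor’s Ohio figures are easy to check. Under the screenshot percentages he cites (51/48 Kerry on 1,963 respondents, then 51/48 Bush on 2,020, rounding to whole respondents), the arithmetic does reproduce his -31/+88 claim; whether that movement reflects reweighting or literal additions is exactly the point in dispute.

```python
old_n, new_n = 1963, 2020
kerry_old, bush_old = round(0.51 * old_n), round(0.48 * old_n)  # 1001, 942
kerry_new, bush_new = round(0.48 * new_n), round(0.51 * new_n)  # 970, 1030

print(kerry_new - kerry_old)  # -31 Kerry "voters"
print(bush_new - bush_old)    # +88 Bush "voters"
print(new_n - old_n)          # +57 net
```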
1) My new post on the main page takes up the discussion of Luke’s question and Rick’s email.
2) I’ll try to get to the issue of the regional interview counts tomorrow.
3) Nashua Advocate: What you are describing is the weighting procedure explained here many times that “corrects” the exit poll to match the count. It does not add or subtract respondents, although it may appear that way. You may want to review this post, which describes the procedure and the timing: http://www.mysterypollster.com/main/2004/11/the_difference_.html
You may also want to check my FAQ on exit polls: http://www.mysterypollster.com/main/2004/11/faq_questions_a.html
If that doesn’t help, please email me.
News Editor:
“and the entirely non-innocent fact that, in the Ohio poll, the sample size was changed using “phantom voters” who didn’t really exist and had never been queried as to their preferences?”
I don’t follow why you think the increase in sample size implies ‘phantom voters.’ Why couldn’t they just create a larger sample from their full set of data? Which is not to say that would be a legitimate procedure – who knows.
But this brings me back to my other point. If Mitofsky has the time to adjust sample sizes, and appears willing to do so in the case of Ohio (for whatever reason, and assuming News Editor is accurate), then why not adjust the sample size to the FULL sample size?
Note to Rick, it seems to me the full sample would be most helpful during the “understanding the outcome” mode, considering all the small demogroups they have to worry about.
So, one more academic question to throw on the pile. Actually, now that I think about it, my questions only stem from the fact that maybe Mark sort of short-changed us in explaining subsample adjustments in his post detailing the various types of exit poll data. Apologies to Mark if that’s not the case at all. Once he explains it (again?) in his authoritative style, it will probably make sense.
Luke:
“Given that it was these last interviews which swung the result so dramaticallly, then there appears to be some serious unanswered questions.”
Are you saying that the last interviews were what adjusted the exit polls to match the election results? Thanks.
To clarify my question: exit poll data go through a variety of stages, including explicit weighting to match election results. So, the fact that the exit polls matched the election results after a while doesn’t directly imply that the increase in sample size is to blame, since the standard weighting procedure could also be to “blame.”
News editor:
I just reread this thread, and withdraw my comment about your phantom voters. I now understand what you’re trying to say.
Besides my “response” was just plain wrong.
Sorry for any confusion. I won’t post again until I ‘understand’ this thread.
alex – i’m not an expert in any of these matters – but the presumption is that if kerry *appeared* to be way ahead at 7.30pm, and the final results showed that not to be true, then at least simplistically, the presumption is that the last bunch of interviews was ‘bush-heavy’.
i do have a lot of educational and professional experience with numbers tho – and i agree that most people don’t have a clue how to deal with numbers – which is why i haven’t tried to delve into the detail of other people’s numbers – other than at the highest level, where it appears that someone says ‘we did x interviews’, which is apparently contradicted by their own statements. Mark may have solved some of these curiosities for us in his latest post. i’m also well aware that many who deal with numbers ‘professionally’ really don’t have a clue about how to use numbers either.
there are others who have suggested that the exit polls have been massaged to converge with the ‘actual’ result – but i can’t comment on that – i haven’t been following it closely enough.
it is curious that the NEP hasn’t released their data already – i’ve seen quicker responses from the public service.
Nashua,
One thing I didn’t notice on my first read of this last night…
“However, the screenshots of the two samples do not seem to indicate any substantial alteration in any sub-group breakdowns.”
Do you have pre-12:30 screenshots for Ohio that show subgroup breakdowns? If so, I would love to see them.
MB
oops – i just reread my post from late last night: “Mark may have solved some of these curiosities for us in his latest post. i’m also well aware that many who deal with numbers ‘professionally’ really don’t have a clue about how to use numbers either.” These two sentences were supposed to be independent and unrelated, but they don’t quite read like that. The 2nd sentence was supposed to be a general comment, and was not intended to impugn MP or anyone else here. Full and unreserved apologies if this was misconstrued.
“Do you have pre-12:30 screenshots for Ohio that show subgroup breakdowns?”
I’ve asked Freeman to make these available (assuming he has them from Simon), but maybe he didn’t read my e-mail. Perhaps we should hound Simon?
Luke,
Do you now understand the reweighting procedure that Mark explains here: http://www.mysterypollster.com/main/2004/11/the_difference_.html
If I understand your comments, you believe that changes to the exit poll results available at different times would be solely due to additional surveys. The above post explains that reweighting – of different types, but most importantly to match the election results once they start coming in – is another cause of changes to exit polls over time.
Before this raises any red flags, please read the post to know this is standard, and that “pure” exit polls are also available, sort of.
Hope that helps.
cheers alex – yeah, i appreciate that there is some reweighting that occurs, altho i haven’t tried to understand the mechanics – so i shouldn’t really speculate. a little bit of knowledge is a dangerous thing…
It does appear to be true, though, that the number of interviews is an absolute, right, and the weightings are internal to this?
Mark says “Presumably, NEP started to deliver weighted data for eastern states at about 4:00 EST, but weighted data for states in western states may not have been available until 6 or 7 pm EST.” and “NEP uses these final reports to prepare exit poll tabulations for each state a few minutes before the polls close. The data are weighted by actual turnout.” So it seems that we can begin to assume that the 7.33pm reports are largely already weighted – definitely in the EST states, and at least partially weighted in the other time zones?
And I’m still trying to understand how (what appears to be) the absolute size of the East poll grew by 39% *after* the 7.33pm report – given that the final numbers were supposed to be called in ‘an hour before the polls closed’ – or have i misinterpreted this somehow?
cheers
Luke wrote:
“One hypothesis is that there was much interview-stuffing on Nov3 to produce the requisite swing to Mr Bush. To the extent that the NEP’s ‘final’ statement is valid, then it would seem to render irrelevant all of the other discussions about chatty democrats and gender bias and the rest.”
AND
“alex – im not an expert in any of these matters – but the presumption is that if kerry *appeared* to be way ahead at 7.30pm, and then the final results showed that not to be true, then at least simplistically, the presumption is that the last bunch of interviews were ‘bush-heavy’.”
The reweighting speaks to your comments above. I haven’t followed, nay understood, this thread sufficiently to know if you still hold this position, but my point is your misunderstanding of the difference between the “final” results and the 7:30 PM results.
What you consider the “final” results are exit polls that have been reweighted to match the election results. The 7:30 PM results are not yet weighted to match the election results. They have experienced a second type of weighting, to account for turnout and non-response profile, but not election results. So, independent of sample size changes, the change from the 7:30 PM results to the “final” is mostly, if not entirely, due to reweighting to match the election results.
This is not to say the changes in sample size aren’t interesting, but I haven’t read Mark Blumenthal argue these changes could result in a significant shift of the exit poll results. It seems to me the sample size debate is practically independent of the exit poll discrepancy debate. Unless I have missed something. Hard to keep up with such a technical and somewhat muddied issue.
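For the curious, here is a bare-bones sketch of the kind of “correction” Alex describes: rescale each candidate’s respondents so the poll’s totals match the official count, then re-tabulate every crosstab with the new weights. The shares below are invented, and the real procedure (weighting within precincts to actual returns) is more elaborate.

```python
poll = {"kerry": 0.51, "bush": 0.48, "other": 0.01}        # exit poll shares
official = {"kerry": 0.485, "bush": 0.505, "other": 0.01}  # vote count

# Factor applied to the weight of every respondent in each column:
adjust = {c: official[c] / poll[c] for c in poll}
print({c: round(f, 3) for c, f in adjust.items()})
# {'kerry': 0.951, 'bush': 1.052, 'other': 1.0}
# No respondents are added or removed; every crosstab is simply
# recomputed with the new weights, so the "final" poll matches the count.
```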
alex – thanks for your thoughts – and you are half right!
firstly, let me clarify something – your latest post quoted me twice, and in both quotes i used the word ‘final’ – although i was referring to different ‘final’ reports. In the first quote (“To the extent that the NEP’s ‘final’ statement is valid”) i was referring to the Nov2 document, MethodsStatementNationalFinal.pdf – which is where they said that they had conducted 12,219 interviews, and then the next day their numbers appeared to say that they had conducted 13,660 interviews. We’ve since learnt that the 13,660 number was actually incorrect – but at the time my statement was (seemingly) reasonable, as it appeared as though they had ‘created’ 1,500 interviews the following day.
As to my second quote, yep, it demonstrated my ignorance, and i wouldn’t write it again today 🙂
two things:
a) my sense is that the 7.30 results actually had not been *as fully* reweighted for turnout and non-response as you seem to think (because we were still missing a full 23% of the interviews) – although i’m just assuming.
b) it’d be lovely (drool) to get the nov3 report of the full 12,219 interviews before it was ‘tainted’ with the actuals
Finally, your point about the independence of the two debates is spot on – and most of the discussion now is on the sample size issue.
cheers
Simon and Baiman continued…
There’s a nice write-up on the exit polls in Salon, which includes an interview of Mystery Pollster, Mark Blumenthal. Watch the ad and you can read the article for free.
Gday Mark – I havent really been following the exitpoll issue closely, but I read Farhad’s article and he said “At midday Eastern time, NEP interviewers had spoken to about 8,000 voters” which kind of piqued my interest (it seems that he made a mistake – he seems to be referring to the 3.59pm poll, not midday). In any case, I took a closer look and noticed the discrepancy between the 11,027 and 13,660 numbers which you and others have also noted. However, I wonder if you were a bit dismissive when you said “The missing 2,633 interviews, presumably coming mostly from states in the Midwest and West”.
The NEP issued a statement on Nov2 which appears to be their final summary at the end of the day about their activities for the day – it is still on http://www.exit-poll.net under the prominent link titled “National Exit Poll Methodolgy Statement” which links to http://www.exit-poll.net/election-night/MethodsStatementNationalFinal.pdf (note the word “final”).
This document specifically says “The National exit poll was conducted at a sample of 250 polling places among 11,719 Election Day voters representative of the United States. In addition, 500 absentee and/or early voters in 13 states were interviewed in a pre-election telephone poll.” That is, the NEP’s apparently “final” statement explicitly states that they only did 12,219 interviews.
As we know, by 1pm the following day, the NEP was reporting that they had actually conducted 13,660 interviews – 12% more than they had officially stated – which is starting to test the limits of credulity, given that these people are supposed to count things for a living.
One hypothesis is that there was much interview-stuffing on Nov3 to produce the requisite swing to Mr Bush. To the extent that the NEP’s ‘final’ statement is valid, then it would seem to render irrelevant all of the other discussions about chatty democrats and gender bias and the rest.
Separately, the consensus seems to be that the leaked numbers on Scoop are the legitimate NEP numbers – Mitofsky didn’t claim otherwise in that delightful exchange with Mickey Kaus. However, some of the numbers seem curious – more here if you are interested http://wotisitgood4.blogspot.com/2005/01/exit-stage-left.html
Luke, if I’m not mistaken, that NEP methods statement was circulated either with the first round of data, or prior to the election day. These were their targets based on historical turnout at sampled precincts; election turnout was higher than expected and therefore they had more samples than the statement said.
Also, they actually interviewed way more than 13,000 in that national poll. They probably interviewed about 2X that, but what is reported is only a sample of the actual polls. MP has dealt with these issues in previous posts, if I am not mistaken.
Rick,
The lower numbers were targets? Really,
thats a little hard to believe because it doesnt make sense to report numbers that are targets to the nearest person..one would expect target numbers to be rounded..but who knows..NEP seems willing to do anything..I dont trust them one bit.
thanks rick and brian – as i mentioned, i havent been following it very closely – so i’m interested in what others have to say. but i do agree with brian – the statement is very specific – to the final person – and their language is in the past tense (and hasnt been updated)
i dont know anything about interviewing or exitpolls – but i did note that they had 1500 interviewers – which comes out to about 8 interviews per interviewer which seems like a low number – perhaps rick is correct in stating that there was 2X, altho that seems kinda odd.
thanks rick and brian – as i mentioned, i havent been following it very closely – so i’m interested in what others have to say. but i do agree with brian – the statement is very specific – to the final person – and their language is in the past tense (and hasnt been updated)
i dont know anything about interviewing or exitpolls – but i did note that they had 1500 interviewers – which comes out to about 8 interviews per interviewer which seems like a low number – perhaps rick is correct in stating that there was 2X, altho that seems kinda odd.
i also posted at ruy’s to see if anyone there had any further insight http://www.emergingdemocraticmajorityweblog.com/cgi/dr/mt-comments.cgi?entry_id=1011
grrrr – i dont even know how to post properly. neither a pollster nor poster be.
Luke and Brian, first what I know…
Read Mitofsky and Edelman’s chapter in Presidential Polls and the News Media (1995) about the 1992 exits.
“Including the state polls, interviews were conducted with 177,000 voters on election day in 1,310 precincts….There was usually only one interviewer working at a polling place. Interviewers arrived before voting began and worked 50 minutes of each hour. During their 10-minute break, they did a hand tally of the number of respondents who said they voted for each candidate. Interviewers telephoned results of the hand tally and the individual responses to each question to a central site three times during the day. Before reading the individual responses, questionnaires were subsampled. The subsampling was controlled by the operator receiving the call at the central site. The subsample produced 62,024 questionnaires, of which 15,256 were from national questionnaires. The rest were from state exit polls” (pgs. 82-83).
I’ve communicated with Dan Merkle of ABC News who participated in the exit poll design. He confirmed to me that the subsample of the precinct interviews was followed again this year.
Now, what I don’t “know” is whether the NEP methods statement was written before, during, or after the election. It was my assumption based on my understanding of how previous exit poll samples were drawn (based on historical turnout estimates) and the fact that anyone who goes to CNN’s web-site knows that there were more interviews than were reported in the methods statement.
Isn’t the subsampling a way to save time on the phone on election day by not collecting the full voter profile, but still collecting voting data from the FULL sample?
In other words the subsampling only affects the voter profile data, but the vote tallies are from the full set of interviews conducted. Just want to make sure we all understand that the subsampling does not affect the predictions of Kerry/Bush %’s. Unless of course I’ve misunderstood Mark’s post on this matter.
thanks rick – although im less concerned about their methodolgy per se, and more interested in the fact that the NEP said that they conducted 12,219 interviews (however that is defined) and then the following day they apparently reported that they had conducted 12% more than that.
i dont know when the method statement was written either – but as Brian said, the level of specificity – to the final person – strongly suggests that it was written after the fact. Given that it was these last interviews which swung the result so dramaticallly, then there appears to be some serious unanswered questions.
If NEP’s statement that it only conducted 12,212 interviews is true, then they apparently/presumably lied and simply created the final 1448 interviews in the office on Nov3 in order to get the number they wanted. In my book, that is called fraud.
Alex,
If you look at the following .jpg of one of those “leaked” exit poll data (http://photos1.blogger.com/img/150/1911/640/11-2_7-30pm.jpg), you’ll notice that the crosstab for the whole proportion is based on the subsample.
I do not know if the NEP members saw different data than the subscribers (the graphic is reportedly leaked from a subscriber), or if the projected calls were made on the full tallies that were phoned in. My guess is that the projections were made based on the full tally (Why collect the data).
From MP’s early post on projections, I understand that projections of winners was hardly a factor here (i.e., battleground states weren’t “called” for a long long time).
Alex, you wrote:
“In other words the subsampling only affects the voter profile data,…”
The data that everyone (Freeman, Simon/Baiman) has been “analyzing” (I say that loosely) is based on the subsample. The data Edison/Mitofsky are analyzing include all the interviews per precinct.
This is one more reason why I’ve been saying (on my blog) that the data are too “fuzzy” for rigorous statistical analysis where probability calcs like 662,000:1 are dubious. These probability calcs are highly sensitive to slight adjustments to the data.
Luke, something else to consider. I’m pretty sure that ALL the state sample sizes were off as well (although not all were under-counted). E.g, in CA the statement said they interviewed 2,105, but the CNN web-site shows 2,390 interviews. For the Alaska poll, they showed 1,194 interviews, but the CNN website shows 1,177.
These discrepancies are simply too obvious for the NEP to be trying to pull a fast one on the public. That is why I continue to maintain that these were circulated with the election data. However, I’ll ask around…
Cheers Rick – i’ve only been looking at the ‘leaked’ Scoop data, which is by region. CBS is using the same 13,660 number, and they seem to have pulled their state data http://election.cbsnews.com/election2004/state/state_us.shtml currently returns a ‘Not Found’
I agree with you that it seems unlikely that they have pulled a fast one – yet there are obviously some outstanding questions with the data.
Interestingly, the % increases from the 7.33pm numbers to the Nov3 numbers are weighted toward the East and MidWest
East – 39%
MW – 31%
South – 13%
West – 20%
Again, I havent been following this issue closely – so i’m playing catchup and apologise if i ask questions that have been asked/resolved a thousand times. I dont know the slightest thing about polling – my experience is in industries where 12% mistakes are significant…
On a separate note – Farhad in Salon refers to midday polls – was that simply a factual error?
the nashua advocate points out other discrepancies in the size of the pool (disclaimer: i sent them an email)
http://nashuaadvocate.blogspot.com/2005/01/news-election-2004-nep-adds-hundreds.html
They talk about ‘hundreds’ of additions, which is fine – but the reality is that its the percentages which are significant here – we are talking about a 12% discrepancy in the number of interviews, in an environment where 3% is apparently considered a mandate… and if the apparent swing to mr bush was already near impossible in the final sample of (apparent) interviews – it is definitely impossible if there was actually only 12,212 interviews.
theres also something odd in the fact that if you look at the difference between the 7.33pm numbers and the numbers from Nov3, most of the % gain is from the East – how can it be that the East gained 31%, and the West gained only 17% from that point in time – given the time difference? bad cell phone coverage?
Cell phone coverage? How would that play a role in exit polls?
Rick:
Thank you for explaining the subsampling found in Freeman’s data. I think that is actually a revelation, at least to me.
So, MOE’s are based on the subsample number of interviews? Would working with the full sample improve the MOE substantially?
This subsampling issue seems to add another significant layer to the matter of figuring out what went wrong with the exit polls. Or am I not understanding the implications correctly?
Thanks for any further clarification you or Mark can provide.
First things first. I got word from Jennifer Agiesta of Edison Media Research. She has been VERY kind to me over the past month in providing me with information. Re the discrepancies:
Rick,
The CNN web site is displaying the number of unweighted cases that are being used in the crosstab for the presidential vote.
The methodology statement includes the actual number of respondents who filled out the questionnaire.
These two numbers can differ for two technical reasons:
The first reason is that some respondents fill out the questionnaire but skip the presidential vote question. For example in Alabama the CNN site shows 736 respondents. The methodology statement show 740 respondents. This is because 4 respondents chose not to answer the question on how they voted for president and are not included in those crosstabs but they are still included in the data file because they may have filled out how they voted in other races that day such as the Senate race.
The second reason is that respondents from the national absentee/early voter telephone survey received all four versions of the national questionnaire while election day respondents only received one version of the national questionnaire. Thus, these respondents are included 4 times in the unweighted data (once for each version of the questionnaire) but their survey weights are adjusted down so that in the weighted data each national absentee/early voter telephone survey respondent only represents one person.
Again the methodology statements state the correct number of total respondents interviewed.
I hope that this explains the differences in how these numbers were reported.
Jennifer
——
There you have it. The telephone surveys were quadruple-counted in the sample sizes reported on the CNN website. Now the question to me seems to be: why not weight these surveys before sending them around to their clients? Doesn’t make sense to me, but I don’t think this is some conspiracy here.
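A minimal sketch of the bookkeeping Jennifer describes, using the 11,719 election-day and 500 telephone interviews from the methodology statement (the item-nonresponse explanation for the last few rows is an inference, not an official figure):

```python
# Respondent counts from the NEP methodology statement.
election_day = 11_719   # election-day interviews at sampled precincts
phone = 500             # absentee/early-voter telephone interviews
versions = 4            # phone respondents answered all four questionnaire versions

actual_respondents = election_day + phone          # 12,219 real people
unweighted_rows = election_day + phone * versions  # 13,719 rows in the crosstab file
print(actual_respondents, unweighted_rows)

# Each phone row carries weight 1/4, so the weighted total is still 12,219.
# CNN's crosstab shows 13,660 rather than 13,719 -- presumably (my inference)
# because ~59 respondents skipped the presidential vote question.
```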
Luke, as far as the regional questions go, I think MP might be the best one to answer this (and he may be trying to do so now). I have ordered the 2000 exit poll data from the Roper Center for some analysis, and hopefully the data sets will include some explanation of how the geos were defined – although I’m not too hopeful there will be much of use there.
Alex, we don’t know much about cell phone coverage error. I’ve read articles that suggest it’s a big issue and other articles that suggest it’s not an issue. Also, we don’t know what rate of “misses” or “refusals” occurred with the phone calls, let alone cell phone calls. Perhaps MP has already, or would be willing to, comment on the cell phone question. I think this is an open debate in the polling world right now.
Also Alex, re the subsampling procedure. I see three sampling stages here: 1) selection of a sample of precincts (cluster); 2) sample of voters; and 3) sample of the sample of the voters.
The way I see it, the exit poll data circulated on election day and made available to subscriber clients (the leaked Scoop PDFs) were based on a sample of a sample of a cluster sample.
What does this do to the MoE? Well, nothing to the MoE if you consider the table that WAS circulated with the data on election day to subscribers (I’ve confirmed this much). If we look at those tables, there is a range of sample sizes, and the clients were supposed to look at the sample size and determine the corresponding MoE. This is a VERY rough guide, but according to Mitofsky (e-mail to me), it was developed based on a calculation of the design effect according to the poll design (Merkle and Edelman participated in this calculation).
Therefore, the data Edison/Mitofsky are analyzing as we speak is the full set of data. In fact, the data that Edison/Mitofsky may have used to “project” winners on election night may have been the full tabs “corrected” for non-responses. (They “corrected” for non-responses in the 2004 primaries anyhow – the question is: what if what you “know” about the non-responses because of age, sex, and race is not “true”?)
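For readers wondering, like Alex, what this means for the margin of error: a minimal sketch using the textbook simple-random-sample formula inflated by a design effect. The deff value of 1.7 below is a placeholder, not the number Merkle and Edelman actually calculated:

```python
import math

def moe(n, p=0.5, deff=1.0, z=1.96):
    """95% margin of error for a proportion p from n interviews,
    inflated by a design effect to account for cluster sampling."""
    return z * math.sqrt(deff * p * (1 - p) / n)

# deff = 1.7 is a placeholder; the actual NEP design effect is not public here.
for n in (500, 1000, 2000, 4000):
    print(f"n={n}: SRS +/-{moe(n):.1%}, clustered +/-{moe(n, deff=1.7):.1%}")
```

On that logic, going from a ~2,000-interview subsample to a ~4,000-interview full sample would shrink the MoE by a factor of about 1.4 (the square root of 2), not by half.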
——
Bottom line, the way I see it: the data and methods are too fuzzy to support any hard statistical analysis. That means, when folks like Freeman and Simon/Baiman throw around VERY precise calculations of odds and then base conclusions on those odds, they either: 1) are trying to BS everyone; or 2) really don’t know what they are doing. Look: anyone who glances at the data knows it is skewed. It doesn’t take a PhD… I mean… an Excel spreadsheet and a 2000 analysis of the 1996 exit polls to figure that out.
The NEP holds the cards here. They have the data in the form that they need to analyze it correctly. These data will eventually be made public (via Roper Center).
Sure, some will say that these data are not trustworthy and may have been “cleansed” by the NEP (why? I don’t know), but I won’t subscribe to that theory. Edison/Mitofsky owes it to their clients to analyze the data before the likes of Freeman and Simon/Baiman write severely flawed reports.
I think their distinguished careers have earned them at least this courtesy.
One point that should be made for the benefit of foreign readers and those who were asleep during their Civics class is that the U.S. Electoral College system can forgive more sins than the Pope.
There is nothing about “democracy” in it, and so there is no possibility of tainted results propagating past the Electoral College. All the math required to tabulate it can be double-checked by anyone with a second grade education.
In the last analysis, democracy in America depends not on exit polls or even the popular vote, but on the willingness of alleged democratic states to certify their electoral votes, and the willingness of Congress to seat them. Disputes are not resolved by exit polls or re-votes or recalls or even courts, but in the last analysis by Congress.
In order to have a “failed election” similar to the one alleged in the Ukraine, under the design of the U.S. Constitution, one would have to subvert 26 out of 50 democracies – and if successful, need I add, the country would already be in far greater trouble than any disputed election could cause.
Perhaps sites like this would do a public service if, among the endless talk about exit polls and voting, they pointed out that the political design of the U.S. is precisely to allow inherently uncertain processes to gain net certainty, in defiance of statistics, information theory, and electoral democracy–by incorporating just a few people and some things called “Constitutions” and “Laws” in the aggregation stage. This takes out the mathematics–even to the extent of legitimating fraud–on *purpose*.
In summary: the outcome of U.S. national elections is certain and legitimate after the legal and congressional process has run its course, almost no matter *what* happens.
The legitimacy has *nothing* to do with democratic voting or elections, by design, and by excellent design – and for precisely foreseeing the reasons that we cannot agree on here, namely that if we take people out of the process, and put in mathematics or machines, we no longer have a political process but a noise generator.
That means: if you feel the democratic process has been subverted, your task is to reform all 50 *states* or whichever ones you don’t like. If Ohio is not a democracy, that is a problem for Ohioans. We, as a nation, have a political system that can survive carpetbagging in all the former states of the Confederacy.
Remember this the next time someone wants you to overthrow the U.S. electoral system. We are not a democracy, at the national level, so we can be ruled democratically.
Hi Mark
I still haven’t gotten any response to the question I posted a couple of days ago. But it goes along with this topic too, so I’ll point it out again. Can you explain this?
http://www.cnn.com/ELECTION/2004/pages/results/states/US/P/00/epolls.0.html
The smoking gun is the final exit polls that CNN and others have up at this time.
These are the numbers (13,660) put up to dispel the “problems” with the early numbers, and the ones they have used to tell us what we think. They are OBVIOUSLY misleading/wrong (see the web page above and the figures below). The PROOF is about 2/3, maybe 3/4, of the way down the page: the question of who you voted for in 2000. The problem is obvious: 37% Gore, 43% Bush in 2000 – that is WRONG. Again, the people that did the polling know that’s not right, and the people that posted it know it’s not right. They didn’t do any fancy weighting to adjust for that, as I’ll show below. It’s that simple – more obvious than the typesetting on the font of an old document. Anyone with a statistical background would know it is not correct. The only thing I wonder about is how that data got in the stats at all. Maybe someone on the inside is trying to say something? I don’t know.
Do you understand how weighted averages work? The overall percentage is the sum, over groups, of each group’s share of the electorate times that group’s candidate percentage.
Numbers from the data (again, see the web poll data above and go to the question of who you voted for in 2000). The 51/49% Bush win is what we are being told. If you plug in the straight numbers, the data comes out as they want you to believe. This works with other numbers too, which shows no other factors are included:
.17*45+.37*10+.43*91+.03*21=51.11% Bush
.17*54+.37*90+.43*9+.03*71=48.48% Kerry
With equal amounts of Gore and Bush voters, AS WE KNOW IT WAS:
.17*54+.40*90+.40*9+.03*71=50.91% Kerry
.17*45+.40*10+.40*91+.03*21=48.68% Bush
Again, anyone who puts out a poll that doesn’t have equal amounts of Gore and Bush voters has to know the data collection is faulty, and that any information gleaned from it is also wrong. Also, the 17% of the people that didn’t vote last time went 54% Kerry to 45% Bush – and that is in a Bush-favoring sample. The switches from Bush to Kerry and from Gore to Bush are about equal; looking at that, there is really no way to say that Bush won.
But that is not my point. The point is that CNN and others are putting up data that is obviously wrong! Why is not really important to the point; how is not really either. The point is the data is wrong and it is still up. Any explanation?
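Rolf’s arithmetic is easy to check. A minimal sketch recomputing his weighted averages from the CNN crosstab figures he quotes:

```python
# (share of electorate, Bush %, Kerry %) by reported 2000 vote,
# as quoted from the CNN crosstab above.
groups = {
    "didn't vote": (0.17, 45, 54),
    "Gore 2000":   (0.37, 10, 90),
    "Bush 2000":   (0.43, 91,  9),
    "other 2000":  (0.03, 21, 71),
}

def totals(g):
    bush  = sum(share * b for share, b, k in g.values())
    kerry = sum(share * k for share, b, k in g.values())
    return bush, kerry

print("As published: Bush %.2f%%, Kerry %.2f%%" % totals(groups))  # 51.11 / 48.48

# Rolf's counterfactual: equal shares of Gore 2000 and Bush 2000 voters.
groups["Gore 2000"] = (0.40, 10, 90)
groups["Bush 2000"] = (0.40, 91,  9)
print("Equal shares: Bush %.2f%%, Kerry %.2f%%" % totals(groups))  # 48.68 / 50.91
```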
Rick,
Hi! This is the News Editor of the Nashua Advocate. I’ve been following the discussion above with great interest, and the main problem I have with what I’ve heard — despite not being one, generally speaking, for conspiracy theories — is that we already *know* Mitofsky/Edison Media will artificially add and/or subtract “phantom voters” to their published sample size.
How do we know this?
Because they did it with the Ohio state-level exit poll, and have *admitted* to doing it.
Recall: the third of the three Election Day exit polls from Ohio — valid until the early morning of November 3rd, 2004 — showed 1,963 respondents. Mitofsky admits that the data was then “adjusted” to reflect raw vote totals. How was that done? By *subtracting* 31 Kerry “voters” and *adding* 88 Bush “voters” (the new poll had 2,020 voters as its sample size, but the new percentages for each candidate showed that no “new” voters had been queried — the existing tallies had simply been manipulated or, as Mitofsky says in his infuriating Orwell-speak, “adjusted”).
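Taking the figures in that paragraph at face value, the arithmetic does check out. The candidate counts below are back-computed from the quoted 51/48 splits, so they are illustrative rather than official:

```python
old_n = 1963                  # Ohio election-day sample, per the comment above
kerry = round(old_n * 0.51)   # ~1001 Kerry respondents (back-computed, not official)
bush  = round(old_n * 0.48)   # ~942 Bush respondents (back-computed, not official)

kerry -= 31                   # the claimed subtraction of Kerry "voters"
bush  += 88                   # the claimed addition of Bush "voters"
new_n = old_n - 31 + 88

print(new_n)                                              # 2020
print(f"Kerry {kerry/new_n:.1%}, Bush {bush/new_n:.1%}")  # ~48.0% / ~51.0%
```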
Now, what does Mitofsky’s willingness to add “phantom voters” to his sample size do to public confidence in his exit-polling?
Right now, The Nashua Advocate is reporting *four separate* allegedly “final” sample sizes for the National Exit Poll of November 2nd. How in the world is the public to distinguish between the justification Edison Media gave you for this discrepancy — which indeed sounds entirely innocent — and the entirely non-innocent fact that, in the Ohio poll, the sample size was changed using “phantom voters” who didn’t really exist and had never been queried as to their preferences?
So, when the N.E.P. releases its data in the coming weeks, how do we know if it was “adjusted” or not? And given that Mitofsky — in an e-mail to MSNBC reported on by Keith Olbermann — has expressed quite clearly that he thinks he can claim his polls are “accurate” based on *adjusted totals*, why should I or anyone else feel confidence that Mitofsky understands the difference to the American people between releasing adjusted and non-adjusted poll data?
What’s to stop him from releasing adjusted data and saying this is the “right” data, and the public has no right to see any other data because it isn’t the “right” (read: tautologically correct, by filtering in “real-world” vote tallies) data?
Thanks for any insights you can provide.
— The News Editor
Rolf,
It is a well-known fact that when polled after an election, people lie about who they voted for. (Don’t ask me why… it’s absolutely bizarre and a cowardly thing to do… but apparently they do.) A few people who voted for the loser will say they voted for the winner – people just like winners. Or maybe people who didn’t vote lie and say they voted for the winner.
More here… real quick… then back to feeding the kids…
More on the subject of exit poll methodology from Merkle and Edelman (2002), “Nonresponse in Exit Polls: A Comprehensive Analysis,” in Survey Nonresponse, eds. Groves et al., pp. 243–257.
“The VNS exit polls are conducted using a two-stage sampling design. In the first stage, a stratified, systematic sample of precincts is selected in each state, proportionate to the number of votes cast in a previous election. In the second stage, interviewers systematically select voters exiting polling places on Election Day, using a sampling interval based on the expected turnout at that precinct. The interval is computed so that approximately 100 interviews are completed in each precinct…
“Interviewers are instructed to work with a polling place official to determine the best place to stand. Ideally, the interviewers are located inside the polling place where all voters must pass them. Unfortunately, sometimes interviewers must stand outside, some distance from the door…
“Interviewers take a 10-minute break from interviewing each hour to tally the responses to the vote questions and their observations of the nonrespondents. Interviewers call VNS three times during the day to report their data. In local time, the first call is around 9:00AM, the second around 3:00 PM, and the last call shortly before poll closing. During each call, interviewers report their vote and nonresponse tallies and read in the question-by-question responses from a subsample of the questionnaires…”
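The interval computation in that passage is simple enough to sketch; the 100-interview target comes from the quote, while the turnout figure below is invented for illustration:

```python
def sampling_interval(expected_turnout, target_interviews=100):
    """Approach every k-th exiting voter, with k chosen so that roughly
    target_interviews are completed over the day."""
    return max(1, round(expected_turnout / target_interviews))

# Hypothetical precinct expecting 1,400 voters: approach every 14th voter.
print(sampling_interval(1400))  # 14
```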
John – others think you are an enigma (and sometimes I do as well), but welcome back! It’s been a while.
Nashua Advocate News Editor, if you could provide the sources and links to the quotes from Mitofsky, I’d like to check it out.
BTW – I HIGHLY doubt that Edison/Mitofsky will try to rely on sampling error to explain the discrepancy. They will partially fall on the sword (training, etc.) and partially chalk it up to differential non-response.
Remember, there are about 77,000 additional surveys out there that they have access to and we do not. Who knows how this will affect the comparative analysis?
“Adjusting” the data to ensure a valid sample is standard stats and is REQUIRED. The subject of weighting is where pollsters and the public often go awry. Some weighting is absolutely necessary and routine, while other types of weighting involve “educated guesses” (but are also routine).
For example, if they weighted the non-responses according to what they “know” about the demographics (they can guess the age, sex, and race of the nonrespondent), then this is an example of an “educated guess.” They take what they “know” about these demos and they adjust their sample accordingly. If for some reason what they “know” is uniformly different from what “is true”, then the poll will have bias.
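As a concrete illustration of that “educated guess” (the cells and counts below are invented, and this is not NEP’s actual procedure): interviewers tally the apparent age, sex, and race of refusals, and the completed interviews are weighted up within each observed cell:

```python
# Invented tallies for one precinct: completed interviews and observed
# refusals, by an interviewer-guessed demographic cell.
completes = {"younger men": 20, "younger women": 25, "older men": 30, "older women": 25}
refusals  = {"younger men": 15, "younger women":  5, "older men":  5, "older women":  5}

# Weight each completed interview so its cell represents completes + refusals.
weights = {cell: (completes[cell] + refusals[cell]) / completes[cell] for cell in completes}
print(weights)  # younger men get weight 1.75; older men only ~1.17

# Rick's caveat: if the guessed demographics of refusals are uniformly wrong,
# these weights push the sample in the wrong direction -- i.e., bias.
```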
The Nashua Advocate News Editor and Rolf, I have a strange feeling that MP will have more on these subjects… I’ve shared about all I know.
Brian Dudley, when I hear things like “It is a well known fact” without some type of source, I start to twitch. Especially, if I don’t know that fact. 🙂
Nashua editor, Rick:
Do I understand things correctly: Nashua seems to describe Mitofsky adjusting the SUBSAMPLE from 1,963 to 2,020?
If it is the subsample that is being adjusted, that brings up two points:
1. So, is the adjustment Nashua points out what we call weighting to match the election results? So the reweighting involves adjustments to the subsample? That doesn’t sound like reweighting as I understand it. Or is this an additional step, separate from the reweighting to match the election results? Boy, this is starting to get conceptually complicated on top of the statistics. Help.
2. Why wouldn’t Mitofsky incorporate his full data sample and discard the subsample as soon as possible? It looks like his subscribers are still unnecessarily limited to his subsample-based data.
Please note that if my questions indicate my misunderstanding of the issues, I apologize.
Alex, that appears to be the claim. The full sample would be more like 4,000.
re: #1. I don’t know anything more than what was described to me by the NEP. Without more information (like sources of the claims) from Nashua, who knows?
#2. I suspect that by the time the NEP had the full dataset (whenever the interviewers faxed or mailed them in), the election was over and they automatically switched into “understanding the outcome” mode (i.e., predictions are over, now let’s weight to the election result and see what it tells us).
Who knows…Maybe historically the differences between the full sample and the subsample aren’t statistically significant? Therefore, why bother?
Alex – my question re the cell phone coverage was supposed to be partly tongue-in-cheek – I thought the excuse might have been that ‘we couldn’t call the results through’.
Here are the interview numbers (times are Eastern):

Region   4pm     7:33pm   Nov 3
East     1,746   2,077    2,888
West     1,747   2,203    2,640
MW       2,069   2,804    3,676
South    2,787   3,943    4,456
Somehow, the East gained 39% more interviews *after* 7.30pm, and the West only gained another 20%, despite the time difference.
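For anyone checking Luke’s percentages, a minimal sketch over the table above:

```python
counts = {  # interviews reported at 7:33 pm ET and on Nov 3
    "East":  (2077, 2888),
    "West":  (2203, 2640),
    "MW":    (2804, 3676),
    "South": (3943, 4456),
}
for region, (at_733pm, nov3) in counts.items():
    print(f"{region}: +{(nov3 - at_733pm) / at_733pm:.0%}")
# East +39%, West +20%, MW +31%, South +13%
```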
Luke, hang in there… I think I hear MP typing…
Rick,
You can find some of the comments here:
http://64.233.161.104/search?q=cache:IvRJMQdJxYwJ:www.slate.com/id/2111831/+Mitofsky+accurate+%22exit+poll%22&hl=en
and some here:
http://www.msnbc.msn.com/id/6533008/#041124a
There are others and I’m still looking for them. One is definitely to be found in a Democratic Underground thread, which I’m trying to track down.
As I understand it, what was done in Ohio was *not* “weighting” — because the sub-group percentages (e.g. men/women) did not change between the 1,963-voter sample and the 2,020-voter sample: just the number of “voters” casting a preference for each candidate. As I understand it, this is not weighting. There’s another word for it — filtering in raw-vote information — in polling terminology, and perhaps you know it; I did once but I’ve forgotten it.
More as I find it.
— The News Editor
Rick,
Back with more.
The following is from an interview Mitofsky did with Mayflower Hill (this is the article author speaking):
“I’ll add, though it’s somewhat public knowledge at this point, that Warren agrees with the conventional wisdom explaining how certain bloggers reached the wrong conclusions. The data that was reported on election day had not been ‘weighted’ for turnout yet. Once an accurate projection of overall voter turnout is made, the raw data that the exit pollsters collect is plugged into a complicated methodological system that I won’t begin to pretend to understand. The point is, though, that a sort of ‘correction’ is made to the raw numbers that everyone saw on Wonkette and other sites. The bloggers who ran those numbers either didn’t know about the system of ‘weighting’ the exit polling data, or didn’t bother to point it out.”
Find the whole interview here:
http://mayflowerhill.blogspot.com/2004/11/mayflower-hill-exclusive-warren.html
The info the “bloggers” had on Ohio on Election Day was the 1,963-voter sample. That was removed from CNN’s website sometime between 12:30 A.M. and 2:15 A.M. on 11/3/04. It was replaced with a 2,020-voter sample which is now the “final” Ohio exit poll. The first sample shows Kerry winning 51/48, the second shows Bush winning 51/48.

Indeed, the second sample matches the final Ohio vote total to within 0.1%, yet is not (using pure statistics) statistically possible — i.e., its result cannot be reached, or even approximated, by adding 57 voters (even were they all Bush voters, a statistical impossibility) to the 1,963-voter sample. In fact, the number is reached by removing 31 Kerry voters and adding 88 Bush voters. However, the screenshots of the two samples do not seem to indicate any substantial alteration in any sub-group breakdowns.

Coupling this with Mitofsky’s acknowledged use of actual raw-vote data on Election Day (which I recognize is itself standard practice), the conclusion is clear: if the “raw-vote” data being used/filtered in was not related to the sub-group breakdowns, it could only have been — then — related to the voters’ preference for President. Meaning, voters were added and subtracted from the 1,963-voter sample purely with a design to make the presidential-vote query match (in the 2,020-voter sample) the “raw-vote” totals then seen in Ohio.
Which they did.
Accurate to 0.1%.
Without any evidence that “turnout demographics” changed substantially between the two samples (indeed, my research suggests that if there was any shift, it was in Kerry’s favor — as the final sample had more [I believe it was] non-white women than the 1,963-voter sample).
— The News Editor
1) My new post on the main page takes up the discussion of Luke’s question and Rick’s email.
2) I’ll try to get to the issue of the regional interview counts tomorrow
3) Nashua Advocate: What you are describing is the weighting procedure explained here many times that “corrects” the exit poll to match the count. It does not add or subtract respondents, although it may appear that way. You may want to review this post, which describes the procedure and the timing:
http://www.mysterypollster.com/main/2004/11/the_difference_.html
You may also want to check my FAQ on exit polls:
http://www.mysterypollster.com/main/2004/11/faq_questions_a.html
If that doesn’t help, please email me.
News Editor:
“and the entirely non-innocent fact that, in the Ohio poll, the sample size was changed using “phantom voters” who didn’t really exist and had never been queried as to their preferences?”
I don’t follow why you think the increase in sample size implies ‘phantom voters.’ Why couldn’t they just create a larger sample from their full set of data? (Which is not to say that would be a legitimate procedure – who knows.)
But this brings me back to my other point. If Mitofsky has the time to adjust sample sizes, and appears willing to do so in the case of Ohio (for whatever reason, and assuming News Editor is accurate), then why not adjust the sample size to the FULL sample size?
Note to Rick: it seems to me the full sample would be most helpful during the “understanding the outcome” mode, considering all the small demographic groups they have to worry about.
So, one more academic question to throw on the pile. Actually, now that I think about it, my questions only stem from the fact that maybe Mark sort of short-changed us in explaining subsample adjustments in his post detailing the various types of exit poll data. Apologies to Mark if that’s not the case at all. Once he explains it (again?) in his authoritative style it will probably make sense.
Luke:
“Given that it was these last interviews which swung the result so dramatically, there appear to be some serious unanswered questions.”
Are you saying that the last interviews were what adjusted the exit polls to match the election results? Thanks.
To clarify my question: exit poll data goes through a variety of stages, including explicit weighting to match election results. So the fact that the exit polls eventually matched the election results doesn’t directly imply that the increase in sample size is to blame, since the standard weighting procedure could also be to “blame.”
News editor:
I just reread this thread, and withdraw my comment about your phantom voters. I now understand what you’re trying to say.
Besides my “response” was just plain wrong.
Sorry for any confusion. I won’t post again until I ‘understand’ this thread.
Alex – I’m not an expert in any of these matters – but if Kerry *appeared* to be way ahead at 7.30pm, and the final results showed that not to be true, then at least simplistically the presumption is that the last bunch of interviews were ‘Bush-heavy’.
I do have a lot of educational and professional experience with numbers, though – and I agree that most people don’t have a clue how to deal with numbers – which is why I haven’t tried to delve into the detail of other people’s numbers, other than at the highest level, where someone who says ‘we did x interviews’ is apparently contradicted by their own statements. Mark may have solved some of these curiosities for us in his latest post. I’m also well aware that many who deal with numbers ‘professionally’ don’t really have a clue about how to use numbers either.
There are others who have suggested that the exit polls have been massaged to converge with the ‘actual’ result – but I can’t comment on that – I haven’t been following it closely enough.
It is curious that the NEP hasn’t released their data already – I’ve seen quicker responses from the public service.
Nashua,
One thing I didn’t notice on my first read of this last night…
“However, the screenshots of the two samples do not seem to indicate any substantial alteration in any sub-group breakdowns.”
Do you have pre 12:30 screenshots for Ohio that show subgroup break downs? If so, I would love to see them.
MB
Oops – I just reread my post from late last night: “Mark may have solved some of these curiosities for us in his latest post. I’m also well aware that many who deal with numbers ‘professionally’ don’t really have a clue about how to use numbers either.” Those two sentences were supposed to be independent and unrelated, but they don’t quite read like that. The second sentence was meant as a general comment and was not intended to impugn MP or anyone else here. Full and unreserved apologies if this was misconstrued.
“Do you have pre 12:30 screenshots for Ohio that show subgroup break downs?”
I’ve asked Freeman to make these available (assuming he has them from Simon), but maybe he didn’t read my e-mail. Perhaps we should hound Simon?
Luke,
Do you now understand the reweighting procedure that Mark explains here:
http://www.mysterypollster.com/main/2004/11/the_difference_.html
If I understand your comments, you believe that changes to the exit poll results available at different times would be solely due to additional surveys. The above post explains that reweighting – of different types, but most importantly to match the election results once they start coming in – is another cause of changes to exit polls over time.
Before this raises any red flags, please read the post to see that this is standard, and that “pure” exit polls are also available, sort of.
Hope that helps.
Cheers Alex – yeah, I appreciate that there is some reweighting that occurs, although I haven’t tried to understand the mechanics – so I shouldn’t really speculate. A little bit of knowledge is a dangerous thing…
It does appear that the number of interviews is an absolute, though, right? And the weightings are internal to this?
Mark says “Presumably, NEP started to deliver weighted data for eastern states at about 4:00 EST, but weighted data for states in western states may not have been available until 6 or 7 pm EST.” and “NEP uses these final reports to prepare exit poll tabulations for each state a few minutes before the polls close. The data are weighted by actual turnout.” So it seems that we can begin to assume that the 7.33pm reports are largely already weighted – definitely in the EST states, and at least partially weighted in the other time zones?
And I’m still trying to understand how (what appears to be) the absolute size of the East poll grew by 39% *after* the 7.33pm report – given that the final numbers were supposed to be called in ‘an hour before the polls closed’ – or have I misinterpreted this somehow?
cheers
Luke wrote:
“One hypothesis is that there was much interview-stuffing on Nov3 to produce the requisite swing to Mr Bush. To the extent that the NEP’s ‘final’ statement is valid, then it would seem to render irrelevant all of the other discussions about chatty democrats and gender bias and the rest.”
AND
“Alex – I’m not an expert in any of these matters – but if Kerry *appeared* to be way ahead at 7.30pm, and the final results showed that not to be true, then at least simplistically the presumption is that the last bunch of interviews were ‘Bush-heavy’.”
The reweighting speaks to your comments above. I haven’t followed, nay understood, this thread sufficiently to know if you still hold this position, but my point concerns the difference between the “final” results and the 7:30 PM results.
What you consider the “final” results are exit polls that have been reweighted to match the election results. The 7:30 PM results are not yet weighted to match the election results. They have experienced a second type of weighting, to account for turnout and non-response profile, but not election results. So, independent of sample size changes, the change from the 7:30 PM results to the “final” is mostly, if not entirely, due to reweighting to match the election results.
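A stylized sketch of what “reweighting to match the count” means. This is not NEP’s actual algorithm, just the simplest version of the idea: scale each respondent’s weight by the ratio of the official share to the polled share for the candidate they reported:

```python
# Stylized numbers: an exit poll showing Kerry 51 / Bush 48 is reweighted
# to an official count of Bush 51 / Kerry 48. Not NEP's actual procedure.
poll     = {"Kerry": 0.51, "Bush": 0.48, "Other": 0.01}
official = {"Kerry": 0.48, "Bush": 0.51, "Other": 0.01}

ratios = {cand: official[cand] / poll[cand] for cand in poll}
print(ratios)  # Kerry respondents ~0.94, Bush respondents ~1.06, Other 1.0

# The respondent count never changes; only the weights do -- which is why a
# reweighted crosstab can *look* as though voters were added or removed.
```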
This is not to say the changes in sample size aren’t interesting, but I haven’t read Mark Blumenthal argue these changes could result in a significant shift of the exit poll results. It seems to me the sample size debate is practically independent of the exit poll discrepancy debate. Unless I have missed something – it is hard to keep up with such a technical and somewhat muddied issue.
Alex – thanks for your thoughts – and you are half right!
Firstly, let me clarify something – your latest post quoted me twice, and in both quotes I used the word ‘final’, although I was referring to different ‘final’ reports. In the first quote (“To the extent that the NEP’s ‘final’ statement is valid”) I was referring to the Nov 2 document MethodsStatementNationalFinal.pdf – which is where they said that they had conducted 12,219 interviews, while the next day their numbers appeared to say that they had conducted 13,660 interviews. We’ve since learnt that the 13,660 number was actually incorrect – but at the time my statement was (seemingly) reasonable, as it appeared as though they had ‘created’ roughly 1,500 interviews the following day.
As to my second quote: yep, it demonstrated my ignorance, and I wouldn’t write it again today 🙂
Two things:
a) My sense is that the 7.30 results actually had not been *as fully* reweighted for turnout and non-response as you seem to think (because a full 23% of interviews were still missing) – although I’m just assuming.
b) It’d be lovely (drool) to get the Nov 3 report of the full 12,219 interviews before it was ‘tainted’ with the actuals.
Finally, your point about the independence of the two debates is spot on – and most of the discussion now is on the sample size issue.
cheers