Just a reminder, in case you missed my holiday post, that I am taking a break this week. Blogging will resume in the New Year.
In the meantime, here’s an off-topic suggestion: Daniel Drezner has links to agencies that are directing relief toward those affected by the earthquake and tidal waves and other related information.
Mark Blumenthal is a political pollster with deep and varied experience across survey research, campaigns, and media. The original "Mystery Pollster" and co-creator of Pollster.com, he explains complex concepts, and how data informs politics and decision-making, to a wide range of audiences. He is a researcher and consultant who crafts effective questions and identifies innovative solutions to deliver results, and an award-winning political journalist who draws insights and compelling narratives from chaotic data.
23 thoughts on “Happy New Year”
Mark: I’ve had the chance to review your citations and additional exit polling literature, and I come to a very different assessment. I’m going to make several posts to address your points individually.
OK, I see you’re on vacation now. I’ll send it to you so you can post it when you’re back. — Steve
“In light of the history of exit polling and the particular care that was taken to achieve an unprecedented degree of accuracy in the exit polls for Election 2004, there is little to suggest significant flaws in the design or administration of the official exit polls. Until supportive evidence can be presented for any hypothesis to the contrary, it must be concluded that exit polls, including the national mega-sample within its +/-1.1 percent margin of error, presents us with an accurate measure of the intent of the voters in the presidential election of 2004.
According to this measure, an honest and fair voting process would have been more likely than not — at least 95 percent likely, in fact — to have determined John Kerry to be the national popular vote winner of Election 2004…”
from “The 2004 Presidential Election: Who Won the Popular Vote? An Examination of the Comparative Validity of Exit Poll and Vote Count Data” by J. Simon of Verified Vote 2004 and R.P. Baiman of the University of Illinois at Chicago http://freepress.org/images/departments/PopularVotePaper181_1.pdf
aaa, the Simon/Baiman paper is a great example of a strawman. I’ve quoted from the paper, then commented.
“Exit polling, since its invention several decades ago, has performed reliably in the projection of thousands of races, both here at home and, more recently, abroad.1”
Funny, but if you look up the citation given for the Simon/Baiman statement quoted above, it tells you something quite different. Exit polls were “invented” in 1964 by a woman named Ruth Clark when polling the presidential primary (p. 88). Mitofsky describes how exit polls were used extensively from 1970 to 1980 by CBS for background, but not for projections at all until 1982: “It was not until 1982 that CBS News used exit polls for projections” (p. 88). Also, NBC didn’t even do exit polls until 1980 and didn’t use them for projections until 1982. Had Simon/Baiman read that citation, they would not have made that statement (unless, of course, they intended to be misleading and/or deceptive).
From Simon/Baiman paper: “The record of exit polling from the 1970s through the 1990s was essentially free of controversy, except for the complaint that publication of exit poll results prior to poll closings dampened voter turnout by discouraging late-in-day voters from bothering to vote, the race having already been “called.”2 Voters could be so influenced because they had come, indeed, to regard exit poll projections as all but infallible.”
True, the controversy was largely about discouraging late-in-day voters, but consider also that the exit polls released before poll closing had a record of being badly off (the 1984, 1988, and 1992 elections).
This from Mitofsky and Edelman (A Review of the 1992 VRS Exit Polls, from Presidential Polls and the News Media, by Lavrakas, Traugott, and Miller, [1995]): “As the evening went on, the national exit poll gradually was completed. The results shown in Table 6.3 for midnight were the only results that included the completed exit polls. All estimates at earlier times were incomplete…The difference between that final margin and the VRS estimates was 1.6 percentage points. VRS consistently overstated Clinton’s lead all evening…Overstating the Democratic candidate was a problem that existed in the last two presidential elections. ” (pp. 91-92).
To recap. Exit polls were first used to project elections in 1982 and Mitofsky/Edelman report skewed exit polling results for the 1984, 1988, and 1992 elections.
More from the Simon/Baiman paper: “Significant exit polling problems began to appear along with the development and spread of computerized vote counting equipment, since which time exit polls have had a notably poorer track record in spite of improvements in polling methodology.”
I defer to my “recap” above. Where is the evidence of this charge? Not provided. It’s just stated. Do we just believe the authors because they say so, or do we demand evidence to support their statements?
This is almost like what Freeman pulled. Freeman used the German exit polls to show how “accurate” they are, but suppressed evidence in the process (this may have been done in ignorance, but the honest thing to do would be to revise the paper or retract it when confronted with additional evidence that contradicts the point). See MP’s post “What About Those German Exit Polls?”
Strawman and suppression of evidence are fallacies of logic used when an author’s case is tenuous.
Re: their national Bush/Kerry exit poll extrapolation. I’m still amazed that the authors were able to divine a number significant to a tenth of a percent from an extrapolation based on whole-number proportions, especially when the proportions circulated to the networks on election night were rounded to whole numbers and nothing more precise.
Richard Morin of the Washington Post wrote in an e-mail to me that the final circulated national result was 51 Kerry, 48 Bush. Nothing more precise. But the Simon/Baiman extrapolations are more conservative (50.8 Kerry, 48.2 Bush).
Also, why do the authors, like Freeman, insist on applying the 30% correction from the Merkle and Edelman citation? I’ve corresponded with Merkle and he wrote:
“What was in the Merkle and Edelman chapter is only a general estimate based on work at VNS in the early 1990s. The design effect will vary state by state based on the number of interviews per precinct and how clustered the variable is. More work was done on this in 2004 by Edison/Mitofsky. Edelman and I did participate in this. I would suggest using the design effects calculated by Edison/Mitofsky for their 2004 polls.”
So – you have the author of the study cited by both Simon/Baiman and Freeman saying that this is not appropriate for use with the 2004 exits.
So what is appropriate? Mitofsky wrote that the design effects varied from 1.5 to 1.8, depending on the average number of surveys per precinct.
Mitofsky wrote to me saying: “Both Merkle and Edelman participated in this latest calculation. Freeman would be making (another) mistake using 1.3.”
Dan Merkle told me that the average number of interviews per precinct is calculated using only the intercept interviews, and that the telephone interviews of early and absentee voters have a different, smaller design effect.
This makes the standard error calculation a bit more complicated. However, 11,719 interviews across 250 polling places works out to roughly 47 interviews per precinct, which corresponds to a design effect square root of about 1.6 (a 60% inflation of the margin of error). This is quite a bit higher than the 30% factor used in the Freeman and Simon/Baiman papers.
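For anyone who wants to check the arithmetic, here is a minimal sketch (in Python) of the margin-of-error calculation with a clustering correction. The 11,719 interviews and 250 precincts are the figures quoted above; the 51% share is the rounded national Kerry number Morin reported, used purely for illustration; the 1.3 and 1.6 factors are the competing design effect square roots under discussion.

import math

n_interviews = 11719   # completed national interviews (figure quoted above)
n_precincts = 250      # sampled polling places (figure quoted above)
p = 0.51               # rounded national exit poll share, illustration only

print("interviews per precinct:", round(n_interviews / n_precincts))  # ~47

# margin of error at 95% confidence under simple random sampling
srs_moe = 1.96 * math.sqrt(p * (1 - p) / n_interviews)

# inflate for clustering by the square root of the design effect
for deff_sqrt in (1.3, 1.6):
    print(f"design effect sqrt {deff_sqrt}: MoE = {deff_sqrt * srs_moe:.2%}")

The point is simply that the width of the confidence interval moves noticeably with the assumed design effect, which is why the choice of 1.3 versus 1.5-1.8 matters to any significance claim built on top of it.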
From the Simon/Baiman paper: “To carry our analysis further, we can employ a normal distribution curve (see Figure 1) to determine—again assuming proper poll methodology, no discriminatory voter suppression 24, and an accurate and honest popular vote count—“
What about non-sampling error? Why suppress the fact that there has been consistent Democratic bias in previous exit polls (all previous presidential exit polls that I’ve read about)? Sure, this year the bias was worse than in other years, but the quote above excludes the possibility of differential non-response, which evidence from previous exit polls has shown can explain some portion of the observed Democratic bias.
Continued from the Simon/Baiman paper: “…that the probability that Kerry would have received his reported popular vote total of 48.1%, or less is one in 959,000—a virtual statistical impossibility.”
I don’t know about the 1:959,000 figure (no time to crunch numbers), but anyone who reads the NEP methods statement knows that the predicted MoE was +/-1%, so this isn’t news. The national exits were outside the margin and were significant. In fact they were well outside. This doesn’t take rigorous analysis.
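To see how far outside, here is a rough back-of-the-envelope check using only numbers already quoted in this thread: the 51% Kerry share Morin reported as the final circulated national exit poll figure, the 48.1% reported vote share cited by Simon/Baiman, and the +/-1% MoE from the NEP methods statement. The exit poll figure is rounded, so treat the result as illustrative rather than precise.

exit_poll_kerry = 0.51   # final circulated national exit poll share (rounded)
reported_kerry = 0.481   # reported popular vote share cited by Simon/Baiman
stated_moe = 0.01        # +/- 1% from the NEP methods statement

discrepancy = exit_poll_kerry - reported_kerry   # roughly 2.9 points
implied_z = discrepancy / (stated_moe / 1.96)    # rough z-score if the MoE is taken at face value
print(f"discrepancy = {discrepancy:.1%}, implied z ~ {implied_z:.1f}")

A discrepancy of roughly three points against a stated margin of one point is “well outside” under any reasonable reading; the argument is over why, not whether.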
Finally – from the Simon/Baiman paper: “The clear implication of our analysis is that neither chance nor random error is responsible for the significant incongruence of exit poll and tabulated vote results, and that we must look either to significant failings in the exit poll design and/or administration or to equally significant failings in the accuracy and/or fairness of the voting process itself to explain the results.”
I suspect that even Mitofsky and Lenski wouldn’t argue that the national exits can be explained by random chance or random error. However, they are looking at differential non-response as the leading theory for much of the variance.
Dinner time, but Simon/Baiman do a poor job of pre-emptively “discrediting” the differential non-response hypothesis. This is my opinion anyway…
Rick, you certainly do seem to be working hard at being dismissive. Why is that?
Me, I’d like to see what the covariance is between percentage of GOP/DLC/DINO election officials from, say, 1980, and the loss in exit-poll accuracy.
mairead – data are data. If the data suggest that Bush stole the election or that someone in the NEP purposefully leaked faulty data in an attempt to throw the election to Kerry, then I want to know that as an American. However, the problems with the data in the public realm and the analysis of these data are numerous. It’s more of an academic contention than anything.
For example, I agree with the Simon/Baiman paper that the national exits were way outside the margin of error (I haven’t done my own analysis to determine just how far outside, but given that the spread was 3 points and the MoE given by the NEP was +/-1%, they were certainly WELL outside the MoE).
What can explain this? Personally, from what I’ve read, differential non-response is the most likely culprit, but it could be a combination of sampling error, non-sampling error (e.g., differential non-response, Bush fraud, and/or NEP fraud). Those who keep abusing/misusing the literature and/or data only weaken their case.
If their point is that something went wrong with the exit polls, well, I don’t think Edison/Mitofsky will disagree with that. But using terrible papers and/or analysis is simply not what we expect of high caliber academics. That’s my beef with it all.
Freeman still stands by the use of his 30% correction. Unbelievable… Has he bothered to contact Merkle or Edelman? Obviously not.
Here is what Mitofsky wrote about this:
“Both Merkle and Edelman participated in this latest calculation. Freeman would be making (another) mistake using 1.3.”
Here is what Merkle wrote about this:
“What was in the Merkle and Edelman chapter is only a general estimate based on work at VNS in the early 1990s. The design effect will vary state by state based on the number of interviews per precinct and how clustered the variable is. More work was done on this in 2004 by Edison/Mitofsky. Edelman and I did participate in this. I would suggest using the design effects calculated by Edison/Mitofsky for their 2004 polls.”
I gave Freeman this information weeks ago. I know he read it because he responded that he wasn’t ignoring my e-mails. I urged him to contact Mitofsky and Merkle about the use of the 30% factor in his paper. Obviously he either did not, or excluded their statements.
This is suppression of evidence, plain and simple. It makes his case weaker and therefore he ignored it.
Failing to disclose these facts in this paper is to me a bit unethical. I’ll let readers decide. I’ll also be filing a formal complaint with his university.
Thanks for responding, Rick!
As I’ve mentioned elsewhere here, my training is in social psych. And that naturally makes psychosocial factors figural for me, as it does also for Steve Freeman.
When I look at Steve’s evolving paper, I’m unable to see in it any basis for your criticism (“But using terrible papers and/or analysis is simply not what we expect of high caliber academics. That’s my beef with it all.”).
Steve’s paper seems to me to be very conservative: he points out that it *could* be one or more of the factors you hypothesise here…but there’s no evidence.
You and Mark seem to be making a baseline assumption that the exit polls are wrong, and you’re offering a number of hypotheses about why that might be so. But you guys appear to characterise your hypotheses as though they are probative, which doesn’t seem like a good idea.
Mark, for example, mentions that the exit polls have been decreasingly accurate since 1990 or thereabouts. But in fact we don’t know they’re inaccurate–all we know is that they differ from the tallies. There’s no independent validation of the tallies; they’re simply assumed to be correct, which of course is really the question at issue!
On the other hand, we do have completely solid evidence, in part from NORC’s recount of Florida2K, that there was a lot of fraud of various kinds (I use the term in a general sense) and that the wrong person is in the WH now. We also know for sure that it was basically the same crew running things then that were running things this time, too.
Has anyone looked at the NORC tallies and compared them to the exit polls for Florida2K, with corrections for egregious ‘mistakes’ such as the thousands of Jewish votes shunted off to Buchanan? How do those exit polls pan out?
And, as I asked above, what about the covariance between the alleged drop in exit-poll accuracy and the changes in who controls the polling places?
Things like that could be good data, one way or the other, don’t you think? It seems to me that we could bear having more data.
Rick writes: “This is supression of evidence, simple. It makes his case weaker and therefore he ignored it.
Failing to disclose these facts in this paper is to me a bit unethical. I’ll let readers decide. I’ll also be filing a formal complaint with his university.”
erm, I’d gang warily with that ‘filing a formal complaint’ idea, Rick. As a grad student, it could take more time than you’d like to get the egg off your face.
I’m not clear about what you’re calling a ‘fact’, Rick. That the 30% fudge factor is wrong? That’s not a ‘fact’. That’s not even a heuristic, proven to have practical value over time. It’s a mere *opinion*. To be more than an opinion, it would need to be demonstrated to be true, not merely popular.
Merkle and Edelman worked with Mitofsky and Lenski to calculate the design effect for the 2004 exit polls. They calculated the effect to vary from 1.5 to 1.8 depending on the average number of intercept interviews per precinct sampled.
Freeman insists on using the 1.3 factor from the 1996 exit polls. Freeman knows that one of the authors of the citation that calculated the 1.3 factor (a national factor, not a state-by-state factor) says that it is not appropriate for use with the 2004 exit polls.
There is also the problem with the single-tail test. The null hypothesis requires a two-tail test, so his reported p-values are half what they should be, overstating the significance of his results (by a factor of two, in fact).
If you search the comments to other posts on this site you can find more about why this is a two-tail test, not a single-tail test.
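To make the one-tail versus two-tail point concrete, here is a small sketch; the z-score of 2.0 is a placeholder for illustration, not a figure from Freeman’s paper.

import math

def p_values(z):
    # one-tailed and two-tailed p-values for a standard normal z-score
    two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return two_tailed / 2, two_tailed

one_tail, two_tail = p_values(2.0)   # hypothetical z-score
print(f"one-tailed p = {one_tail:.3f}, two-tailed p = {two_tail:.3f}")

A result that looks comfortably significant one-tailed (p of about .023 at z = 2.0) is right at the borderline two-tailed (p of about .046), so halving the p-values matters for exactly the kind of borderline states at issue here.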
Freeman probably also knows (I’ve given him the spreadsheet) that using the 1.5 to 1.8 factors according to the Mitofsky methods means that the statistical significance of the OH, PA, and FL results depends on the rounding procedure (there are error bounds associated with his data due to rounding). Meaning, the data are VERY “fuzzy,” and precise z-scores and p-values, let alone compound probability calculations, also have error bounds. The problem is that the lower error bound of these data shows that none of FL, OH, or PA is significant, whereas the upper error bound shows that all three states could be significant (though in OH the significance is associated only with the Bush proportion, not the Kerry proportion).
If Freeman said “The exit polls in OH, FL, and PA, are really far off and I think something is up” then I would say to him – “join the crowd.”
But he insists on using flawed methodology and suppresses evidence even though these things have been demonstrated to him. The “fuzziness” of the data should be OBVIOUS to him or anyone who has analyzed these data closely, and therefore reporting findings without an error bound, or at least acknowledging that an error bound exists, is in my opinion a flagrant violation of academic standards and ethics. He cannot plead ignorance here. He didn’t even bother to try to address these issues.
Again – no one disagrees that the exit polls were way off and everyone wants to know why. The problem here is that the data we have in the public realm cannot justify the type of analysis conducted by Freeman. 1.3 is wrong. Rounding to a tenth is wrong. Use of a single tail is wrong. And even if all of those things were right, it is still wrong to look at three significant findings in a vacuum. There were 52 exit polls taken on election day. Therefore the probability (1:662,000) is NOT p*p*p. Even undergraduates with a stats background know this. To not explain why he insists on using p^3 rather than taking the number of significant findings out of 52 is either sloppy or disingenuous.
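Here is one way to see the multiple-comparisons point. Purely for illustration, suppose each of the 52 exit polls independently had a 5% chance of landing outside its margin by chance alone; these are not Freeman’s numbers, they only show how different the “at least 3 out of 52” framing is from p*p*p.

from math import comb

n_polls = 52    # exit polls conducted on election day
p_sig = 0.05    # hypothetical chance that any one poll is "significant" by chance

def binom_pmf(k):
    # probability of exactly k significant polls out of n_polls
    return comb(n_polls, k) * p_sig**k * (1 - p_sig)**(n_polls - k)

p_cubed = p_sig**3                                      # ~1 in 8,000
p_at_least_3 = 1 - sum(binom_pmf(k) for k in range(3))  # ~0.48, close to a coin flip

print(f"p*p*p = {p_cubed:.6f}, P(at least 3 of 52 by chance) = {p_at_least_3:.2f}")

Whether p*p*p or the “out of 52” framing is the right one turns on whether the three states really were specified in advance (a point argued below), but the sketch shows the two framings give answers that differ by several orders of magnitude.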
These are the reasons for the complaint. I don’t expect anything to come of this, but I just want the University to know what has been done on their letterhead. I will also urge those who have advised me in preparing my response to Freeman’s paper to contact Freeman and the University and also make their grievances known.
All Freeman had to do was contact Merkle and ask if the 1.3 factor could be used with the 2004 data. Merkle would tell him what he told me – NO! But this would seriously undermine Freeman’s paper, so he suppressed evidence.
Please note that I made some of the same mistakes as Freeman (and CalTech/MIT). But I’ve gone on the record and corrected my past mistakes. http://stones-cry-out.blogspot.com/2004/11/what-went-wrong-with-exit-polling.html
As far as “NORC’s recount of Florida2K” I don’t know where these data are available and I’m not sure what you would like to know. I haven’t even seen the 2000 exit poll data, although I hear it is available via the Roper Center. Perhaps you could refine your hypothesis or research question and someone with access to the Roper Center data would take on the analysis.
“Freeman insists on using the 1.3 factor from the 1996 exit polls. Freeman knows that one of the authors of the citation that calculated the 1.3 factor (a national factor, not a state-by-state factor) says that it is not appropriate for use with the 2004 exit polls.”
You’re doing it again, Rick. So what if Mitofsky et al. ‘calculated’ it? In engineering when you make up a number, it’s called a ‘WAG’–a ‘wild-ass guess’. If you use a computer to make up the number, it’s a SWAG – a *scientific* wild-ass guess. But it’s still only a guess. It has no a-priori empirical validity. Yet you talk about it as though it were a well-understood, accepted number with a lot of predictive power. I don’t understand that.
If we look at that fudge factor in a slightly different light, part of it becomes an estimate of the amount of fraud in the system. How difficult would it be to derive such a number? I’d think it would be quite difficult, particularly if nobody is thinking about it in those terms.
Mairead, I’ve posted a critique of the Simon/Baiman paper. If you read this, you will see why things like the 1.3 vs. 1.5, vs. 1.8 vs. ???? matters. Freeman makes a conclusion that is based on his probability calc (662,000:1). This probability calc is HIGHLY sensitive to a number of factors, one of which is the design effect.
Again, his broad conclusions may be correct, but the probability calculations that are the base of these conclusions are subject to very large error.
My question isn’t ‘why is the number important’, Rick, it’s *how do we know the number is correct?*
Obviously, since Mitofsky et al. didn’t get the exit polls right, the number they used was not correct! Okay, what does that mean? It means the number was not like the number you can look up to determine the sample size you need to get a confidence interval of some size n. It was not a well-defined number. It was a guess.
But these people are experienced, so why was it so hard to determine that number?
One strong possibility–we have evidence–is that it’s not really one number, it’s two–but the second one, the fraud factor, is being ignored. So Mitofsky et al. are trying to guess what this conflated number is without taking all causal factors into account.
So Freeman (I’ll ignore the Simon & Baiman paper as a red herring at this point, if that’s okay) is using some fudge factor (30%) that he believes represents the factors that contribute to legitimate uncertainty.
And he concludes that there’s also some other as-yet-unidentified factor influencing the outcome. That looks like good science to me.
I really think you have to offer more than unsupported hypotheses about the cause of the unpredictability before you can impeach Freeman’s choice of 30%.
(ignore this…i’m just trying to see whether html works)
(I apologise for these hiccup posts…there’s some incompatibility between the blog and my usual browser (opera 6.5). I’ll try to remember to use this one (opera 7) from now on instead.)
“it is still wrong to look at three significant findings in a vacuum. There were 52 exit polls taken on election day. Therefore the probability (1:662,000) is NOT p*p*p. Even undergraduates with a stats background know this. To not explain why he insists on using p^3 rather than taking the number of significant findings out of 52 is either sloppy or disingenuous.”
I meant to comment on this, but it slipped my mind.
Are you sure in your objection here you’re not confusing what Steve did with the impermissibility of cherry-picking some subset of numbers *after the fact*?
Steve motivated his choice of those numbers: they were states *identified beforehand* as significant. So it seems to me that they form a population–exactly as though they had been the only ones examined. Which, if I correctly recall my maths lessons, makes p*p*p the right way to calculate overall probability.
No?
There are MANY problems with the Freeman analysis.
“Obviously, since Mitofsky et al. didn’t get the exit polls right, the number they used was not correct!”
The 30% fudge factor only applies to sampling error. Exit polls have been plagued by non-sampling error throughout their history. The non-sampling error cannot be predicted. If it could, they would design it out of the survey.
I don’t want to get into an argument with you when it is obvious that you have not read Mystery Pollster’s posts on exit polls. He answers many of your questions.
If you want a more detailed look at problems with the Simon/Freeman data and methods (the problems are virtually identical) you can read the post on my blog.
Take the 30% – who cares. The data in the public realm are too fuzzy for precise analysis. That is what my post proves. The very fact that the data are lacking a significant digit adds tremendous variability to the probabilities associated with any calculated discrepancy.
“Are you sure in your objection here you’re not confusing what Steve did with the impermissibility of cherry-picking some subset of numbers *after the fact*?”
That is ONE of the many problems with the paper. He “assumes” no bias in the survey or the election result, but still uses a single-tail test. This violates his assumption. Simon/Baiman commit this error as well. It’s in my post.
Now what if I can demonstrate that depending on the rounding method, the design effect, and type of test, all three states can either be significant or not significant? (I can).
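The kind of sensitivity check I have in mind looks roughly like this. The state numbers below are hypothetical stand-ins, not the actual OH/FL/PA figures (which is the point: the publicly circulated numbers are only whole percentages). A share reported as 54% could have been anything from 53.5% to 54.5%, and that alone, combined with the choice of design effect and one- versus two-tailed test, can move a borderline result back and forth across the significance line.

import math
from itertools import product

reported_share = 0.54   # exit poll share as displayed, rounded to a whole percent (hypothetical)
official_share = 0.51   # certified vote share (hypothetical)
n = 2000                # approximate state sample size (hypothetical)

significant = []
for true_share, deff_sqrt, tails in product(
        (reported_share - 0.005, reported_share + 0.005),  # rounding bounds
        (1.3, 1.6, 1.8),                                    # candidate design effect square roots
        (1, 2)):                                            # one- vs two-tailed test
    se = deff_sqrt * math.sqrt(true_share * (1 - true_share) / n)
    z = (true_share - official_share) / se
    p = math.erfc(abs(z) / math.sqrt(2)) / (2 if tails == 1 else 1)
    significant.append(p < 0.05)

print(f"significant at the .05 level in {sum(significant)} of {len(significant)} scenarios")

Running something like this with the actual state numbers is what I mean by demonstrating that the conclusion flips depending on assumptions the papers never even acknowledge.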
I will be posting a critique of the Freeman paper similar to my critique of the Simon/Baiman paper within the week. You can read this critique and judge for yourself.
What is the point really? That there is a “significant” discrepancy? So what? Who disagrees here? Certainly not Mitofsky or Lenski. What I (and others) disagree with are the terrible methods used to calculate probabilities “scientifically.” This part is a joke. The conclusion that the discrepancy is significant is not news. If they want to write a paper putting forth their hypotheses for explaining the discrepancy, no problem. But they should clean up their terrible and unsupportable statistics first. Be honest about the limits of the data and make a case for why better data should become available. (Also be honest about the literature review!)
“Now what if I can demonstrate that depending on the rounding method, the design effect, and type of test, all three states can either be significant or not significant? (I can).”
I don’t really have time right at the moment to respond in detail, but I’m sure someone else will pounce on this if I don’t, so…
When you’re using numbers and assumptions that have no empirical validity then sure, you can demonstrate fairly much whatever you like, Rick–you’re creating a world of hypothesis! But that doesn’t mean that what you demonstrate within that hypothesised world has any applicability to *this* world.
I’ll continue later, unless someone else beats me to it.
“But that doesn’t mean that what you demonstrate within that hypothesised world has any applicability to *this* world.”
My point exactly! Send that to Freeman and Simon/Baiman.
When I demonstrate that depending on the rounding method, the design effect, and type of test, all three states can either be significant or not significant, I will be doing so not to reject (or confirm) any null hypothesis. Only to demonstrate the “fuzziness” of the data and the implications of this fuzziness on reaching a single probability. (Other probs as well, but this works for this discussion).
That is, using the fuzzy data available to calculate a precise probability without acknowledging the fact that the probability has an error bound and without estimating this error bound is absurd.
Remember, the null they rejected is that the discrepancy occurred by chance. I’m not saying anything about the validity of their hypotheses attempting to explain what happened. This is a separate question. In fact, they could put forth their discussion of possible explanations for the discrepancies sans the statistical analysis. If they did this, or at least admitted that there are error bounds associated with their probability calcs, and they fairly represented the literature, then I wouldn’t have a problem. They are free to speculate/argue for/against any explanatory hypothesis that they wish. That’s not my beef.
Have you bothered to read my post? Why not take this conversation into the comment section on my post? If MP cared about any of this debate, he may have jumped in by now.
Steve Freeman updates his exit poll discrepancy paper:
http://www.appliedresearch.us/sf/epdiscrep.htm
Leaked Mitofsky data:
http://www.scoop.co.nz/stories/pdfs/Mitofsky4zonedata/