The release of several new papers on the 2004 exit poll controversy brings me back to this familiar topic. The first paper, from a team of academics with considerable survey expertise, breaks no new ground but provides a good overall summary of the controversy. The second, by frequent Mystery Pollster commenter Rick Brady, goes further, taking on those whose widely circulated Internet postings proclaim evidence of fraud in the exit polls. The academic paper is an excellent overall primer on the issue, but Brady’s work breaks new ground.
The first, a "working paper" released on March 11 by the Social Science Research Council (SSRC) of the National Research Commission on Elections and Voting, is noteworthy for the expertise of its authors. Michael Traugott of the University of Michigan, Benjamin Highton of the University of California (Davis) and Henry Brady of the University of California (Berkeley) are political scientists with scores of journal articles on voting behavior and survey methodology to their names. Traugott, the principal author, is a past president of the American Association for Public Opinion Research (AAPOR) and the co-author or editor (with Paul Lavrakas) of several books on survey methodology, including the very accessible Voters Guide to Election Polls (which MP would include on a list of recommended books, if he ever got around to putting such a list together. Full disclosure: twenty years ago, Traugott was the third reader on MP’s undergraduate honors thesis).
In short, these guys know what they are talking about.
Yet for all the academic firepower that Traugott and his colleagues bring to the exit poll debate, they break little new ground. They do present a balanced and thorough summary of the short history of the controversy and its key issues and include the most complete bibliography on the issue (including URL links) MP has seen to date. If nothing else, the Traugott paper is an excellent starting point for anyone grappling with this issue for the first time.
Traugott and his colleagues also make a very important point about the key issue that continues to frustrate those seeking an "explanation" from the exit pollsters for the discrepancy between the exit polls and the final results: When it comes to "nonrespondents" — those who refuse to participate in a survey — "proof" is inherently elusive. In reviewing the report from the National Election Pool (NEP) released earlier this year, they write:
[The report] is complicated in a way that many post-survey evaluations are by the fact that some information is essentially unknowable. This is especially true when one of the concerns is nonresponse, and there is no information from the nonrespondents to analyze. As a result, there are some sections of the report in which there is an extremely detailed level of disclosure about what the exit poll data show, but in other parts of the report there are only hypotheses about what might have been the cause for a particular observation. These hypotheses can guide future experiments in exit polling methodology or even direct changes in the methods, but they cannot explain in a strict causal sense what happened in the 2004 data collection (emphasis added, pp. 8-9).
A second paper, posted over the weekend by our friend Rick Brady of the blog Stones Cry Out, is a point-for-point rebuttal of the final version of Stephen Freeman's well-known paper, The Unexplained Exit Poll Discrepancy (MP reviewed the first version of the paper back in November). Brady, who has been studying graduate-level statistics on the way to a Master's degree in Public Planning, assails every statistical weakness in Freeman's thesis. Many of the issues Brady raises will be familiar to MP's readers, but he does an excellent job of putting it all together, and he raises some statistical issues not included in the Traugott paper.
For MP, the most interesting aspect of Brady’s review is his discussion of a subsequent paper by a team of PhDs (including Freeman) affiliated with the organization US Count Votes. Kathy Dopp, the President of US Count Votes (USCV), issued a public challenge "for any PhD level credentialled (sic) statistician who is affiliated with any university in America to find any statements in our ‘Response to Edison/Mitofsky Report’ that they believe are incorrect and publicly refute it."
Brady may be just a Master’s Degree candidate, but he steps up to the challenge, essentially picking up where the Traugott paper leaves off. He observes:
The US Count Votes authors conclude that only one of two hypotheses is worthy of exploration: 1) the exit polls were subject to a consistent bias of unknown origin; or 2) the official vote count was corrupted. The question then becomes: did the NEP Report substantiate the first hypothesis? [p. 12]
Reviewing the NEP report, Brady concludes:
Given the number of NEP Report conclusions that included qualifiers such as "likely," "may," and "could," I understand how US Count Votes is concerned with the analysis. In effect, the NEP Report never (from what I can tell) rejected the null hypothesis in a classical sense. However, the contention that "[no] data in the report supports the hypothesis that Kerry voters were more likely than Bush voters to cooperate with pollsters" is not in the least bit accurate. The NEP Report presented volumes of information that most analysts agree "suggests" support for the hypothesis that differential non-response was the cause of the observed bias in the exit polls [pp. 13-14, emphasis added].
The paper has much more. Brady has been a loyal FOMP (Friend of Mystery Pollster), so I may be accused of some bias on this score. Yet I hope other prominent observers will agree: Brady’s paper is a must read for those still genuinely weighing the arguments on the exit poll controversy.
Wow Mark, I’m Blushing
Thanks for the glowing review! For regular SCO readers who are not familiar with Mystery Pollster, it is a must-read blog for anyone interested in public opinion-related matters. Mark is currently taking a poll of his readers asking whether he…
Thank you for directing our attention to Mr. Brady's paper. However, I would have liked you to mention or comment on this paragraph from his paper: "Nevertheless, Dr. Freeman is right in concluding that explanations of the discrepancy to date are inadequate and Edison/Mitofsky should address the concerns of US Count Votes in subsequent analysis of their data". Better still, you could give us your own opinion on the US Count Votes paper.
dgr, since you mention a quote from my paper, I thought I’d respond.
The NEP Report suggests that differential non-response explains the bias in the exits. Basically, Bush voters refused to participate or were missed at a higher rate than Kerry voters.
As I describe in my paper, the US Count Votes paper raises two substantive issues with the NEP Report: 1) The data from the NEP Report does not support the hypothesis of differential non-response, but in fact suggests that Bush voters responded at a higher rate than Kerry supporters; and 2) the WPE by vote equipment data suggest another possibility for the explanation (vote fraud).
On the first US Count Votes charge, the NEP Report clearly states that WPE by precinct partisanship is not significant. My point is that if it is not significant, releasing more information about the test(s) and results would be useful. I happen to trust EMR/Mitofsky's professionalism, and when they say the relationship is not significant, they must have tested the data for significance.
On charge two, WPE by vote equipment, the US Count Votes paper only presents part of the story. When the data are aggregated differently (rural v. urban), the apparent relationship vanishes. Simpson's Paradox at work. However, US Count Votes is right in asking for an ANOVA test of these data, and if the differences are significant, either in the aggregate or by rural/urban (or other precinct characteristics), then I'd like to know where the significance occurred (a simple Tukey's HSD would suffice).
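For readers unfamiliar with Simpson's Paradox, here is a minimal sketch of the rural/urban point. All numbers are invented for illustration; the real precinct-level data are not public.

    import numpy as np

    # Invented WPE values: urban precincts run around -8, rural around -4,
    # and touchscreens are concentrated in urban precincts.
    wpe = {
        ("urban", "touchscreen"): [-8.1, -7.9, -8.0, -8.2],
        ("urban", "paper"):       [-8.0, -7.8],
        ("rural", "touchscreen"): [-4.1, -3.9],
        ("rural", "paper"):       [-4.0, -4.2, -3.8, -4.1],
    }

    for equip in ("touchscreen", "paper"):
        pooled = wpe[("urban", equip)] + wpe[("rural", equip)]
        print(equip, round(float(np.mean(pooled)), 2))
    # Aggregated, touchscreen (about -6.7) looks worse than paper (about -5.3),
    # an apparent equipment effect; within each stratum, the two equipment
    # types are essentially identical. The "effect" is just geography.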
I guess my point with my quote is that since the NEP Report did not reject a null hypothesis by any classical means of hypothesis testing, but instead provided a whole lot of circumstantial evidence, they should be more specific about which tests were applied involving which variables. A technical appendix explaining methods and tests, with reported findings (SPSS output tables, as one example), would have been nice.
For the record, from what I've seen to date, I have no reason not to trust the circumstantial case built by EMR/Mitofsky. Look at all the suggestive evidence they provide. Sure, it's not "proven" in the sense that a null hypothesis was rejected at some confidence level, but it sure is suggestive. It all makes sense to me, but I understand how others have problems with the report. The point of my statement quoted by you above is that I think EMR/Mitofsky should report more details about the non-significant finding of WPE by precinct partisanship and run the ANOVA in response to US Count Votes' current criticisms.
I hear that US Count Votes will have a second more thorough criticism of the NEP Report shortly. I will be interested in reading that report.
But this is all a distraction from the point of my paper. I think Dr. Freeman should address the questions raised about his work. Apparently, though, that will not be happening, as I received word from US Count Votes' Kathy Dopp that they won't respond to a non-PhD. They told me to have a PhD co-sign my paper and they would respond appropriately.
The implicit point of my paper is that the end does not justify the means in science. That I agree with Dr. Freeman's third conclusion, that the exit poll discrepancy is not fully explained, does not justify his treatment of the literature and data, which was the heart of his paper (recall the shot heard round the world: 662,000:1 odds, IMPOSSIBLE!!!).
His arguments should stand or fall on the logic of his presentation and the validity of the science employed. Even though I feel his Unexplained Exit Poll Discrepancy paper is highly flawed, I won’t let my opinion of this paper color my review of his forthcoming work on exit polls (He has a book and possibly two journal articles forthcoming). Most reputable journals will consider papers for publication without knowing the name or credentials of the author. I’ll try my best to be a blind referee when considering his future work, or future work from US Count Votes. I believe that all researchers deserve that much respect.
I am and should be subject to these same rules of science. If I've fudged something in my analysis, I can assure you it was not on purpose. Please point it out to me and I'll consider a revision. In fact, I am grateful to MP for providing this forum. I would be happy to respond to questions or criticism of my work in comments to this post. It could prove to be an interesting experiment in the role of "peer review" in the new media, with MP's smart readers serving as referees.
I'm glad to finally see a reasonably balanced response to Freeman's work. I'm looking forward to seeing Freeman's reply.
There is one thing that must be kept in mind, however. As long as there is no other acceptable explanation for the exit poll disparity, we must go for the explanation that has the most evidence. And that explanation is poll fraud. True, we can’t actually prove irrefutably that there was enough poll fraud to swing the election. But when you see a hand hanging out of the back of a car trunk, do you need any more evidence to assume that there is a body in the trunk? If our courts and our newspapers were doing their job, who knows how much evidence would have been found?
US Count Votes to Release Study
US Count Votes will release a “Scientific Study on Exit Polls” tomorrow that they say will highlight National Election Pool (NEP) data, which suggest that election fraud may be the best explanation for the 2004 exit poll discrepancy. The study…
Rick,
Excellent analysis of Freeman’s paper. Your analysis of the US Count Votes paper, however, was lacking.
You start with a critique of their quote: "No data in the report supports the hypothesis that Kerry voters were more likely than Bush voters to cooperate with pollsters, and the data suggests the opposite may have been true." However, in the current pdf of the US Count Votes paper, that quote does not exist. They simply say that the "data does not support their theory." Your version of the quotation is much more extreme, and thus easily disputed, as you proceeded to do.
Continuing, you say that the authors assert that if the data presented in Figure 5 (completion rate by precinct partisanship) were significantly correlated, then response rates in Bush precincts were slightly higher than those in Kerry precincts. However, you misrepresent the authors again here. The authors do not mention significant correlation at all. They restrict their assertions to the simple fact that NEP's table showed that the response rate in the most partisan Bush precincts was higher than the response rate in the most partisan Kerry precincts.
This statement does not require that the difference be significant. In fact, only NEP can test that significance, which they did not; they tested all completion rates together. Granted, a post-hoc test of the difference between 2 of the 5 categories specified is very unlikely to reveal a significant difference. However, the USCV claims do not require significance. The simple fact is that the response rate in the heaviest Bush precincts was .03 higher than the response rate in the heaviest Kerry precincts. USCV concludes that "this fact undermines the [NEP] report's central premise that Kerry supporters were more likely than Bush supporters to participate in the exit poll." They are correct. Despite the lack of significance, those two data points do _undermine_ the theory/hypothesis in question. This claim does not require them to "infer professional incompetence or outright deception from the NEP Report authors."
Obviously, all this would be cleared up if the NEP Report released more detailed tests, if not the original data. However, the key point of that section of the USCV paper is that the polling data released in the NEP paper does not support the hypothesis, and thereby undermines it. If the hypothesis were true, you would expect larger response rates in the most partisan Kerry precincts than in the most partisan Bush precincts, and you do not. In fact, the released data show the opposite pattern. There is no inappropriateness in the way USCV presents this data or in their conclusions.
Further along, you repeat the original quote, this time with an editorial bracketed "[No]" as part of the quote. Regardless of the editorial bracketing, the quote does not exist in the current pdf, and the addition of the editorial "no" changes the meaning entirely from that expressed there.
Unaddressed in your article is that USCV go on to address two more versions of the differential non-response hypothesis and show that the released data contraindicate both versions, and that they further address the hypothesis in extraordinary detail that provides very convincing arguments against it. Their mathematical model of voter differential non-response in part "C" breaks new ground in this whole issue and very specifically addresses the plausibility of the differential non-response theory. I agree with USCV that this renders the theory decisively "implausible."
I found it!
You slightly misquoted and miscited, but did not misrepresent, in the first quote. The quote is on p. 8, rather than pp. 3-4 as you cited in your footnotes, and the complete quote is the following: "No data in the E/M report supports the hypothesis that Kerry voters were more likely than Bush voters to cooperate with pollsters and, in fact, the data provided by E/M suggests that the opposite may have been true."
I agree with your assessment that the phrasing of that sentence is extreme. However, in the rest of the article, they do not phrase their conclusions as extremely, and in fact, as I quoted above, phrase it much softer. Considering that is how they summarize their own views, I stand by my assertion that you misrepresent their position by pulling that one quote. You are correct to criticize that quote as extreme, but incorrect to selectively pull that quote as a representation of their position.
Thanks Arvin. I’ll take a look at this after the kids are in bed. 🙂
Arvin, I’m looking at a hard copy of the US Count Votes 3/13/05 report and I called up the URL cited in my paper: http://www.uscountvotes.org/ucvAnalysis/US/USCountVotes_Re_Mitofsky-Edison.pdf#search='NEP%20Exit%20Poll%20Report%20US%20Count%20Votes
I have accurately quoted the 3/13/05 US Count Votes study. I believe you are looking at a copy of the 3/31/05 US Count Votes paper, which includes the quote on page 8 that you cite above. I’d appreciate it if you would update your several comments on DailyKOS claiming that I’ve misquoted the US Count Votes study accordingly. Thanks.
Regarding your other points.
First, you wrote: "Continuing, you say that the authors assert that if the data presented in Figure 5 (completion rate by precinct partisanship) were significantly correlated, then response rates in Bush precincts were slightly higher than those in Kerry precincts. However, you misrepresent the authors again here. The authors do not mention significant correlation at all. They restrict their assertions to the simple fact that NEP's table showed that the response rate in the most partisan Bush precincts was higher than the response rate in the most partisan Kerry precincts."
The NEP Report says the relationship between these two variables is not significant. As of today, I have assumed that when the NEP Report says that a relationship is not significant, it is not statistically significant at the .05 level. If a relationship is not significant, then to say that the "response rate in the most partisan Bush precincts was higher than the response rate in the most partisan Kerry precincts" is not correct.
Assuming the variables were correlated, the data (both interval) would produce a regression line. Completion rate would be the Y, with precinct partisanship the X. The resulting regression line would have a confidence interval. If a zero slope (aka “b”, or regression coefficient) is possible, given the confidence interval, then the relationship is not significant, even though it “appears” that there is a relationship.
If the line is significant, it can be used for predictive purposes. That is, one can say that for every X percent increase in precinct partisanship, the completion rate increases by Y percent. This is what the US Count Votes authors are suggesting. They are suggesting that as a precinct is more populated with registered Republicans (X), completion rates increase (Y). But if the regression line is not significant (at .05), then it cannot be said with 95% confidence that change in X correlates with change in Y.
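A minimal sketch of the test being described, in Python. The numbers are invented, since the precinct-level NEP data are not public.

    from scipy import stats

    # Invented precinct data: X = proportion of Bush registrants,
    # Y = exit poll completion rate.
    partisanship = [0.15, 0.30, 0.45, 0.55, 0.70, 0.85]
    completion   = [0.53, 0.54, 0.52, 0.55, 0.54, 0.56]

    res = stats.linregress(partisanship, completion)
    print(f"slope b = {res.slope:.4f}, p = {res.pvalue:.3f}")
    # With these numbers the estimated slope is positive, but p is well
    # above .05: the confidence interval for b includes zero, so the
    # apparent upward trend cannot be distinguished from a flat line.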
Another way to look at these data would be to recode the variable "precinct partisanship" from interval into 5 ordinal categories, as is done in the NEP Report. ANOVA is a good test of significance for data aggregated in this fashion, where deriving a mean for each group within the independent variable is possible.
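A sketch of that ANOVA on invented per-precinct completion rates, grouped into the five partisanship strata:

    from scipy import stats

    # Five partisanship strata (most-Kerry to most-Bush), each holding
    # invented per-precinct completion rates.
    strata = [
        [0.51, 0.55, 0.53],   # strongly Kerry
        [0.54, 0.52, 0.56],
        [0.53, 0.55, 0.54],
        [0.55, 0.53, 0.57],
        [0.56, 0.54, 0.58],   # strongly Bush
    ]
    f_stat, p_value = stats.f_oneway(*strata)
    print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
    # With these numbers F is small and p is well above .05: the apparent
    # ordering of the group means is not distinguishable from noise.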
You wrote: "The simple fact is that the response rate in the heaviest Bush precincts was .03 higher than the response rate in the heaviest Kerry precincts. USCV concludes that 'this fact undermines the [NEP] report's central premise that Kerry supporters were more likely than Bush supporters to participate in the exit poll.' They are correct."
No, they are not correct if the relationship is not significant. The data do not show that the "response rate in the heaviest Bush precincts was .03 higher than the response rate in the heaviest Kerry precincts," as you say. They show that the MEAN response rate APPEARS from the table to be higher in the Bush strongholds than the MEAN response rate in the Kerry strongholds. As I've said, if there is no "Honestly Significant Difference" between the categories of the data (Tukey's HSD is one test of this), then the US Count Votes chart is meaningless.
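For concreteness, Tukey's HSD on the same invented strata as above might look like this (using statsmodels):

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Flatten the invented strata into (value, group-label) form.
    rates = [0.51, 0.55, 0.53, 0.54, 0.52, 0.56, 0.53, 0.55, 0.54,
             0.55, 0.53, 0.57, 0.56, 0.54, 0.58]
    groups = [g for g in range(1, 6) for _ in range(3)]

    # pairwise_tukeyhsd reports which pairs of strata, if any, differ
    # "honestly significantly" after correcting for multiple comparisons.
    print(pairwise_tukeyhsd(np.array(rates), np.array(groups)))
    # With these invented numbers, no pair clears the threshold.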
You wrote: “This claim does not require them to “infer professional incompetence or outright deception from the NEP Report authors.””
Well, again, I'm assuming that when the NEP Report says that a relationship is not significant, they have tested for significance. They may not have. So, unless US Count Votes is questioning the NEP finding of non-significance, they are wrong to "suggest" a positive relationship. That said, isn't it curious that the US Count Votes paper completely ignored the NEP statement of non-significance? That finding must be a bit inconvenient for their point.
You wrote: “Obviously, all this would be cleared up if the NEP Report did release more detailed tests if not the original data.” Agreed about the additional tests… That was the conclusion of my paper, and I think I made that point clear in my first comment in this thread above. Although, I don’t think that providing the actual raw data is a good idea.
I wrote the following in response to a comment on my blog from a PhD who was trying to identify the exact precincts polled: If you have not read the recent paper by doctors Traugott (UMich), Highton (UC Davis), and Brady (Berkeley), I encourage you to do so. Here’s a quote from their paper: “The information on the exit poll methodology is still being consumed by independent analysts, and there are now calls for the release of raw and supplementary data from sample precincts. This would include contextual data about the vote history in those areas as well as information about the interviewers. This is unlikely to happen, and for justifiable reasons. Such information would be too politically sensitive in that disclosure of the sample sites could subject the exit poll interviewing to manipulation by political organizations and interest groups on Election Day if the same sites are always chosen (p. 13).”
I think that pushing for more rigorous testing of the data with release of the data outputs is the best course of action. Calling for the data itself will not likely win friends within E/M or the NEP (and it certainly won’t get the data).
I've given the 3/31/05 US Count Votes report a quick overview. I am not in a position to comment on that study yet. I stand by my paper's criticism of (and agreement with) the 3/13/05 US Count Votes study.
Rick,
Ahh, … we are working off different copies of the article. I am using the finished 27 page article. They did indeed change the relevant text, omitting the line “data … is not analyzed or mentioned in the text” which you criticize them for. And you do not misquote or miscite the authors. The quotation does exist on Pp. 3-4 of the preliminary article.
I concur with your criticism that the quote is too extreme. They should not say “no data … supports the hypothesis,” but rather “data in the report do not support the hypothesis.” Likewise, I agree that E&M does mention the data in the line that you cite, and apparently USCV agrees since they omitted the line in their final version.
However, you do misrepresent their position in two ways, and by doing so, construct two straw men which you proceed to tear down. Firstly, you attribute to the authors an extreme statement about significance of the data which the authors do not make, and thereby criticize them by it. Secondly, and more importantly, you misunderstand their argument, and criticize them for a stance they do not take. They assert that the data and graph in question do not support E&M’s hypothesis, and you criticize them for failing to show that the data and graph in question support the opposite hypothesis.
You criticize that 'If a relationship is not significant, then to say that the "response rate in the most partisan Bush precincts was higher than the response rate in the most partisan Kerry precincts" is not correct.' I beg to differ. It would be incorrect to say that the response rate was _significantly_ higher. Yet the authors do not say that. The response rate is _clearly_ higher; whether the difference is significant is a different matter. Obviously, the issue is exacerbated by the fact that none of us has access to the raw numbers and can test significance. [More on this in the next post] Your original article clearly states that the authors make a statement about the significance of the data suggesting a conclusion (p. 13). The USCV authors never make any such statement. The only mention of significance in the entire preliminary article comes further along, lamenting the non-release of WPE data. In fact, they go out of their way to avoid a strong statement along these lines precisely because they do not have access to the raw data and thus have no ability to draw significant conclusions.
The misquote above illustrates this: they conclude that "the data _suggests_ that the opposite may have been true." Note the wording here. The authors are saying that the graph in question does not support the E&M hypothesis of a -6.5% WPE toward Kerry. It clearly does not. If it did, we would see a pattern in the opposite direction, with a negative overall slope. The fact that the slope of the graph is positive would tend to indicate that Bush supporters responded at higher rates than Kerry supporters. However, we can't evaluate that hypothesis either without further data.
You seem to be trying to say that E&M's claim of no significant difference between completion rates _supports_ their "differential non-response" hypothesis. It does not. In fact, E&M's claim _undermines_ their own hypothesis. Arguing for a slope of 0 in a regression line, as you do, is arguing against E&M's hypothesis that Kerry supporters respond in greater percentages than Bush supporters. If their hypothesis were correct, we would see a negative slope, with increasing response rates as the percentage of Kerry supporters increased. We clearly do not see that. You criticize USCV for the "inappropriateness" of ignoring E&M's claim of non-significance. However, non-significance argues against E&M, not USCV.
Without the data, none of us can reject or accept E&M's hypothesis at any level of confidence. Yet, even without the data, we can _look_ at it and use qualitative words such as "suggests the opposite" and "undermines," which is exactly what USCV does.
On Mean Rates and Samples, Significance Tests, and Data Release
I wanted to append a few comments to your reply that were tangential to the core argument above.
Specifically, you make a good point about Mean vs. actual. In fact, my statement would have been better phrased as “sampled mean response rate in the heaviest Bush precincts was .03 higher than the sampled mean response rate in the heaviest Kerry precincts.” In statistics, we often get lazy and confuse sample means with population means. The former is measured, or in this case, polled. The latter is what we use confidence intervals, regressions, etc. to get at. Although the simplest estimator for the population mean would be the sample mean, in this case, considering the variability within the 5 ordinal intervals that E&M released, some minor regression to the grand mean would be valid in estimating the population mean.
On significance tests: as you know, there are many different types of significance tests, each dependent on what assumptions you make about the data and what hypotheses you wish to test. Even with the same hypothesis, data not significant in one test (e.g., a t-test) can be shown to be significant in another test with greater power (e.g., a paired t-test). Nevertheless, I wanted to draw out some details concerning the significance tests in question.
Post-hoc significance tests are fraught with difficulties. The significance threshold cannot be assumed to be .05; p-values most likely have to be much lower for valid significance. Pre-hoc tests do not have this concern, as the researcher has not yet seen the data. Certain post-hoc tests, such as a simple ANOVA, can be treated as pre-hoc inasmuch as they test a standard and valid pre-hoc hypothesis.
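One crude version of that correction: picking two strata after seeing the data is really one of all possible pairwise comparisons among the five strata, so a Bonferroni-style adjustment shrinks the working threshold accordingly.

    from math import comb

    alpha = 0.05
    k = 5                       # five partisanship strata
    n_pairs = comb(k, 2)        # 10 possible pairwise comparisons
    print(f"corrected threshold: {alpha / n_pairs:.3f}")   # 0.005
    # A post-hoc difference cherry-picked between two strata would have
    # to clear p < .005, not p < .05, to count as significant.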
Let's assume we'd like to show that Kerry supporters responded (rK%) in greater percentages than Bush supporters (rB%), regardless of by how much. We can take the null hypothesis rK% <= rB% and the alternative rK% > rB%, and test on E&M's table of completion percentage by precinct partisanship. Without doing the significance tests, I can pretty much guarantee you that the null hypothesis cannot be rejected. Using this data, E&M are unable to "support" their hypothesis, or reject the null in favor of the alternative.
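One way such a one-sided test might be run, on invented per-precinct completion rates (rK = Kerry-leaning precincts, rB = Bush-leaning):

    from scipy import stats

    # Invented per-precinct completion rates.
    r_kerry = [0.53, 0.51, 0.55, 0.52, 0.54]
    r_bush  = [0.56, 0.54, 0.57, 0.55, 0.53]

    # H0: rK% <= rB%  vs.  H1: rK% > rB%  (one-sided test).
    t_stat, p_value = stats.ttest_ind(r_kerry, r_bush, alternative="greater")
    print(f"t = {t_stat:.2f}, one-sided p = {p_value:.3f}")
    # With numbers like these the sample means point the other way, so
    # the one-sided p is large and the null survives easily.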
On the other hand, let's assume we'd like to show that E&M's hypothesis is incorrect. In this case, we'd set up the null hypothesis as rK% >= rB% + 6.5%. The alternative would be rK% < rB% + 6.5%. This test almost certainly was not performed by E&M, and without the actual data, we can't perform it either. However, there are a number of ways to perform this test. We _could_ do a simple regression, but that would require interval data, which only E&M have access to. We could also do a much less accurate regression on the ordinal data, using .1, .3, .5, .7, .9 as the precinct partisanship for each ordinal group (see the sketch at the end of this comment). Unfortunately, these guesses are likely to be incorrect, as the real precinct partisanship values for each group are likely skewed towards .50. Alternatively, we could cherry-pick 2 of the 5 intervals and do a significance test on those two, the first and last intervals, .53 and .56. Obviously, when doing so, you _must_ use corrected post-hoc significance levels, which in this case would be very low. My guess is that one of these tests may actually reject the null hypothesis, but it's far from a certain question without the real data. I would like to see any of these significance tests performed, but we are unlikely to see them done.

In the absence of significance tests that would outright reject E&M's "differential non-response" theory, the authors of USCV used mathematical modeling and the breakdown of WPE by precinct partisanship to show how implausibly the "differential non-response" voters would have to act in order for the theory to be true. MP, or someone else more familiar with historical polling behavior trends, can better evaluate their conclusions of implausibility. Without their opinion, I'm inclined to take the USCV authors at face value that E&M's assumptions require voting patterns "totally at odds with empirical experience."

Concerning the release of raw data: I agree that too much raw data release is a cause for concern, possibly because of voter identification, but more probably because of concerns about future vote manipulation. However, would it be possible to release the raw data omitting information that can be used to positively identify an individual, a precinct, or a sample site?

On an unrelated note, your unexplained.pdf has problems with Apple Preview in Mac OS X. I had to load Adobe Reader in order to see the fonts displayed correctly. No other pdfs I've encountered have ever had this problem.
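Here is the ordinal-midpoint regression sketched out. The .53 and .56 endpoints are the ones discussed above; the three middle completion rates are invented placeholders.

    from scipy import stats

    # Midpoint guesses for the five ordinal partisanship groups,
    # most-Kerry to most-Bush, as described above.
    midpoints  = [0.1, 0.3, 0.5, 0.7, 0.9]
    completion = [0.53, 0.55, 0.54, 0.55, 0.56]   # middle three invented

    res = stats.linregress(midpoints, completion)
    print(f"slope = {res.slope:.3f}, p = {res.pvalue:.3f}")
    # Under E&M's rBr hypothesis the slope should be clearly negative;
    # a flat or positive estimate is what USCV calls the "wrong direction."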
Arvin, I don't think you read my comment, or perhaps I did not write in a way that you understand. I've assumed that you had some statistics background; maybe you do not. That's okay though; I just need to explain it better.
You wrote: “Firstly, you attribute to the authors an extreme statement about significance of the data which the authors do not make, and thereby criticize them by it.”
No. They do make the statement. It's in the 3/13/05 study, which is DIFFERENT from the 3/31/05 study; it is not a different version of the same study. My paper was written before the 3/31/05 version and was critical of the 3/13/05 study. To date, I have said nothing about the 3/31/05 study. I'm not sure what it is here that you are saying.
You wrote: “Secondly, and more importantly, you misunderstand their argument, and criticize them for a stance they do not take. They assert that the data and graph in question do not support E&M’s hypothesis, and you criticize them for failing to show that the data and graph in question support the opposite hypothesis.”
Simply not true. Actually, they assert a possible positive relationship. If a relationship is not significant, it is not significant. Simple. The null hypothesis in this case is that there is no relationship between precinct partisanship and completion rates. If testing of these variables revealed a significant relationship (<.05), then the null can be rejected with some degree of confidence (95% is standard). The NEP Report doesn't confirm this null; it simply does not reject it. That means that the data neither supports nor runs counter to their rBr (reluctant Bush responder) hypothesis. It is simply neutral.

For the US Count Votes crowd to suggest a positive relationship between these two variables by "eyeing" the data, even though the NEP Report says that there is no relationship, is a mistake one would expect from students, not professors. Again, I'm assuming that the variables were tested for significance by E/M and the correlation was not significant. If E/M came to their conclusion that the relationship was not significant by "eyeing" the data in their table, that is something completely different.

Also, this is only a bivariate analysis. The latest US Count Votes study (3/31/05) analyzes the interaction between multiple variables. I am looking at that now. They may be on to something when they throw WPE by precinct partisanship into the mix.

One variable missing from the 3/31/05 study is geography. Where are the Bush strongholds? I assume that they are in rural areas. What was the WPE in these precincts? As noted on page 39 of the NEP Report (#8 - Size of place), the WPE declines as population declines. That suggests to me that in the Kerry strongholds, WPE was higher (-7.9 to -8.5), but in the Bush strongholds, WPE was lower (-3.6 to -4.9). This certainly doesn't "prove" anything. But why did US Count Votes not consider this information? Again, I suggest that it is not convenient to their point. Just like the NEP Report statement that there is no significant relationship between precinct partisanship and completion rates.
You wrote: “No. They do make the statement. It’s in the 3/13/05 study”
Where? I didn’t find any statement by them to this effect in the 3/13 study. They do not address the issue of significance, only you do. They only make a simple qualitative observation about the data.
You wrote: “The NEP Report doesn’t confirm this null, it simply does not reject it. That means, that the data does not support, nor run counter to, their rBr hypothesis.”
I fully agree. The data does not support their rBr hypothesis. That is the full point of the USCV paper.
You wrote: “For the US Count Votes crowd to suggest a positive relationship between these two variables by “eyeing” the data, even though the NEP Report says that there is no relationship, is a mistake that one would expect from students, not professors.”
Utter bullshit. They said "the data suggests that the opposite may have been true." The key word here is the soft and fuzzy term *suggest.* If there were a significant relationship, be assured that US Count Votes would be using much stronger language than *suggest.* The USCV authors understand fully that they are unable to do a significance test to prove that Bush voters responded *significantly more* than Kerry supporters. They make qualitative observations only. You are mistaking their qualitative observations for statistical tests.
They're basically saying, "If the E&M hypothesis is true, the data should show more responders in heavily Kerry precincts. The E&M data show the opposite." Numbers have meaning even if they are not significant. The slope of the regression line for completion percentage against precinct proportion for the ordinal data is going to be positive. That much can be concluded from the E&M data. One may not be able to say with 95% confidence that the slope is greater than zero, but one can still say that the slope is positive. This is not a mistake a "student would make." Academic papers make qualitative observations about data all the time, often for speculative purposes. The US Count Votes paper is doing the same, and it is *especially understandable* since they have no means to perform a significance test.
I don't understand why you put so much stock in the NEP report's finding of non-significance. A non-significant result in any type of standard ANOVA or regression slope=0 test does not invalidate US Count Votes' criticism.
Re-read my description above of the second set of statistical tests, where the null is rK% >= rB% + 6.5%. Only a finding of not-significant on such a test would validate your criticism of US Count Votes. The NEP report almost certainly did not run any statistical test like this, and I cannot for the life of me assume that their simple statement "there was no significant difference between the completion rates and the precinct partisanship" means that they performed anything similar to this test.
Arvin, I’m at a loss…
Alright, apparently the multiple posts have introduced some confusion.
Concerning the misquote: I have already pointed out that you did not misquote. The quotation you cited does exist in the 3/13 study. Mea culpa.
My criticism remains that you continue to misrepresent the authors by constructing two straw men. I will repeat the relevant parts in order to make myself clear:
1) “you attribute to the authors an extreme statement about significance of the data which the authors do not make”
This has nothing whatsoever to do with the misquote above. I asked specifically for a citation from the 3/13 article where the USCV authors address the issue of *significance of data*. You have not provided this. However, your criticism of the authors from your own paper rests solely on this straw man:
“If these data were significantly correlated, the US Count Votes authors conclude the finding would suggest that ‘in precincts with higher numbers of Bush voters, response rates were slightly higher than in precincts with higher number of Kerry voters.'”
See how you attribute to the US Count Votes authors a statement that they feel the data needs to be significantly correlated? You entirely misunderstand their point. The data _do not_ need to be significantly correlated. That’s the first straw man you construct.
2) “They assert that the data and graph in question do not support NEP’s hypothesis, and you criticize them for failing to show that the data and graph in question support the opposite hypothesis.” What I mean is this:
You say: “US Count Votes is saying the graph is significantly positively correlated. It’s not. NEP said the data had no significant difference.”
What they are actually saying is: “The data do not support NEP’s hypothesis. According to their hypothesis, the graph would have a negative slope. In fact, you can see that it has a positive slope, the opposite of what you expect.” What they neglect to emphasize, but is inherent in their position is the following: “It doesn’t matter that the slope is positive. What is most important is that the slope does not support NEP’s hypothesis.”
One of the key distinctions between your straw man and US Count Votes' actual position is the following: a non-significant slope of regression (or a negative ANOVA) supports US Count Votes' position. You interpret it as evidence against US Count Votes' position.
Arvin, good discussion over at KOS by the way ;-), now let’s wrap this one up…
You wrote: “1) “you attribute to the authors an extreme statement about significance of the data which the authors do not make””
To be clear, I'm only looking at the 3/13/05 study. Take these three quotes together for full context.
First – Page 2, first full sentence: "In fact, data newly released in the report suggests that Bush supporters might have been overrepresented in the exit polls, widening the disparity to be explained."
Second – Page 3-4, first sentence of paragraph that begins on page 3: “No data in the report supports the hypothesis that Kerry voters were more likely than Bush voters to cooperate with pollsters, and the data suggests that the opposite may have been true.”
Third – Page 4, first full paragraph: "This chart was construted (sic) from data within the report (p. 37) that is not analyzed or mentioned in the text. This data bears directly on the plausibility of the report's central hypothesis, and it goes in the wrong direction. In other words, in precincts with higher numbers of Bush voters, response rates were slightly higher than in precincts with higher number of Kerry voters."
Okay – first of all, the data in the US Count Votes chart WAS analyzed and mentioned in the text of the NEP Report, contrary to this blatant misrepresentation by the group. In fact, the text directly above the table from which the US Count Votes pulled their data reads: “There was no significant relationship between the completion rates and precinct partisanship.” How much more obvious could that be?
Secondly, I assume that statisticians understand that when they say "goes in the wrong direction" and "slightly higher," and put italics on the word "higher," they are suggesting something about the significance of the slope of the line. If E/M tested the line and it was not significant, then saying it "could be" positive is neither here nor there. I suspect that if I wrote that answer on a stats 101 mid-term, I'd fail.
You wrote: “What they are actually saying is: “The data do not support NEP’s hypothesis. According to their hypothesis, the graph would have a negative slope. In fact, you can see that it has a positive slope, the opposite of what you expect.” What they neglect to emphasize, but is inherent in their position is the following: “It doesn’t matter that the slope is positive. What is most important is that the slope does not support NEP’s hypothesis.””
I agree that the relationship does not help the E/M case, but the NEP Report doesn't rely on this relationship to make the rBr case. That said (and I know I am repeating myself), it is NOT a positive slope if the line is not significant. How can they/you say, "In fact, you can see that it has a positive slope, the opposite of what you expect"? How can you justify saying anything about the slope of a non-significant line? All they can say is: "The non-significant relationship here does not support the rBr hypothesis."
The three quotes I cite above tell me that they are implying much more than is appropriate about a non-significant relationship. Certainly, one would expect a significant negative slope in support of the rBr hypothesis, but the rBr hypothesis is supported by much more data than a test of this single null. Just because you don’t reject a null hypothesis doesn’t mean that the null is false. It’s neutral, and doesn’t help, nor hurt their case.
The E/M case is supported by WPE correlated with several other variables (interviewer characteristics and some precinct characteristics). I suggest that the reason these statisticians and math folks don't seriously consider the other variables is that they aren't pollsters or survey methodologists. They don't have the practical experience to know what the correlated WPE, interviewer, and precinct characteristics mean. Unless the fraud was relatively uniform throughout the sampled precincts, it is really difficult to see how these same WPEs also indicate fraud.
You wrote: “A non significant slope of regression (or negative ANOVA) supports US Count Votes’ position. You interpret it as evidence against US Count Votes’ position.”
Actually, I don't interpret it as evidence against US Count Votes' position, and a non-significant slope does not support their position either; my point, in context, was that US Count Votes completely ignored all the other evidence provided for the rBr theory in their 3/13/05 study.
Remember, the quote from US Count Votes that kicked off this whole discussion? “However, the contention that “[no] data in the report supports the hypothesis that Kerry voters were more likely than Bush voters to cooperate with pollsters” is not in the least bit accurate. The NEP Report presented volumes of information that most analysts agree “suggests” support for the hypothesis that differential non-response was the cause of the observed bias in the exit polls [pp. 13-14, emphasis added].”
Now, I keep getting side tracked into these discussions and I still haven’t worked all the way through the 3/31/05 report. Not to say that I haven’t enjoyed our discussions here and at KOS ;-).
You wrote: “first of all, the data in the US Count Votes chart WAS analyzed and mentioned in the text of the NEP Report, contrary to this blatant misrepresentation by the group.”
Agreed. And the US Count Votes authors agree, too. They excised the sentence from the final version.
You wrote: “Secondly, I assume that statisticians understand that when they say “goes in the wrong direction” and “slightly higher” and they put italics on the word “higher” they are suggesting something about the significance of the slope of the line.”
Disagree. That is my first point. You are _assuming_. The authors *NEVER* mention significance. Every academic article I have ever read that intends to make a statement about the significance of data uses either "significantly different/higher/lower," a "p-value," or some other explicit reference to significance. The US Count Votes authors do none of these. Ask any statistician whether it's a valid statement to say "value A is higher than value B, but the difference is not significant." From my experience, with all the statisticians I've talked to, it is. Saying that value A is higher than value B is a qualitative observation about the data. It is not quantified, and a significance test is not performed. Obviously, US Count Votes is in _no position_ to perform a significance test, since they do not have the raw data.
Let's put it this way: what would you tell me about two hypothetical polls, one performed prior to the Schiavo legislation, one performed after? The former shows Bush with a .48 approval rating, MOE +/- .04; the latter shows Bush with a .44 approval rating, MOE +/- .04. Are the two approval ratings significantly different? Can you say anything about them? Is the second approval rating lower than the first, even if the difference is not significant? _What is the probability that the second approval rating is lower than the first?_ That last question is a critical one to ask. Likely, the probability that it is _not_ lower is around .06. That is _not_ significant at a .05 level of confidence. _However_, it is still a very _low_ probability, and I would be remiss in not mentioning it if I were analyzing the two polls.
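The back-of-the-envelope arithmetic for this hypothetical looks like the following (normal approximation; converting the MOE to a standard error this way gives roughly .08 rather than .06, depending on the conversion used):

    import math

    # Two hypothetical approval readings, MOE +/- .04 at 95% confidence.
    p1, p2, moe = 0.48, 0.44, 0.04
    se = moe / 1.96                    # per-poll standard error
    se_diff = math.sqrt(2) * se        # SE of the difference between polls
    z = (p1 - p2) / se_diff

    # One-sided probability that the second rating is NOT actually lower.
    p_not_lower = 0.5 * math.erfc(z / math.sqrt(2))
    print(f"z = {z:.2f}, P(not lower) = {p_not_lower:.3f}")   # about .08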
Furthermore, above and beyond these items, there is an issue within statistics wherein two different significance tests of the _same_ data and the _same_ hypothesis may have different results. Take, for instance, your distinction between ordinal groupings and a regression line. It is easily possible for an ordinal grouping of data to have a non-significant ANOVA while a regression line for the data shows a significant slope (one such data set is sketched below; it really is trivial, and you should know this). If this is the case, then it is definitely not the case that "it is NOT a positive slope if the line is not significant." We don't know what significance tests NEP performed. Examining the data, it easily could have been an ANOVA on the ordinal dataset. And a regression line may indeed prove significant.
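Here is one such toy data set (all numbers invented): a steady .01-per-stratum upward trend with within-group noise. The five-group ANOVA cannot see the trend, while the regression on the group index can.

    import numpy as np
    from scipy import stats

    # Five ordinal groups, three observations each; group means rise by
    # .01 per group, with a +/- .02 spread inside each group.
    groups = [
        [0.50, 0.52, 0.54],
        [0.51, 0.53, 0.55],
        [0.52, 0.54, 0.56],
        [0.53, 0.55, 0.57],
        [0.54, 0.56, 0.58],
    ]
    f_stat, p_anova = stats.f_oneway(*groups)     # F ~ 1.9, p ~ .19

    x = np.repeat(np.arange(1, 6), 3)             # group index per point
    y = np.concatenate(groups)
    reg = stats.linregress(x, y)                  # slope .01, p ~ .008

    print(f"ANOVA p = {p_anova:.3f}; regression p = {reg.pvalue:.3f}")
    # Same data, same monotone-trend question: the ANOVA is not
    # significant, while the regression slope is.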
You wrote: “Certainly, one would expect a significant negative slope in support of the rBr hypothesis, but the rBr hypothesis is supported by much more data than a test of this single null. Just because you don’t reject a null hypothesis doesn’t mean that the null is false. It’s neutral, and doesn’t help, nor hurt their case.”
Disagree. This is my second point. The fact that it is neutral does hurt their case. In fact, the fact that it is slightly positive (without being significantly positive) hurts their case further. Let's consider this hypothetical: a regression analysis of the data is done, and the slope of the regression line turns out to be +.04, with a 95% confidence interval of [-.06, +.14]. Also, an across-the-board WPE of -6.5% for Kerry should produce a slope of -.10 on this graph. The regression line does not have a slope significantly different from zero; thus, it is not significant, as you say. However, suppose I were to ask you: what is the probability that the rBr hypothesis (a consistent differential non-response across all segments of Kerry voters that resulted in -6.5% WPE) would produce data that looked like this? The answer would be: the probability is <.05. THAT is a very different conclusion. I would bet you it's a test that NEP did not do. If they had done it, they would have explicitly mentioned it, which they didn't. (A worked version of this hypothetical follows this comment.)

Probability and statistics analysis is the furthest thing from black and white that you can get, despite the fact that researchers like to lump things into two groups: significant and non-significant. The truth is always that there are probability distributions underlying everything.

You said in a Daily Kos post, "If a bivariate relationship is not significant, it is not significant." As I show in paragraphs 5, 6, and 8, this is not the case. Significance depends on both the statistical methodology and the hypothesis being tested. The two hypotheses illustrated in the above paragraph are different: "slope = 0" and "slope = -.10." In this hypothetical, one is not significant and one is.

You wrote: "The E/M case is supported by correlated WPE with several other variables (interviewer characteristics and some precinct characteristics). I suggest that the reason why these statisticians and math folks don't seriously consider the other variables is that they aren't pollsters or survey methodologists." That is a much more valid criticism. Rather than attacking their qualitative observations as "inappropriate," you should have emphasized the above criticism in your article.
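A worked version of the hypothetical above; every number (the +.04 slope estimate, the [-.06, +.14] interval, the -.10 rBr-implied slope) is invented for illustration:

    import math

    b_hat = 0.04          # hypothetical estimated slope
    ci = (-0.06, 0.14)    # hypothetical 95% confidence interval
    se = (ci[1] - ci[0]) / (2 * 1.96)   # implied standard error, ~0.051

    # Test 1: H0 slope = 0 (what a standard regression printout reports).
    z0 = (b_hat - 0.0) / se             # ~0.78, p ~ .43: not significant
    # Test 2: H0 slope = -.10 (what a uniform -6.5% rBr WPE would imply).
    z1 = (b_hat - (-0.10)) / se         # ~2.74, p ~ .006: rejected

    for z in (z0, z1):
        p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
        print(f"z = {z:.2f}, p = {p:.3f}")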
Minuteman Project Update
If you’ve read my previous posts on immigration (here, here, here, and here), you can probably guess what I think about the “Minuteman Project.” To be honest, when I first heard of the project, I envisioned a bunch of mullet…
Mark Blumenthal:
I wonder if you will try your hand at responding to the US Count Votes report? You note that you have responded to Freeman's paper, and that Rick has touched on the USCV paper, but you do not respond to or even comment on the USCV paper.
It would be helpful as you have a great way of getting beyond the technicalities and giving a big picture feel for how the report advances the discussion.
Arvin, Rick, thank you for your willingness to allow us to listen in on your debate. It is truly enlightening.
Arvin, are you familiar with the work Mark has done in this blog in predicting and attempting to justify the differential response bias hypothesis the NEP eventually put forward?
I ask because Mark listed many methodological factors that make me question the accuracy of exit polls (even as I believe the election results are ALSO inaccurate and untrustworthy!). Rick Brady alluded to those when he cited the WPE correlations with interviewer and precinct characteristics. Again, Mark predicted probable causes could be found in those factors, because his hypothesis was not simply GOP voter non-response, but GOP voter non-response under specific, not general, circumstances that would make it more likely for them to not want to be interviewed.
So, Arvin, I would really like your take not just on the simple hypothesis, but on the full, more complicated version Mark put forward to explain the exit poll bias. And it would be really helpful if the USCV study looked at the more complicated model Mark's explanations put forward. But I assume you have no influence over their work.
Thank you,
Just someone who would like to think the American people have the smarts and toughness to protect our elections from those who want to manipulate them (and us)
PS. I think both Rick Brady and US Count Votes are actually on the same team: citizens who are not afraid to critically question an exit poll issue which the powerful attempted to ignore. I would like Rick to maybe join the USCV effort and, through his somewhat antagonistic perspective, seriously improve the USCV study, probably by forcing them to probe the complicated methodological factors that the report seems to address inadequately.
Thanks Alex. I would add Arvin to that team, and someone who goes by the call name "Febble," who claims to be one of the US Count Votes reviewers/proofers. She has posted something very interesting which has direct consequences for the US Count Votes conclusions (as well as the E/M conclusions). I'm still fact-checking her work, and it appears that Arvin is as well.
http://www.dailykos.com/story/2005/4/6/8028/83645
The problem is that she is not a PhD (yet), and therefore US Count Votes won't have to respond to her. E/M hasn't responded directly and publicly to anything, but they might want to take a look at Febble's work. I'm not saying I endorse it yet, but if her proof pencils out, I say: wow!

She has sent me her SPSS files and promises to follow with more explanation of her variables and methods. From Arvin's questions over at that post, it appears he is working it out now. (Let us know what you think when you are done, Arvin.)
It is always prudent to play the skeptic with science, but even more so when you’re dealing with statistics 😉
Arvin, I think we are getting somewhere.
First off, let me apologize for allowing my arrogance to inappropriately suggest that you did not have a stats background. You clearly do, and much more so than myself. (BTW – I see that you are a grad student at UCI in CogSci. My good friend is working on a PhD, I believe in CogSci, at the Univ. of Utah. He did his undergrad in CogSci at UCSD.)
I think that because you read the 3/31/05 study before the 3/13/05 study, your thoughts about my characterization of their 3/13/05 study may be colored by what you know of their analysis in the 3/31/05 study.
The first study does not mention WPE by precinct partisanship. They simply infer higher response rates from Bush respondents from data that the NEP Report seems to state was not significantly correlated.
Note that you wrote: "However, suppose I were to ask you: what is the probability that the rBr hypothesis (a consistent differential non-response across all segments of Kerry voters that resulted in -6.5% WPE) would produce data that looked like this? The answer would be: the probability is <.05. THAT is a very different conclusion."

See how you bring in another variable that was not introduced until the 3/31/05 study? My criticism was ONLY directed at the US Count Votes contention that they could infer higher response rates from Bush voters from a line that the NEP Report said was not significant. The other layers (WPE by precinct partisanship and response rates) were not introduced until the 3/31/05 study.

Now, regarding the significance/non-significance statements: consider the audience. This paper was written for media execs who probably don't care about p-values and specific tests. All they want to know is what best explains the bias in the polls that they paid $10 mil or more for. So I assume there was some baseline regression analysis showing that a large amount of WPE is explained by a host of interviewer and precinct characteristics, which led them to conclude that differential response (non-random within-precinct sample selection) and differential non-response (reluctant Bush voters) was the best "fit." Then they wrote up a narrative with simple cross tabs and generalized the results.

You wrote: "Probability and statistics analysis is the furthest thing from black and white that you can get, despite the fact that researchers like to lump things into two groups: significant and non-significant. The truth is always that there are probability distributions underlying everything." True, true. I understand that, as Ron Baiman did in his affidavit, you can always reduce the confidence level and get "significant" results, and that multiple tests return disparate significance results.

The entire point of my criticism, in the context of ONLY the 3/13/05 study, is that US Count Votes made two assertions about the relationship between precinct partisanship and response rates: 1) it may be positively sloped; and 2) it is not analyzed or mentioned in the text. It was mentioned and analyzed in the text, and the NEP said it was not significantly correlated, so both statements were wrong. I called them on it. I think that, unless they completely missed the statement of non-significance in the report, they should have said: "You say it is not significant, PROVE IT! Because it looks to us like it is positively correlated." But instead, they misrepresented, or missed, that statement of the NEP Report, and therefore left themselves open to criticism.

Now, consider the possibility that the NEP Report is wrong and that the statement of non-significance was only based on "eyeing" the data. That may well be. So I said in my report to E/M, "Prove it," and called for a regression of the interval variables, though certainly the ANOVA of the ordinal data by stratum could work as well. In fact, both tests would be nice.

One thing missing here is WPE by geography. The most partisan Rep precincts were in the rural areas. The most partisan Dem precincts were in the urban areas. Yet the WPE was lowest in rural areas and highest in urban areas. Interestingly, I bet that response rates were also up in the rural areas and depressed in the urban areas. This goes completely against the US Count Votes "fraud in Bush strongholds" hypothesis.
Perhaps, just perhaps, there were a lot of traditional Dems in urban areas who were not particularly proud of their vote for Bush and avoided the pollsters. How can one ever PROVE this with exit poll data? I'm not sure it is possible.
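For concreteness, the regression and ANOVA called for above could be specified as below; this is a minimal sketch with simulated precinct-level data standing in for the NEP's unreleased file, so every number in it is made up:

```python
import numpy as np
from scipy import stats

# Simulated precinct-level data; the NEP's actual file is not public.
rng = np.random.default_rng(0)
bush_share = rng.uniform(0.1, 0.9, 250)          # official Bush share (interval)
response_rate = 0.53 + rng.normal(0, 0.08, 250)  # completion rate, no true slope

# Test 1: regression of the interval variables
# (response rate on precinct partisanship).
reg = stats.linregress(bush_share, response_rate)
print(f"slope = {reg.slope:.4f}, p = {reg.pvalue:.3f}")

# Test 2: one-way ANOVA of response rate across ordinal
# partisanship strata (here, five categories of Bush share).
strata = np.digitize(bush_share, [0.2, 0.4, 0.6, 0.8])
groups = [response_rate[strata == k] for k in range(5)]
F, p = stats.f_oneway(*groups)
print(f"F = {F:.2f}, p = {p:.3f}")
```

Either test would "prove" (or disprove) the NEP's claim of non-significance in a way that eyeballing a crosstab cannot.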
I’m not clear about the following conclusion (top of page 11) in this paper: “When the data set is analyzed correctly, the possibility remains that the exit poll results for all three states are not significantly discrepant from the election tally.” The most salient issue here is the _joint_ probability of these discrepancies, which is not directly addressed in the analysis presented.
Ken, thanks for the comment and I’m glad to provide a response.
Dr. Freeman concluded that all three states were significantly discrepant from the official vote count; hence his graph of Ohio, shown as one example of the predicted proportion exceeding the confidence interval implied by his assumed standard error at the 95% confidence level.
The quote you reproduce in your comment was based on my analysis of Dr. Freeman’s data. I demonstrated that, assuming an accurate tally and no bias in the poll (Dr. Freeman’s assumptions), there are a number of “possible” scenarios where the discrepancies were not significant in each of the three states at the 95% confidence level. I don’t mean to say that these scenarios are “probable”; in fact, it is more probable that the discrepancies could be MUCH more significant than he states (as suggested by the upper bound (UB) of the p-values in my paper).
This has direct bearing on his 1:662,000 odds and statement that it was IMPOSSIBLE for all three “statistical anomalies” (as defined by being outside the confidence interval) to occur by chance.
It is not “impossible” if, in fact, there is a chance that each of the three anomalies is not significantly discrepant per his assumptions. Improbable, yes; impossible, no.
Then, you might ask: What is the probability of 3 states with non-significant discrepancies in the same direction? That is .5*.5*.5 right (3 tails)? Those are 1 in 8 odds if I’m doing the math right, although I’m at lunch and I didn’t put much time into it.
Now, if Dr. Freeman had said – “let’s forget about confidence levels because that is largely a convention of the polling world (and elsewhere). Instead, let’s look at the probability that the magnitude of the three discrepancies (Z or t score) would occur in the same direction.” Now that is something entirely different and, if he had included the error bounds of his data, he could have noted the “range” of probabilities in a footnote or something.
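To make that concrete, here is a minimal sketch of a two-tailed test of a single state's discrepancy under those assumptions; the shares, the sample size, and the 1.3 inflation factor on the standard error are illustrative placeholders, not Dr. Freeman's actual figures:

```python
import math
from scipy import stats

def discrepancy_test(poll, official, n, se_inflation=1.3):
    """Z score and two-tailed p-value for an exit poll share vs. the
    official tally, inflating the simple-random-sample standard error
    by a factor to account for the clustered precinct design."""
    se = se_inflation * math.sqrt(poll * (1 - poll) / n)
    z = (poll - official) / se
    return z, 2 * stats.norm.sf(abs(z))

# Made-up Ohio-like numbers: Kerry at 52.1% in the poll, 48.7% in
# the count, 1,963 respondents.
z, p = discrepancy_test(0.521, 0.487, 1963)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")   # roughly z = 2.3, p = .02
```

The "range" of probabilities is then just a matter of rerunning the same calculation across the plausible values of the inputs.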
Also, were there not 51 exit polls? Were not the discrepancies (according to Dr. Freeman’s data) larger in the Democratic strongholds (CT, VT, RI, NY, etc)? Why simply limit the number of trials to 3? Especially when the discrepancies were larger in Democratic strongholds, not battlegrounds.
My point is that Dr. Freeman should have at least been honest about the error bounds of his data. After all, I, Mystery Pollster, and many others told him about these limits well in advance of the December 29 version of his paper.
Since November 5th, we have known that the discrepancies could not be explained by chance alone. E/M stated so themselves.
Even if every single discrepancy were signed the same way as they were (a substantial “red shift”), but still within the margin of error, we would still be asking, “What went wrong with the exit polls?”
“What is the probability of 3 states with non-significant discrepancies in the same direction? That is .5*.5*.5 right (3 tails)?” I can’t tell you exactly what the correct approach to this analysis would be, but I can tell you that this is not even close.
Yup, that was off. Dumb mistake. Drats.
The odds for each state being on one side or the other of the normal distribution around the established standard (vote count) and within the assumed confidence interval should be 50/50 (coin flip).
What are the different permutations of three flips of a fair coin?
(TTT)(TTH)(THH)(HHT)(HTT)(HHH)
What is the probability of three tails or three heads? 1 in 6. Better? Or do I still need sleep?
ARGHHHH. Call me an idiot, I deserve it. I was right the first time. I missed the THT and the HTH. It is 1 in 8, or .5*.5*.5.
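For anyone keeping score at home, the enumeration takes two lines to verify:

```python
from itertools import product

outcomes = list(product("HT", repeat=3))                # all 2**3 = 8 permutations
print(outcomes.count(("T", "T", "T")) / len(outcomes))  # 0.125, i.e., 1 in 8
```

(Three same-signed results in either direction would be 2 in 8; fixing the direction in advance, as here, makes it 1 in 8.)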
It would be the odds of observing discrepancies of a certain _magnitude_, in each of the three states, simultaneously.
Ken, I agree if that is what Dr. Freeman assumed. Note what I wrote above:
“Now, if Dr. Freeman had said – “let’s forget about confidence levels because that is largely a convention of the polling world (and elsewhere). Instead, let’s look at the probability that the magnitude of the three discrepancies (Z or t score) would occur in the same direction.” Now that is something entirely different and, if he had included the error bounds of his data, he could have noted the “range” of probabilities in a footnote or something.”
But Dr. Freeman calculated the odds of three “statistical anomalies” occurring in three independent trials, which he defined as significant 1-tail p-values (.05).
If you have a threshold for determining significance, I think that if the threshold is not exceeded, you have to back away from the data. You don’t agree?
We’re dealing with a null hypothesis for each state: Kerry’s predicted proportion is not significantly discrepant from the vote count. I know I set that up as a two-tail null, but that is because his assumptions require this.
The threshold for rejecting the null and declaring a state a “statistical anomaly” was set at 95% confidence. He rejected his null (the wrong null) for each of the three states, and therefore looked at the magnitude of each significant p-value.
If you are going to look at the odds of a certain _magnitude_, in each of three states, simultaneously, then why set a threshold for significance?
I demonstrate that there is a possibility that all three states could be less than significant (.05), and therefore the nulls for each state would not be rejected. When you have three independent nulls that are not rejected, I back away, either from the p-values (although they are close to significance) or from the significance threshold I’ve established. That’s how I’ve been taught, anyhow. Freeman stuck with his threshold by citing three statistical anomalies.
“We’re dealing with a null hypothesis for each state: Kerry’s predicted proportion is not significantly discrepant from the vote count.” If this is how Freeman has formulated his hypothesis, then he’s made a mistake. Moreover, his calculation of odds doesn’t reflect this approach at all. Apparently, what he’s done is multiply the p-values associated with each state’s discrepancy: .0073*.0164*.0126=.00000151=1/662251. Applying this approach to your more conservative estimates: .11*.10*.14=.00154=1/649.
“If you have a threshold for determining significance, I think that if the threshold is not exceeded, you have to back away from the data.” Imagine if there were a “non-significant” discrepancy _in the same direction_ in each of the 50 states…
If the raw data were available, I imagine that the appropriate analysis would be something analogous to a 2-way ANOVA, not a series of significance tests. It seems to me that Freeman’s analysis captures the essence of this approach.
Ken, I think we are now in agreement.
“If this is how Freeman has formulated his hypothesis, then he’s made a mistake. Moreover, his calculation of odds doesn’t reflect this approach at all. Apparently, what he’s done is multiply the p-values associated with each state’s discrepancy: .0073*.0164*.0126=.00000151=1/662251. Applying this approach to your more conservative estimates: .11*.10*.14=.00154=1/649.”
Agree in principle, although I haven’t checked the numbers. If you took the upper bound estimates, the odds would be MUCH, MUCH worse. My point was that it was foolish to use a single probability and make a big headline that splashed around the world: “IMPOSSIBLE!” There should have been honesty about the range inherent in the fuzzy data.
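For what it's worth, the arithmetic does check out; a quick sketch, which also shows, for comparison only (neither paper uses it), what the textbook combination of independent p-values, Fisher's method, returns for the same three numbers:

```python
from scipy import stats

p_values = [0.0073, 0.0164, 0.0126]   # the three state p-values quoted above

# Freeman's apparent approach: multiply the raw p-values.
product = p_values[0] * p_values[1] * p_values[2]
print(f"{product:.3e} -> about 1 in {1 / product:,.0f}")   # ~1 in 663,000 with rounded inputs

# The same arithmetic on the more conservative estimates above.
conservative = 0.11 * 0.10 * 0.14
print(f"{conservative:.5f} -> about 1 in {1 / conservative:,.0f}")  # ~1 in 649

# Fisher's method, the standard way to combine independent p-values;
# it comes out far less extreme than the raw product.
stat, p_combined = stats.combine_pvalues(p_values, method="fisher")
print(f"combined p = {p_combined:.5f}")
```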
“Imagine if there were a “non-significant” discrepancy _in the same direction_ in each of the 50 states…”
Exactly! Hence my point with the coin flips. Even if we had only slight discrepancies (p-values of .90-.95), all in the same direction, we would be right to ask questions. Or, better yet, if we had a slight unidirectional discrepancy in only the battleground states, but relatively balanced discrepancies in the non-battleground states, we would be right to ask questions.
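Put a number on that 50-state thought experiment and it is just a sign test; a minimal sketch (the 43-of-50 split below is hypothetical):

```python
from scipy import stats

# All 50 independent discrepancies landing on a pre-specified side
# of the count, if each is a fair coin flip:
print(0.5 ** 50)   # about 8.9e-16

# More generally, a binomial sign test for a lopsided split,
# say 43 of 50 states shifting "red":
print(stats.binomtest(43, n=50, p=0.5).pvalue)
```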
Frankly, E-M was asking questions the day after the election! The significance and probability calculations, while interesting at first, should not be the focus. They are good headline grabbers, though, for those who see the odds and conclude: FRAUD!
“If the raw data were available, I imagine that the appropriate analysis would be something analogous to a 2-way ANOVA, not a series of significance tests. It seems to me that Freeman’s analysis captures the essence of this approach.”
I would settle for a very thorough multivariate regression analysis with every reasonable independent variable in the model. Other tests, including ANOVA, would be nice, as I point out in my paper.
In fact, I have to believe that a multivariate regression was done and that a host of interviewer and precinct characteristics consistent with the E-M hypothesis explained a large portion of the WPE. Then they wrote a media-exec-friendly document with crosstabs and means to explain the more complex findings. At least, that’s what I hope happened, for E-M’s sake.
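What such a baseline regression might look like, purely as a sketch: the file name and every variable below are hypothetical placeholders, since the NEP's precinct-level data are not public.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical precinct-level file; the column names are stand-ins
# for the kinds of characteristics the E-M report tabulates one at a time.
df = pd.read_csv("nep_precincts.csv")

# WPE regressed jointly on interviewer and precinct characteristics.
model = smf.ols(
    "wpe ~ interviewer_age + interviewer_experience"
    " + distance_from_polls + precinct_bush_share"
    " + urbanicity + completion_rate",
    data=df,
).fit()
print(model.summary())
```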
Take note of febble’s analysis linked at MP’s current post. She demonstrates that there was nothing unique about the discrepancies in the battleground states and concludes that, for fraud to explain the magnitude and geographic distribution of the discrepancies, the fraud would have to be massive and widespread. Unlikely.
None of this DISPROVES fraud. I’m sure there was some portion of the WPE that E-M could not explain, but the goal of their analysis was not to disprove fraud. It was to explain to the media big shots what happened to their $10+ million.
I remain puzzled about the following conclusion (top of page 11) in this paper: “When the data set is analyzed correctly, the possibility remains that the exit poll results for all three states are not significantly discrepant from the election tally.” Are you saying that Freeman should have come to this conclusion based on the available data? I don’t see where you’ve shown that.
Yes, Ken. I think that if Dr. Freeman had understood or fairly represented the inherent limits of his data, used the types of tests demanded by his assumptions about the data, and applied the more accurate and relevant (rather than liberal and dated) estimate of the standard error adjustment for a cluster sample, he would have concluded: “the possibility remains that the exit poll results for all three states are not significantly discrepant from the election tally.”
Then he should have said: “I’ve demonstrated that under many scenarios for analyzing my data, it is likely that the discrepancies in three key battleground states are significant and outside the realm of chance. Are these three battleground states any different from other states (he had the data for 49 states)? If so, how? What could explain the much, much larger discrepancies in Democratic stronghold states like NY, CT, VT, and RI? So we know that ‘something’ happened and either the polls or the count were biased. Now we need an explanation.” Well, that is exactly the question E-M was working on.
My paper concludes that Dr. Freeman: 1) suppressed evidence regarding the historical accuracy of exit polls; 2) reached a conclusion that was not contested by Edison-Mitofsky, but did so improperly; and 3) propped himself up as an “expert” on an issue on which he is clearly not an expert, so that when E-M released their explanation of what went wrong, he could be at the forefront of refuting it (see his section of strawman arguments leaving only the fraud hypothesis unchallenged).
If Dr. Freeman had: 1) been honest about the literature review and understood that all exit polls are not created equal; 2) been honest about the limits of his data and their impact on his probability calculation; 3) been honest about the correct estimate of the design effect (or at least disclosed that Mitofsky and Merkle had denounced use of 1.3 for the 2004 election); 4) been honest about the error bounds of his probability calculations; and 5) included information about the much larger discrepancies in many Democratic strongholds, so as not to give the impression that the battleground states (especially OH, PA, and FL) were unique; then I wouldn’t have written my paper.
I say “honest” because every single point above was made to Dr. Freeman, by either MP or myself, BEFORE he published the 12/29 version of the paper that I critique. I spoke to Dr. Freeman on the phone for about an hour one evening and raised most of these concerns. Other concerns were sent via e-mail and, judging by his follow-ups, were received by him.
Either he didn’t understand these points or he suppressed them. I think Freeman is a smart guy who understood these points but chose to reject them because they weren’t convenient to his argument. But he could provide an explanation for why he rejected them, couldn’t he?
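On point 3, the stakes are easy to see with the same kind of discrepancy sketch as earlier in this thread; the inputs are the same made-up Ohio-like numbers, and the 1.6 factor below is purely illustrative of a larger cluster adjustment, not Mitofsky and Merkle's actual figure:

```python
import math
from scipy import stats

def discrepancy_test(poll, official, n, se_inflation):
    # Two-tailed test of a poll-vs-count gap, inflating the simple
    # random sample SE by a factor for the clustered design.
    se = se_inflation * math.sqrt(poll * (1 - poll) / n)
    z = (poll - official) / se
    return z, 2 * stats.norm.sf(abs(z))

for factor in (1.3, 1.6):
    z, p = discrepancy_test(0.521, 0.487, 1963, factor)
    print(f"inflation {factor}: z = {z:.2f}, p = {p:.4f}")
# With 1.3 this discrepancy clears the .05 bar; with 1.6 it does not.
```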
“When the data set is analyzed correctly, the possibility remains that the exit poll results for all three states are not significantly discrepant from the election tally.” The “lower bound” you are proposing is a probability of .00154, which doesn’t seem like much of a possibility to me.
Maybe I’m seeing your point now, Ken. When I say “analyzed correctly,” you’re saying it should be analyzed the way Freeman analyzed it, but that maybe he should have included the bounds of the data, which would still mean the results are well outside the realm of possibility. Right?
But the issue I’m raising is slightly different, and perhaps a clarification is in order. What I mean is that when the data set is analyzed correctly, *consistent with Dr. Freeman’s assumptions about significance*, there is a possibility that the discrepancies for each state are not significant. That means he had the right idea with the p*p*p, but he got there improperly (that is, he set a two-tail 95% CL on each Kerry proportion for each state and rejected 3 nulls). My point is that when his data are analyzed consistent with the tests he employed (or implied with his graphic and narrative), there is a *possibility* that he rejected nulls that his p-values may or may not have justified rejecting. That doesn’t mean the discrepancies were probable; in fact, I think I made that point crystal clear in my paper.
However, he had the data for 49 states; he knew that Democratic strongholds like NY, CT, VT, and RI were WAY more significant than OH, PA, and FL; and he knew that the battlegrounds weren’t actually all that unique (on the high side, but not unique). So not only did he violate his assumptions of significance/non-significance per state, he withheld the information that other states were even more significant, which I think would have really dented his case that something was up in the critical battlegrounds. That is, his data suggested massive, widespread fraud to account for the red shift, but his paper emphasized the battlegrounds, contrary to the full story of the data. That’s a logical fallacy called “suppression of evidence.” He improperly withheld information that was crucial to assessing the validity of his argument regarding OH, FL, and PA.
Freeman should have named all the states and all the p-values and calculated the probability of the overall red shift, which would CERTAINLY have been well outside the realm of possibility. But then, that is basically what Edison-Mitofsky said on 11/5, when they stated that chance couldn’t explain the overall bias in the polls.
I should have added a couple more pages explaining this flaw in his paper, as I have Freeman’s data set for all 49 states, which he sent to me back in late November/early December.
So perhaps you are right; maybe a clarification is in order. I’ll mull it over some more.
I like the analysis of Freeman’s work. Go ahead!