Having raised the issue of the Cal Berkeley Report on alleged voting irregularities in Florida, I have been struggling with how much to comment on the ongoing debate among statisticians on their findings (some of it in the comments section of this blog). While this topic fascinates statisticians, it tends to leave the rest of us a bit puzzled. I think Keith Olbermann spoke for many:
I have made four passes at "The Effect of Electronic Voting Machines on Change in Support for Bush in the 2004 Florida Elections," and the thing has still got me pinned to the floor.
Most of the paper is so academically dense that it seems to have been written not just in another language, but in some form of code. There is one table captioned "OLS Regression with Robust Standard Errors." Another is titled "OLS regressions with frequency weights for county size." Only the summary produced by Professor Michael Hout and the Berkeley Quantitative Methods Research Team is intelligible.
I have been following the debate, and thought about writing a reader’s guide to some of the statistical issues. I am holding off, at least for now, as I am not sure most readers of this blog are as obsessed with this subject as those who have left comments. I could be wrong, so let me know if you’d really like to learn more.
For now, let me share a few web sites that have done a good job summarizing the key issues. The best, and easiest to understand, was the post by Kevin Drum on Saturday. It is short and worth reading in full, but here’s the gist:
It turns out that [Berkeley Prof. Michael] Hout’s entire result is due to only two outliers: Broward County and Palm Beach County. This suggests several things:
- There was almost certainly not any systemic fraud. If there were, it would have showed up in more than just two counties.
- The results in Broward and Palm Beach are unusual, but it’s hard to draw any conclusion from just two anomalies. As Kieran says, "it seems more likely that these results show the Republican Party Machine was really, really well-organized in Palm Beach and Broward, and they were able to mobilize their vote better than the Democrats."
- Anyone who wants to continue investigating possible fraud in Florida anyway should focus on Broward and Palm Beach.
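Drum’s point about outliers is worth seeing in miniature. The sketch below is a hypothetical illustration, not the actual Florida data: twenty made-up counties with no real e-voting effect, plus two large outliers in the "e-voting" group, are enough to manufacture a sizable OLS coefficient.

```python
from statistics import mean

# Hypothetical county-level data (NOT the real Florida numbers):
# x = 1 for an "e-voting" county, 0 for optical scan;
# y = percentage-point swing toward Bush, 2000 to 2004.
x = [0.0] * 10 + [1.0] * 10
y = [-0.5, 0.3, 0.1, -0.2, 0.4, -0.1, 0.2, -0.3, 0.1, 0.0,
     0.2, -0.4, 0.1, 0.3, -0.2, 0.0, -0.1, 0.4, -0.3, 0.1]

# Two big outlying "counties" in the e-voting group (think Broward
# and Palm Beach): swings that dwarf everything else in the state.
x_out = x + [1.0, 1.0]
y_out = y + [8.0, 9.0]

def ols_slope(xs, ys):
    """Closed-form OLS slope of y on x (regression with intercept)."""
    mx, my = mean(xs), mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

slope_with = ols_slope(x_out, y_out)   # outliers included
slope_without = ols_slope(x, y)        # outliers excluded

print(f"e-voting 'effect' with the two outliers:    {slope_with:+.2f}")
print(f"e-voting 'effect' without the two outliers: {slope_without:+.2f}")
```

With a binary regressor, the OLS slope is just the difference in group means, which is why two extreme counties in one group can carry the entire result.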
Drum based his summary largely on more detailed critiques by Kieran Healy at Crooked Timber (see also the comments) and Columbia University Political Science and Statistics Professor Andrew Gelman. In the comments section of this humble blog, you will find a post in which George Mason University Political Science Professor Michael McDonald eviscerates the Berkeley Study as "completely worthless." Other critiques come from bloggers Newmark’s Door, Rick Hasen’s Election Law Blog, and Alex Strashny.
The bottom line is that the Berkeley Study’s conclusions are something less than a slam-dunk. As Kaus’ "Feiler Faster" theory might predict, peer-review-by-Internet has moved at lightning speed. Yes, Michael Hout and his colleagues have impressive academic credentials, but then so do Michael McDonald and Andrew Gelman [and B.D. McCullough and Florenz Plassman — see update below]. The results in Broward and Palm Beach counties are unusual, but the fact that these two counties are among the biggest and most Democratic in Florida, with the greatest populations of Jewish voters (see NewmarksDoor), cripples the county-level analysis. Continue investigating? Certainly, but the conclusion in the Berkeley report that "electronic voting raised President Bush’s advantage from the tiny edge he held in 2000 to a clearer margin of victory in 2004" looks premature at best.
Again, if readers would like me to attempt to "demystify" the underlying statistical issues, leave a comment or email me. Otherwise, I’ll return to issues of polling methodology.
UPDATE: Prof. McDonald’s complete write-up of his critique of the Berkeley study is now posted on his website.
BONUS UPDATE (For Kausfiles Readers): A paper by two economics professors — B.D. McCullough of Drexel University and Florenz Plassman of SUNY Binghamton — that rebuts the Berkeley/Hout Study point by point. Money quote: "We conclude that the study is entirely without merit and its ‘results’ are meaningless."
I’d love for you to do the “demystifying” post.
While you’re at it, could you explain how everyone has managed to forget that Gore’s VP was an orthodox Jew? Wow, what a surprise, more Jews voted for the Democrats when they had a Jew on the ticket!
A correspondent to Dan Weintraub, the “California Insider”, offers an alternative explanation for the UC Berkeley results:
http://www.sacbee.com/static/weblogs/insider/archives/001631.html
“But mathematically, it’s equally valid to suppose that there was a Republican suppression factor in the 2000 and 1996 elections — that is, that the Democrats cheated in counting punchcards in heavily Democratic districts in past elections — which they were unable to do in 2004 with the electronic voting machines.”
Mark Blumenthal, I believe you misrepresent both what the Cal Berkeley researchers consider their major findings and what Gelman has to say about those findings. I have no idea why you think you can demystify the results when you don’t seem to be able to understand the paper at all.
First, a little background on statistics. The statistics used in this analysis were originally developed for a specific purpose. They were developed by Francis Galton, and his student Karl Pearson, to be used as a tool in the eugenics movement. In other words, they were developed to measure differences in intelligence in populations in ways that could be easily explained to the general public. For this reason, while the process of statistics is rigorous and at times difficult to understand, the outcomes used for analysis are generally transparent. That is true in this case. You seem to be confusing the process with the outcomes, which leads you to suggest they are more difficult to understand than they actually are.
You then, I think, misstate what Hout and his students consider to be the outcome. I listened to the actual news conference (can you get a transcript?). What they were very clear about in their findings was that Palm Beach County, Broward County, and Miami-Dade all deviated by a statistically significant margin from all other counties. They stated the chance of this deviation occurring by chance was one in one thousand (p<.001). They suggested in their general conclusions that one of the variables was electronic voting, but they again were very clear that they understood all of the variation due to electronic voting was due to these three counties. Gelman suggests it is only two counties that account for all the variation, but that does not dismiss the idea that Miami-Dade deviated significantly. The main point of Hout and his group was that the deviation should be investigated (they did not mention electronic voting per se at their conference except as a variable). Now here is what Gelman says on that very topic:

“Cheating? Something unusual seems to have happened in Broward and Palm Beach counties in 2004. One possibility, as suggested by Hout et al., is cheating, possibly set up ahead of time (e.g., by loading extra votes into the machines before the election or by setting it up to switch or not count some votes). This explanation makes a certain amount of sense, in that, if someone wanted to cheat ahead of time, it would make sense to do it in Florida, and it would make sense to do it in the large-population counties where a 5%-or-so swing in votes could make a difference in the statewide total. A glance at the first graph above makes it clear that the swings-toward-Bush-that-don't-fit-the-general-pattern-in-the-state are a feature just of Broward and Palm Beach counties--not of all the e-voting counties. If you remove the two big red circles from the left of the plot, there doesn't seem to be much going on. Just plain voting? Two counties explain all the difference. I don't know what was going on in these counties, what else was on the ballot, etc., but an obvious alternative explanation is that, for various reasons, 3% more people in those counties preferred Bush in 2004, compared to 2000. As can be seen in the graphs above for 2000, 1996, and 1992, such a swing would be unusual (at least compared to recent history), but that doesn't mean it couldn't happen!”

As you will notice, Gelman agrees that this is an unusual variation, and seems to agree with all the other statisticians that this should be investigated (he is willing to entertain the possibility that it could happen, but if Hout et al. are correct, again the chances are less than one in one thousand). All right, so you have misrepresented the Berkeley study and you have misrepresented Gelman’s response to the study (as did Drum). Now let me explain exactly what the study was suggesting.

Let’s take the issue back to intelligence testing (which I despise, but it always serves as the best exemplar). Say you are trying to determine a regression to the mean of intelligence in various counties. Two different intelligence tests are being used, the Stanford-Binet and the Wechsler. You determine among the population that there is a mean and a standard deviation from that mean (100-120). You find that those who take the Wechsler (120) fall further away from the mean than those who use the Stanford-Binet (100). Perhaps it is the Wechsler test that is causing the difference, but this really doesn’t make a lot of sense. Anyway, like any good scientist, you take a closer look at interaction effects. What you find is that almost all of the variance can be found in three counties that use the Wechsler: Palm Beach, Broward, and Miami-Dade.
Now this gets really interesting, because while the Wechsler deviated in general, these three counties deviate from the null hypothesis by more than two standard deviations. What are the chances of these three counties being so much further away from all the other counties? That would only happen by chance one in a thousand times (160). And all three counties did this. The first thing you think, which both Hout and Gelman refer to, is that there was some difficulty in the administration of the test. THIS WOULD ALWAYS BE THE FIRST THING YOU LOOK AT. If you determine that there was no difficulty, or cheating, then you look for other explanations. There are some explanations that are just more odious, such as that Jews are naturally smarter and these three counties have more Jews, so that is the reason. There are some that are less so, such as that for some reason a great many very smart people congregated in this area. But I want to again reinforce that the first thing any social scientist would look at is the administration of the test. To understand Hout’s point, all you have to do is interchange votes with intelligence. So now do you understand?
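The "more than two standard deviations" logic above can be made concrete. Using Python’s standard-library NormalDist and deviation figures invented for illustration (not taken from the report), the p-value for each county is the chance of a deviation at least that large arising under the null hypothesis:

```python
from statistics import NormalDist

# Invented county deviations, in standard-deviation (z-score) units,
# under the null hypothesis that every county follows the statewide
# pattern. These are NOT the figures from the Berkeley report.
deviations = {"Palm Beach": 3.2, "Broward": 3.5, "Miami-Dade": 2.3}

norm = NormalDist()  # standard normal, mean 0 and sd 1

# Two-sided p-value: probability of a deviation at least this large
# (in either direction) occurring by chance alone.
pvals = {county: 2 * (1 - norm.cdf(z)) for county, z in deviations.items()}

for county, p in pvals.items():
    print(f"{county:>10}: z = {deviations[county]:.1f}, p = {p:.4f}")
```

A z-score a bit above 3 corresponds to roughly the one-in-a-thousand chance the commenter cites; the point of the analogy is that three such deviations clustered in one group is what demands an explanation, starting with how the "test" was administered.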
The Last Word on the Bogus Berkeley Study
Mystery Pollster has a post declaring that the bogus Berkeley study, if not dead, is on life support. He concludes: “The bottom line is that the Berkeley Study’s conclusions are something less than a slam-dunk. As Kaus’ ‘Feiler Faster’ theory might…”
The reason Bush did so much better in Palm Beach and Broward Counties is that he received a 14-16% swing in Jewish retiree precincts, at least in Palm Beach.
The analysis is in an update to this post: http://www.patrickruffini.com/archives/2004/11/the_last_word_o.php
Here is an interview on San Francisco radio station KPFA with Michael Hout, author of the Berkeley study, Mike McDonald, of George Mason University (who trashed the study as “completely worthless”), and Rachel Best, co-author of the report:
http://www.kpfa.org/cgi-bin/gen-mpegurl.m3u?server=209.81.10.18&port=80&file=dummy.m3u&mount=/data/20041122-Mon0700.mp3
The interview starts at the 00:32:44 mark.
http://newmarksdoor.typepad.com/mainblog/2004/11/still_more_on_t.html
Still more on the Berkeley study of Florida voting. I want to point to some of the other interesting work on this and to thank people who’ve written nice things about my posts. A useful place to start would be
I have now had a chance to read Dr. McDonald’s criticism. I think it is important to discuss this because it seems to be the chief criticism of the Hout study (the only real criticism, if you actually read Gelman, which many of the bloggers seem to have failed to do). McDonald engages in what can best be called a “kitchen sink” criticism. That is, he throws everything but the kitchen sink at Hout, hoping that something will stick. None of his arguments are really cogent, and he would have been much better off attempting to develop a stronger single argument. What he is trying to do is create the aura of expertise through quantity rather than quality of argument. This is a cheap academic trick that individuals usually engage in when they are desperately trying to protect turf. I have been dealing with people like McDonald most of my adult life, and in general I don’t find them very attractive.
Let me explain what I mean by kitchen sink. He critiques the Hout study on all four types of validity (face, external, internal, and construct). Let me take each one at a time.
Face Validity
McDonald makes the argument that a 4.1 vs. a 2.4 differential is not enough face validity to continue a study based on differences. Therefore the study is suspect. It is important to recognize that face validity is a very low bar, and it is also very subjective. In any case, McDonald offers absolutely no discussion of why this does not meet the standards of face validity. He simply makes the assertion.
External Validity
McDonald makes the assertion that the Hout study does not have a compelling theory to help justify their categorizations. Please see my above comment for a short history of these types of statistics. It is also important to understand where Galton was coming from. His general work was to make general hypotheses (not theories) about differences (of all manner) and then to see if these differences did exist in a significant fashion. So the simple hypothesis that there were differences between touch-screen and optical counties is easily enough of a starting point to engage in this type of statistical analysis. Again, if McDonald disagrees, he has to make the argument why.
Internal Validity
McDonald is concerned with the size of the population, which means he is concerned with power. I don’t know enough to know if this is a viable critique. But if McDonald is going to make such a critique, it is incumbent on him to talk directly about power and offer statistical evidence. I do know that Stewart and Gelman, two of the best statistical minds in the country, seemed to find no flaws with internal validity, and my guess is this went through rigorous internal review. In any case, it is incumbent on McDonald to use actual numbers in this critique to explain his reasoning.
Construct Validity
This is the most important validity, but the bar is very high in using it as a critique. Basically what you are saying is that this type of thing can’t happen in the real world so something must be wrong with the analysis. The argument must be very compelling. Instead all that McDonald offers is a tautology – “The analysis shouldn’t be done, because voter fraud is impossible, so the analysis is suspect.” This would be laughed out of any undergraduate course on logic.
Reliability
This is the argument that McDonald makes when he talks about Broward and Palm Beach being closest to two optical scan counties. But McDonald offers no reliability coefficient to buttress his assertion. Again, this would be one of the first things to come up in any internal review.
I am surprised that Hout didn’t just walk right out on McDonald. In essence all McDonald is doing is giving the Hout study the Bronx cheer.
I forgot to label construct validity in the paragraph between internal validity and reliability. This is important. Sorry.
I have taken a look at Ruffini’s analysis and again a tautology (come on you guys can do better than this). The question is why did Palm Beach County, which has a large Jewish population that historically votes Democratic (and we have years and years of data on this) suddenly switch to Republican in a way that was not reliant on simple chance. Ruffini’s explanation: because a majority of Jewish voters went Republican. This is actually a little silly.
Staying away from the statistics, this article (http://americanthinker.com/articles.php?article_id=4038) examines the implications of the election results based on the predictions of the Berkeley report. If the 260,000 votes had gone to Kerry, he would have gained almost 92% of the incremental vote pool in those three counties. That does not seem to make sense at all.
The KPFA interview with Dr. Hout, Rachel Best, and Michael McDonald includes highlights I think are relevant to this blog:
–Dr. Hout addresses McDonald’s concerns on interaction terms by pointing out the standard error in the study was relatively small.
–Absentee ballots, i.e., paper ballots, are probably diluting the e-vote effect, and Dr. Hout now calls on counties to provide separate absentee and Election Day vote counts.
–Rachel Best did allow that Jewish voters may or may not be a “cause,” but Jewish population data is not available by county. 1996 data provides some “non-Lieberman Jewish vote” control.
–Michael McDonald worked for the exit poll service this election! He discounts exit poll accuracy back to 1988. I imagine contractual obligations probably prevent him commenting on the exit poll posts on this blog.
So, “Wilbur”, it is your contention that having Lieberman on the ballot in 2000 didn’t cause more Jews to vote for Gore–Lieberman than would have voted for Gore–Edwards?
And that having President Bush spend 4 years being a very good friend to Israel wouldn’t cause a bunch of otherwise Democrat-voting Jews to vote for Bush?
Is that because you think Jews are automatons who just always vote Republican, no matter what?
Sorry, that last sentence should be “Is that because you think Jews are automatons who just always vote DEMOCRAT, no matter what?”
Statistics looking for a deviation need a standard to deviate from.
In Florida we don’t have much of a standard.
In FL possibly: 2000 involved fraud (if so, how much and which counties?); the population in Florida counties has changed since 2000 and we don’t know how; were 2004 voters who didn’t vote in 2000 more conservative than those who did; maybe Lieberman did draw Jewish voters in 2000, yada, yada, yada.
I would like to see statistical analysis of electronic in 2000 vs. punched in 2000. And for electronic from optical. And for no change in vote method. Chart against Bush % in the 30 densest counties in the country for both elections. Six charts with 30 points each.
Alas, maybe such massive data doesn’t exist.
There is one premise I accept about our elections – populous counties tend to vote more Democratic. So did they swing more to Bush on electronic machines, on optical, or cards?
My guess is there were no large frauds. Republican fraud is hard to put past Democrats where the local officials are Democrats – i.e. in the densest counties. Also, people do not keep their collective mouths shut in the US. Election workers are people.
sorry, make my 2:42 read
” I would like……..electronic in 2004 vs. punched in 2000.
Regarding one of the main critiques of the UCAL study made by McCullough and Plassman:
“An easy way to show that there is something seriously wrong with a statistical study is to use the same data and the same approach to reach the opposite conclusion. HMCB apparently never bothered to check this aspect of their model, else they’d have easily found it: we show that HMCB’s modelling approach also supports the contention that electronic voting favored Kerry.”
You saw it here (http://www.mysterypollster.com/main/2004/11/the_ucal_berkle.html) first:
“I forgot. If you repeat UCAL’s exercise on Kerry’s data, you get the same positive influence of electronic voting (the more electronic voting, the bigger Kerry’s gain). Remarkable.
Posted by: Wonka | November 19, 2004 10:36 AM”.
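A toy version of that symmetry check, using invented numbers: if the e-voting counties simply grew faster between elections, then regressing either candidate’s raw vote gain on an e-voting dummy comes out positive. Finding the same "effect" for Kerry is exactly what undercuts reading the coefficient as fraud for Bush. (The data below are hypothetical, not McCullough and Plassman’s actual replication.)

```python
from statistics import mean

def ols_slope(xs, ys):
    """Closed-form OLS slope of y on x (regression with intercept)."""
    mx, my = mean(xs), mean(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

# Made-up counties: the e-voting counties grew more between the two
# elections, lifting BOTH candidates' raw totals.
evote = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
growth = [1, 2, 1, 2, 1, 5, 6, 5, 6, 5]   # turnout growth, in thousands

bush_gain = [0.45 * g for g in growth]    # Bush gets ~45% of new votes
kerry_gain = [0.55 * g for g in growth]   # Kerry gets ~55% of new votes

bush_slope = ols_slope(evote, bush_gain)
kerry_slope = ols_slope(evote, kerry_gain)

print(f"'effect' of e-voting on Bush gain:  {bush_slope:+.2f}")
print(f"'effect' of e-voting on Kerry gain: {kerry_slope:+.2f}")
```

Both coefficients are positive, which is what you would expect from a turnout confound rather than from machines favoring one candidate. A model that "detects" the same advantage for both sides isn’t detecting fraud for either.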