More on n-Sizes and National Exit Poll Regions


Tonight I want to take up some additional questions readers have asked about the new exit poll data, specifically those about the regional composition of the national sample. For those who would rather not slog through a long post, I’ll cut to the chase: Though there is much confusion about the mechanics of a very complex set of surveys, I see nothing here to substantiate the wilder allegations of hoax and fraud. If you’re interested in the details, read on.

A review: Scoop, a New Zealand web site, recently put a set of PDF files online that were apparently created on Election Day by Edison/Mitofsky, the company that conducted the national exit poll, and distributed to their newspaper subscribers. The files are cross-tabular tables (“crosstabs” for short) for two questions (the vote for President and for U.S. House of Representatives), released at three different times (3:59 pm and 7:33 pm on Election Day, and 1:35 pm the following day), tabulated separately for the national sample and four regions (East, South, Midwest and West), with each table run in two different formats (one with vertical percentages, one with horizontal percentages).

Most of the questions that have come up concern some odd inconsistencies in the number of interviews conducted nationally and in the four national regions. The table that follows shows the unweighted sample sizes (“n-sizes”) for the presidential and congressional vote crosstabs:

One important note: All references to “unweighted interviews” in this post refer to the n-sizes reported in the PDF files. As we learned Monday, this number is inflated because 500 interviews conducted nationally by telephone among absentee and early voters were replicated four times in the unweighted data. This programming oddity did not affect the tabulated results for the presidential and congressional vote questions, because the telephone interviews were weighted back to their appropriate value (see Monday’s post for more explanation).
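To make that mechanism concrete, here is a minimal sketch in Python. It is not NEP’s actual code; the candidate splits are invented for illustration, and only the interview counts (11,903 in person, 500 by telephone, each phone interview replicated four times) come from the discussion above.

```python
# Hypothetical illustration: the vote splits below are made up.
# Only the counts (11,903 in-person interviews, 500 telephone interviews,
# each phone interview replicated four times) come from the post.

in_person = [{"weight": 1.0, "vote": "Bush"}] * 6000 + \
            [{"weight": 1.0, "vote": "Kerry"}] * 5903
phone = [{"weight": 1.0, "vote": "Bush"}] * 260 + \
        [{"weight": 1.0, "vote": "Kerry"}] * 240

# Replicate each telephone interview four times, but cut its weight to 1/4,
# so the weighted contribution of the phone sample is unchanged.
phone_replicated = [{"weight": 0.25, "vote": r["vote"]} for r in phone for _ in range(4)]

records = in_person + phone_replicated

unweighted_n = len(records)                     # 11,903 + 500*4 = 13,903
weighted_n = sum(r["weight"] for r in records)  # 11,903 + 500   = 12,403

bush_share = sum(r["weight"] for r in records if r["vote"] == "Bush") / weighted_n
print(unweighted_n, weighted_n, round(bush_share, 3))
```

The unweighted count swells to 13,903 “interviews,” but the weighted vote share is exactly what it would have been had each phone interview appeared once with a weight of 1.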

Here are the issues that puzzled MP’s readers:

1) Why are the unweighted counts on the cross-tabulations of the presidential vote question always larger than on the tabulations of the vote for U.S. House of Representatives?

The table below shows the difference between the unweighted sample sizes for questions on the Presidential and U.S. House votes, in terms of raw numbers and percentages.

In every tabulation, the unweighted number for the presidential vote was greater than for the congressional vote. For example, the n-size of the national presidential vote crosstab at 1:35 pm on 11/3 (n=13,660) is 1,011 “interviews” higher than on the congressional vote crosstab (n=12,649). As a percentage of the unweighted presidential question n-sizes, the differences are consistently between 6% and 7% (slightly larger in the West). Why?

The answer seems simple: Many who turn out on Election Day cast a vote for president but skip the lesser offices. In the 2000 election, according to America Votes, there were 105.4 million votes cast for President and 97.2 million for the U.S. House. The difference was 7.8% of the presidential vote – roughly the same percentage as the difference on the exit polls.
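For readers who want to check the arithmetic, here is the same drop-off calculation worked out in a few lines of Python, using the America Votes totals cited above:

```python
# Ballot drop-off between the presidential and U.S. House votes in 2000,
# using the America Votes totals cited above.
pres_votes = 105.4e6    # votes cast for President
house_votes = 97.2e6    # votes cast for U.S. House

dropoff = (pres_votes - house_votes) / pres_votes
print(f"Drop-off: {dropoff:.1%}")   # about 7.8%, in the same range as the n-size gap
```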

2) Why are there so many interviews in the Midwest and South? Is that plausible?

The first and most pointed commentary on this issue came from Steve Soto of TheLeftCoaster. Among other things, he argued:

[Mitofsky] had to pull 60% of his final exit poll sample from the South and the Midwest in order to make his final national exit poll reflect the tallied results. I ask you: how plausible is all of this?

Soto used the unweighted n-sizes above to arrive at the 60% figure. The weighting done by NEP on Election Day is primarily regional (to match turnout, as discussed on this site many times, especially here). We also now know that the unweighted tallies quadruple-counted the telephone interviews of early voters. Thus, the raw, unweighted data may be way off before the regional weights are applied.

However, even if we ignore all that, the sizes of the unweighted South and Midwest samples are not as far from reality as Soto may have assumed. I did my own tabulations on the unofficial returns widely available in mid-November, using the regional definitions from the VNS exit polls four years ago: Those show 25% of all votes cast nationally coming from the Midwest region and 32% from the South, for a total of 57%. So even the unweighted interviews look a little high in the South and Midwest regions, but not by that much.
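Here is that comparison in a few lines; the 25% and 32% shares come from my tabulation of the mid-November returns, and the 60% figure is Soto’s, taken from the unweighted n-sizes:

```python
# Midwest + South share of votes actually cast (my tabulation of the
# mid-November returns) versus the share implied by the unweighted n-sizes.
actual_share = {"Midwest": 0.25, "South": 0.32}
unweighted_share = 0.60   # Soto's figure, from the unweighted crosstab n-sizes

actual_total = sum(actual_share.values())   # 0.57
print(f"Actual: {actual_total:.0%}  Unweighted exit poll: {unweighted_share:.0%}  "
      f"Difference: {unweighted_share - actual_total:.0%}")
```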

Of course, the unweighted n-sizes are largely irrelevant to questions about the exit poll results.  The real question is what the weighted distributions looked like. They should have been spot-on, because the exit poll data are supposed to be weighted to represent the best available estimate at any given time of the ultimate regional distribution of voters. You need not take my word on this. Just open any of the crosstabs for the Presidential vote (3:59 or 7:33 pm on 11/2 or 1:35 pm on 11/3) or the Congressional vote (3:59, 7:33 or 1:35). Scroll down to the bottom and you will find tabulations for the four regions. The left column has the percentage of weighted interviews for each region. I have reproduced the table of all of these values below.

The point of this table is simple. The numbers show only a slight variation over the course of the day (reflecting the ever-improving estimates of turnout) and end up matching the final regional distribution of votes cast almost perfectly. The unweighted n-sizes are irrelevant. Whatever the failings of the national exit poll, the national data were weighted appropriately throughout the day to match the true distribution of likely voters.
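For readers unfamiliar with how that kind of weighting works, here is a minimal sketch of regional post-stratification. It is not NEP’s actual procedure; the respondent counts and the East and West targets below are placeholders (the Midwest and South targets borrow the 25% and 32% shares mentioned above).

```python
from collections import Counter

# Hypothetical raw interview counts by region (placeholders, not NEP data).
respondents = ["East"] * 300 + ["Midwest"] * 250 + ["South"] * 350 + ["West"] * 100

# Turnout targets: Midwest and South use the shares cited above; East and West
# are assumed values chosen so the four targets sum to 100%.
target_share = {"East": 0.23, "Midwest": 0.25, "South": 0.32, "West": 0.20}

counts = Counter(respondents)
n = len(respondents)

# Each respondent's weight = region's target share / region's unweighted share.
weights = {region: target_share[region] / (counts[region] / n) for region in counts}

weighted_share = {region: counts[region] * weights[region] / n for region in counts}
print(weighted_share)  # matches the targets, whatever the raw regional counts were
```

However lopsided the raw regional counts are, the weighted regional shares land on the turnout targets, which is why the weighted distributions in the crosstabs track the actual vote so closely.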

3) How could 811 new unweighted interviews appear in the East region between 7:33 p.m. on Election Night and 1:35 p.m. the next day? Why were comparatively few new interviews added after 7:30 EST in the West?

The table below shows the regional composition of the unweighted interviews for the three releases, as well as those that were added to the NEP data after 7:33 p.m. on Election Night: Note that 31% of the interviews added after 7:30 p.m. were in the East and only 17% in the West. How can that be?

What follows is an educated guess, but I believe the answer is evident in both the poll closing times and the distribution of early and absentee voters interviewed by telephone.

Let’s start with poll closing times. As explained on this site previously, the NEP training materials instruct interviewers to suspend interviewing and call in their last batch of results approximately an hour before closing time. Some time is obviously necessary to key in these data and do additional data runs so that complete exit poll data are available to network decision makers in each state shortly before the polls close. Presumably, the 7:33 p.m. crosstabs represent complete exit poll data for states that closed at 7:30 p.m. EST or earlier, but may have been missing many of the final reports for states that closed at 8:00 p.m.

The official NEP site provides poll-closing information for all 50 states. I had not noticed this before, but only two of the nine states whose polls close at 7:30 or earlier (Vermont and West Virginia) fall into NEP’s East region. Two are classified as Midwest (Ohio and Indiana), five as South (Georgia, Kentucky, North Carolina, South Carolina, Virginia). So it follows that the interviews added after 7:30 would be heavier in the East and Midwest than in the South.
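A quick tally makes the point; the state list and the regional assignments are the ones given above.

```python
from collections import Counter

# The nine states with poll-closing times of 7:30 p.m. EST or earlier,
# grouped by NEP region as described above.
early_closing = {
    "Vermont": "East", "West Virginia": "East",
    "Ohio": "Midwest", "Indiana": "Midwest",
    "Georgia": "South", "Kentucky": "South", "North Carolina": "South",
    "South Carolina": "South", "Virginia": "South",
}

print(Counter(early_closing.values()))
# Counter({'South': 5, 'East': 2, 'Midwest': 2}) -- with so few early closers in
# the East, most Eastern precincts were still waiting on their final call-ins
# at 7:33 p.m., consistent with the larger share of interviews added overnight.
```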

The telephone interviews done among early and absentee voters explain why so few late interviews were added in the West. Two reasons: First, the telephone interviews were done before Election Day, so all of these interviews were included in the very first tabulations. Second, nearly two thirds of the telephone interviews (65% by my count) were done in Western states (Oregon, California, Washington, Colorado and Arizona). The East region included not a single telephone interview. Thus, a large share of the interviews representing Western voters was already in the data by 7:33, a time when the final reports of the day were likely still outstanding for much of the East.
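In rough numbers, using only the 500-interview total and the 65% share cited above (the breakdown is approximate):

```python
# Regional split of the pre-Election Day telephone interviews, using the
# figures cited above (500 total, roughly 65% in Western states, none in the East).
phone_total = 500
west_phone = round(0.65 * phone_total)   # about 325 interviews
east_phone = 0

print(f"West: {west_phone} phone interviews already in the data before the polls opened")
print(f"East: {east_phone} phone interviews; its sample had to come from Election Day call-ins")
```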

4) Why all the continuing confusion about all of this more than two months after the election?

Let me quote from a comment by “Luke” (“NashuaEditor”) posted here last night (referencing information and links from his new blog, the NashuaAdvocate):

CBS has a published “final” draft of the Mitofsky Methods Statement on its website which says 11,903 non-telephone and 500 telephone interviews were conducted on Election Day, which gives us a sample size (I think we can now agree) of 13,903 (11,903 + [500*4]). This is actually 243 responses higher — not 59 — than the reported 13,660-voter sample size for the National Exit Poll [The 13,660 sample size is also reported elsewhere on the CBS site]

Why would CBS, an NEP member, have different data than its data-supplier, Mitofsky/Edison Media, more than two months after the general election? And why would that incorrect data be in the form of a *Mitofsky*-produced Methods Statement?…

I also can’t for the life of me figure out how the Washington Post was reporting 13,047 total responses on November 4th, 2004 — especially when this figure was given to the public by The Post pursuant to an “official correction”(!) Usually “official corrections” in The Washington Post can be trusted, right?

These are reasonable questions. I could guess at answers, but I would rather let Edison/Mitofsky, CBS and the Washington Post speak for themselves. Rather than speculate further, I will email representatives of each organization for comment (including CNN, which also continues to show 13,660 unweighted interviews on its website), and I’ll post their reactions, if any, here.

For now, I’ll conclude with this thought: The quadruple counting of absentee/early telephone interviews in the unweighted data appears to have thoroughly flummoxed just about everyone involved, including yours truly. Most of what I see here is evidence of a lot of confusion about the mechanics of a very complex set of surveys and tabulations. It would certainly help if the networks would be more open about correcting and clarifying the apparent contradictions in their public documents. But evidence of a hoax? That’s quite a stretch.

Mark Blumenthal

Mark Blumenthal is a political pollster with deep and varied experience across survey research, campaigns, and media. The original "Mystery Pollster" and co-creator of Pollster.com, he explains complex survey concepts to a wide range of audiences and shows how data informs politics and decision-making. He is a researcher and consultant who crafts effective questions and identifies innovative solutions to deliver results, and an award-winning political journalist who draws insights and compelling narratives from chaotic data.