On Outliers and Party ID


Last week, the Gallup organization released a survey sponsored by CNN and USA Today and fielded February 4-6 that appeared to show a surge in President Bush’s job approval rating from 51% to 57% since mid-January. “The Iraqi elections…produced a bump in President Bush’s approval rating,” said CNN. “Americans gave President Bush his highest job approval rating in more than a year,” read USA Today.

Gallup immediately went into the field with a second poll, conducted February 7-10, that showed the Bush job rating back down at 49%, “slightly below the levels measured in three January polls, and well below the 57% measured in Gallup’s Feb. 4-6 poll.” Unlike the first survey, this one was not co-sponsored with CNN and USA Today, and thus, as blogger Steve Soto put it, it did not get “bull-horned through the media” the way the first one did.

With that in mind, I want to consider the question Soto raised Monday on TheLeftCoaster: “How often is there a 16% swing in a public opinion poll in one week?”

The short answer is, not very. 

But then I never seem satisfied with short answers, do I? Let’s take this one step at a time. First, a minor quibble: “shifts” in polling numbers always seem more dramatic when you compare margins, in this case the difference between the approval and disapproval ratings, because doing so roughly doubles the apparent change, since every respondent who moves from approval to disapproval lowers one number and raises the other. The February 4-6 survey showed 57% approving of Bush’s performance and 40% disapproving (a net of +17). The second survey showed 49% approving and 48% disapproving (a net of +1). Thus, 17 - 1 = a 16-point shift. Yet even if this shift were real, it would have involved only about 8% of the population changing their opinion (approval fell from 57% to 49%). That number is still quite large, and would certainly be outside the reported sampling error for samples of 1,000 interviews, but it does not sound quite as astounding as a “sixteen point shift.” Better to focus on the change in the percentage expressing approval than on the margin of approval minus disapproval.
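To make that arithmetic concrete, here is a small Python sketch. It assumes each poll is a simple random sample of exactly 1,000 interviews (real polls are weighted, which makes the effective error somewhat larger) and uses the textbook normal-approximation formulas; the function names are illustrative, not Gallup's or anyone else's method.

```python
import math

Z95 = 1.96  # two-sided 95% critical value of the standard normal

def margin_of_error(p, n, z=Z95):
    """95% margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

def diff_moe(p1, n1, p2, n2, z=Z95):
    """95% margin of error for the difference between two independent polls."""
    return z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Gallup figures quoted above; n = 1,000 per poll is an assumed round number.
n = 1000
feb4_6 = {"approve": 0.57, "disapprove": 0.40}
feb7_10 = {"approve": 0.49, "disapprove": 0.48}

net1 = feb4_6["approve"] - feb4_6["disapprove"]     # net +17
net2 = feb7_10["approve"] - feb7_10["disapprove"]   # net +1
margin_shift = net1 - net2                           # the "16-point swing"
approval_shift = feb4_6["approve"] - feb7_10["approve"]  # people who changed

print(f"Shift in net margin:  {margin_shift * 100:.0f} points")
print(f"Shift in approval:    {approval_shift * 100:.0f} points")
print(f"MoE, one poll:        +/-{margin_of_error(0.5, n) * 100:.1f} points")
print(f"MoE, poll-to-poll:    +/-{diff_moe(0.57, n, 0.49, n) * 100:.1f} points")
```

This prints a 16-point margin shift, an 8-point approval shift, a single-poll margin of error of about ±3.1 points and a poll-to-poll margin of about ±4.4 points. The 8-point drop is well outside sampling error for the difference between two polls, even though it is only half of the 16-point "swing."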

Second, I might rephrase Steve’s question a bit:  “How often do we see real shifts in presidential job approval of this magnitude?” 

Rarely. That answer is evident in the remarkable graphic maintained by Professor Stuart Eugene Thiel (a.k.a. Professor Pollkatz) and copied below. The Pollkatz chart shows the approval percentage on every public poll released during George W. Bush’s presidency. At any point in time, individual polls scatter randomly within a band roughly 10 percentage points wide, but the long-term trends are slow and gradual. The exceptions are a few very significant events: 9/11, the invasion of Iraq and the capture of Saddam Hussein.

During 2004, the average Bush job rating did not vary much.  It dropped a few points in April and May during the 9/11 Commission hearings and the disclosure of prisoner abuse at the Abu Ghraib prison.  It rose a few points following the Republican convention and has held remarkably steady ever since.

The graph also shows that “big swings” do appear by chance alone in the occasional individual survey. These are “outliers.” A few polls fall outside the main band of points on the chart, and the February 4-6 Gallup survey is an obvious example. It shows up as the diamond-shaped pink point at the far right of the Pollkatz graphic (the full-size version is available at pollkatz.com).

How often does such an outlier occur? Remember that the “margin of error” reported by most polls assumes a “95% confidence interval.” That means that if we drew many repeated samples, about 19 of every 20 would produce a result for any given question that falls within the margin of error of the true population value. However, we should expect at least 1 in 20 to fall outside the margin of error by chance alone.
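A quick simulation illustrates the 1-in-20 rule. This sketch assumes a hypothetical true approval rating of 51% that never changes, draws 2,000 simple random samples of 1,000 interviews each, and counts how many miss the truth by more than the nominal 95% margin of error. The results are simulated, not real poll data.

```python
import math
import random

def share_outside_moe(true_p=0.51, n=1000, polls=2000, seed=2005):
    """Simulate repeated simple random samples of size n from a population whose
    true approval is true_p, and return the share of samples whose estimate
    misses true_p by more than the 95% margin of error."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    moe = 1.96 * math.sqrt(true_p * (1 - true_p) / n)
    misses = 0
    for _ in range(polls):
        # Each simulated respondent approves with probability true_p.
        approve = sum(rng.random() < true_p for _ in range(n))
        if abs(approve / n - true_p) > moe:
            misses += 1
    return misses / polls

rate = share_outside_moe()
print(f"Share of simulated polls outside the margin of error: {rate:.1%}")
```

Even with no real change in opinion, roughly one simulated poll in twenty lands outside the margin of error. Over a year with dozens of published polls, several such outliers are to be expected.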

With the benefit of hindsight, it seems obvious that the February 4-6 Gallup survey was just such an outlier. Other surveys done just before and after (see the always user-friendly compilation on RealClearPolitics.com) show no comparable surge and decline in Bush’s job rating in early February.

The bigger question, then, is what CNN, USA Today and Gallup should have done, without the benefit of hindsight, when they released the February 4-6 survey. Steve Soto immediately requested the party identification numbers from Gallup and found that the survey also showed an unusual Republican advantage. His commentary and that of Ruy Teixeira again raise the issue of whether surveys like Gallup’s should weight by party ID.

I wrote about this issue extensively in September and October.  Here’s a quick review:

Most public survey organizations, including Gallup, do not weight by party identification (Gallup restated its philosophy on this issue in its blog; see the February 10 entry). Unlike purely demographic items such as age and gender, party ID is an attitude that can change, especially from year to year (although academics continue to debate just how much and under what conditions; see the recent reports by the National Annenberg Election Survey and the Pew Research Center for discussion of long-term trends in party identification).

The problem is that the partisan composition of any sample can also vary randomly; outliers do happen. Unfortunately, when they do, we get news stories about “trends” that are really nothing more than statistical noise. To counter this problem, some pollsters, such as John Zogby, routinely weight their surveys to some arbitrary level of party identification. The difficulty with this approach is deciding on the target and when, if ever, to change it. Zogby often uses results from exit polls to determine his weight targets. Raise your hand if you consider that approach sound given what we have learned recently about exit polls.

This conflict has led to third-way approaches that some have dubbed “dynamic weighting,” which I discussed back in October. The simplest and least arbitrary method is for survey organizations to weight their polls to the average party identification measured on their own recent surveys, perhaps over the previous three to six months. A target drawn from this larger combined sample would smooth out random variation while still evolving to allow for gradual long-term change (see also Prof. Alan Reifman’s web page for more commentary on this issue).
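Here is a minimal sketch of the dynamic weighting idea described above, assuming the simplest possible scheme: pool the party-ID counts from a pollster's recent surveys, treat the pooled shares as targets, and give each party group a weight equal to its target share divided by its share of the current sample. The counts are invented for illustration, and a real weighting scheme would usually combine party with demographic weights (for example by raking), which this sketch ignores.

```python
from collections import Counter

def dynamic_party_targets(recent_surveys):
    """Pool party-ID counts from recent surveys (e.g. the previous three to six
    months) and return each party's share of the pooled sample."""
    pooled = Counter()
    for counts in recent_surveys:
        pooled.update(counts)  # adds counts party by party
    total = sum(pooled.values())
    return {party: c / total for party, c in pooled.items()}

def party_weights(current_counts, targets):
    """Weight for each party group = target share / share in the current sample."""
    n = sum(current_counts.values())
    return {party: targets[party] / (c / n) for party, c in current_counts.items()}

# Hypothetical counts of Republicans (R), Democrats (D) and independents (I).
# These are NOT real Gallup numbers.
recent = [
    {"R": 350, "D": 340, "I": 310},
    {"R": 345, "D": 355, "I": 300},
    {"R": 355, "D": 345, "I": 300},
]
current = {"R": 400, "D": 320, "I": 280}  # a sample that happens to tilt Republican

targets = dynamic_party_targets(recent)
weights = party_weights(current, targets)
for party in ("R", "D", "I"):
    print(f"{party}: target {targets[party]:.3f}, weight {weights[party]:.3f}")
```

In this made-up example the Republican-heavy sample has Republicans weighted down to 0.875 and the other groups weighted up, pulling the sample back to the pooled average (35% Republican) rather than to a fixed, externally chosen target.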

I am not an absolutist about this, but I am less comfortable with dynamic weighting for periodic national news media surveys than for pre-election tracking surveys. There are times when dynamic party weighting would make a poll less accurate. Consider the Pew party ID data, which showed a sharp increase in Republican identification after 9/11. Surveys using dynamic weighting at the time would have artificially and unfairly reduced the share of Republican identifiers.
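A toy example shows why. Suppose, hypothetically, that Republican identification jumps abruptly and genuinely from 30% to 35% of adults. A target based on the average of the three previous surveys lags behind that real change, so weighting would pull legitimate Republican respondents out of each sample until the window catches up. The numbers are illustrative, not Pew's actual data.

```python
def trailing_targets(shares, window):
    """For each survey after the first `window`, the weighting target is the
    mean Republican share (in points) observed in the `window` surveys before it."""
    return [sum(shares[i - window:i]) / window for i in range(window, len(shares))]

# Hypothetical Republican share, in percentage points, across eight surveys,
# with a real and lasting jump from 30 to 35 at survey 4.
observed = [30, 30, 30, 30, 35, 35, 35, 35]
window = 3

targets = trailing_targets(observed, window)
for survey, target in enumerate(targets, start=window):
    gap = observed[survey] - target  # real Republicans weighted away
    print(f"survey {survey}: observed {observed[survey]}, "
          f"target {target:.1f}, understated by {gap:.1f} points")
```

Here the first survey after the jump understates Republicans by 5 points, the next two by about 3.3 and 1.7 points, and only once the window fully covers the new level does the distortion disappear. A longer window smooths noise better but lags real change even longer.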

With hindsight, it is easy to see those patterns in the data, just as it is easy to see that the February 4-6 Gallup numbers were likely a statistical aberration.  But without the benefit of hindsight, how does a news media pollster really know for certain? Media pollsters are right to strive for objective standards rather than ad hoc decisions on weighting by party. 

When Steve Soto looked at the unusual Republican tilt in Gallup’s party ID numbers on their February 4-6 survey, he concluded that “Gallup appears to be firmly a propaganda arm of the White House and RNC.”  I don’t think that’s fair.   Gallup did not “look at the electorate” and “somehow feel” that party ID should be a certain level, as Soto describes it.  Actually, Gallup did just the opposite.  They measured public opinion using their standard methodology and refused to arbitrarily tamper with the result.  We may not agree with that philosophy, but Gallup believes in letting the chips (or the interviews) fall where they may. 

I do agree with Soto on one important point:  The party ID numbers ought to be a standard part of the public release of any survey, along with cross-tabulations of key results by party identification.  Gallup should be commended for releasing data on request even to critics like Soto, but it really should not require a special request. 

Also, when a survey shows a sharp shift in party identification, news coverage of that survey should at least note the change, something sorely lacking in the stories on CNN and in USA Today about the February 4-6 survey. Consider this example from the Wall Street Journal’s John Harwood on the recent NBC News/WSJ poll:

Public approval of Mr. Bush’s job performance held steady at 50%, while 45% disapprove. Because the new WSJ/NBC poll surveyed slightly more Democrats than the January poll, that overall figure actually masks a slight strengthening in Mr. Bush’s position. While the views of Democrats and Republicans remained essentially unchanged, independents were more positive toward Mr. Bush than a month ago.

Of course, this sort of analysis raises its own questions:  What was the sample size of independents? Was the change Harwood observed among independents statistically significant?  The story does not provide enough detail to say for sure. But at least it acknowledges the possibility that change (or lack of change) in the overall job rating may result from random differences in the composition of the sample.  That is an improvement worth noting.
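Questions like these are usually answered with a two-proportion z-test. The sketch below uses the standard pooled version with hypothetical numbers, since the story does not report them: independents' approval rising from 40% to 48%, with 200, 300 or 400 independents interviewed in each poll. It treats both polls as simple random samples, which understates the true error for weighted surveys.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical: independents' approval moves from 40% to 48% between two polls.
for n_ind in (200, 300, 400):
    z, p = two_proportion_z(0.40, n_ind, 0.48, n_ind)
    print(f"{n_ind} independents per poll: z = {z:.2f}, p = {p:.3f}")
```

With these assumed figures, the same 8-point change is not statistically significant with 200 independents per poll, sits right at the conventional threshold with 300, and is significant with 400, which is exactly why the subgroup sample size matters.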

Outliers happen. Truly objective coverage of polls needs to do a better job of acknowledging that possibility.


Mark Blumenthal

Mark Blumenthal is a political pollster with deep and varied experience across survey research, campaigns, and media. The original "Mystery Pollster" and co-creator of Pollster.com, he explains complex concepts, including how data informs politics and decision-making, to a wide range of audiences. A researcher and consultant who crafts effective questions and identifies innovative solutions to deliver results. An award-winning political journalist who brings insights and crafts compelling narratives from chaotic data.