While I was focused on the AAPOR conference, I missed the release two weeks ago by the Pew Research Center of their massive new "Typology" study (full pdf version here). Their study, which classified voters into nine "homogeneous groups based on values, political beliefs, and party affiliation," shows that political public opinion is far richer than the differences between Democrats and Republicans. As implied by the report’s title — "Beyond Red vs. Blue" — the divisions within the political parties may be more interesting than the divisions between them. For those who want a thorough understanding of American public opinion, the Pew typology study is certainly a "must read." Its reception in the blogosphere is also instructive to those of us who conduct polls and survey research.
One of the most intriguing things about the Pew report is the web page ("Where Do You Fit?") that allows anyone to answer a series of questions and find out which of the nine categories they fit into. This page has generated much interest in the blogosphere. If you are not familiar with the study, it’s a great place to start. The first principal findings page also has a summary of the nine groups.
Several readers have asked for my take. Here goes:
The study’s methodology page includes a two-paragraph description of how they created the typology. Let me try to unpack their description a bit (with the help of a Pew researcher who clarified a few issues):
The value dimensions used to create the typology are each based on the combined responses to two or more survey questions. The questions used to create each scale were those shown statistically to be most strongly related to the underlying dimension.
They started with the 25 questions listed on the Where Do You Fit page and performed a statistical factor analysis, which identified a smaller number of "underlying dimensions" among the questions. They found, for example, a common pattern in the way respondents answered questions about military strength vs. diplomacy, the use of force to defeat terrorism, and the willingness to fight for your country. Respondents who supported military force on one question also tended to support it on the other two. So, informed by the factor analysis, they created a "scale" that combined the three military-force questions into a single variable. They repeated this process for eight specific value dimensions that they list early in the report (under the heading "Making the Typology," or on page 9 of the full PDF version).
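To make that two-step procedure concrete, here is a minimal sketch in Python of the general technique (not Pew's actual code); the item names, response coding, and number of factors are all illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# One row per respondent, one column per forced-choice item, coded e.g.
# -2 (strongly prefer alternative A) through +2 (strongly prefer alternative B).
# Random data stands in for the actual survey responses.
rng = np.random.default_rng(0)
responses = pd.DataFrame(
    rng.integers(-2, 3, size=(1000, 25)),
    columns=[f"q{i+1}" for i in range(25)],
)

# Step 1: exploratory factor analysis to find underlying dimensions among the items.
fa = FactorAnalysis(n_components=8, random_state=0).fit(responses)
loadings = pd.DataFrame(fa.components_.T, index=responses.columns)

# Step 2: average the items that load on the same dimension into one scale,
# e.g. three hypothetical military-force items into a single "use of force" scale.
military_items = ["q1", "q7", "q19"]   # illustrative item names, not Pew's
responses["use_of_force_scale"] = responses[military_items].mean(axis=1)
```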
The description of how they constructed the typology continues:
Each of the individual survey questions use a "balanced alternative" format that presents respondents with two statements and asks them to choose the one that most closely reflects their own views. To measure intensity, each question is followed by a probe to determine whether or not respondents feel strongly about the choice they selected.
In past typologies, the Pew researchers asked respondents a series of "agree-disagree" questions (see this page for examples from 1999). The problem with that format is something survey methodologists call "acquiescence bias" – a tendency of some respondents to agree with all questions. So this year, to get around that problem, they used a format that asked respondents to choose between "balanced alternative" statements. To measure the intensity of feeling, they also asked a follow-up question in each case: "Do you feel STRONGLY about that, or not?" [Clarification: Pew switched to the balanced alternatives format on their 1994 typology study, although the cited 1999 survey has examples of the older agree-disagree format].
Consider this example. On past surveys, they asked respondents whether they agreed or disagreed with this statement: "The best way to ensure peace is through military strength." This time, they asked respondents to choose between two statements, the one used previously and an alternative: "Good diplomacy is the best way to ensure peace."
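For illustration only, here is one hypothetical way such an item and its "feel strongly" probe could be folded into a single ordered score; the Pew report does not publish its coding scheme, so the values below are my assumption:

```python
def code_item(choice: str, feels_strongly: bool) -> int:
    """Fold a balanced-alternative choice and its intensity probe into one score."""
    if choice == "military":      # "The best way to ensure peace is through military strength"
        return 2 if feels_strongly else 1
    if choice == "diplomacy":     # "Good diplomacy is the best way to ensure peace"
        return -2 if feels_strongly else -1
    return 0                      # volunteered "neither" / "don't know"

print(code_item("diplomacy", feels_strongly=True))   # -2
print(code_item("military", feels_strongly=False))   #  1
```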
The use of these forced choices bothered some in the blogosphere. New Donkey’s Ed Kilgore wrote:
Question after question, the survey lays out a long series of false choices that you are required to make: military force versus diplomacy; environmental protection versus economic growth; gay people and immigrants and corporations and regulations G-O-O-D or B-A-A-D. Other than agreeing with a proposition mildly rather than strongly, there’s no way to register dismay over the boneheaded nature of these choices.
[Jeff Alworth has a similar critique at Blueoregon].
He protests a bit too much. The questions used to create the typology are intended as "broadly oriented values" measures (p. 10), not positions on specific policy proposals. Moreover, the language of the questionnaire anticipates that respondents may not find it easy to agree with one statement, so it asks respondents to choose the one that "comes closer to your own views even if neither is exactly right." As Kilgore recognizes, the measure of intensity (feel strongly or not) provides one way to gauge whether respondents had trouble choosing. However, the actual survey questionnaire (as opposed to the online "Where Do You Fit" version) included another out: respondents could volunteer a "neither" or "don’t know" response; these ranged from 6% to 14%.
Continuing with the explanation of how Pew constructed the typology:
As in past typologies, a measure of political attentiveness and voting participation was used to extract the "Bystander" group, people who are largely unengaged and uninvolved in politics.
Simple enough: They defined "Bystanders" as those who pay little attention or rarely vote, and set them aside before turning to the heart of the procedure:
A statistical cluster analysis was used to sort the remaining respondents into relatively homogeneous groups based on the nine value scales, party identification, and self reported ideology. Several different cluster solutions were evaluated for their effectiveness in producing cohesive groups that are distinct from one another, large enough in size to be analytically practical, and substantively meaningful. The final solution selected to produce the new political typology was judged to be strongest on a statistical basis and to be most persuasive from a substantive point of view.
So what is "cluster analysis"? It is a statistical technique that attempts to sort respondents into groups such that the individuals within each group are as similar as possible while the differences between the groups are as large as possible. Typically, the researcher decides which questions to use as inputs and how many groups to create, and the cluster analysis software then sorts respondents accordingly.
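As a rough sketch of the general idea (the report does not say which clustering algorithm it used, so k-means here is an assumption, and the data are random stand-ins for the value scales, party identification, and ideology inputs):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for the inputs the quoted methodology describes: nine value scales,
# party identification, and self-reported ideology (11 columns total).
X = rng.normal(size=(1000, 11))

scaler = StandardScaler().fit(X)             # put all inputs on a common scale
X_std = scaler.transform(X)

# The researcher chooses the number of groups; 8 here mirrors the nine Pew
# groups minus the separately defined "Bystanders".
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_std)
groups = km.labels_                           # one cluster label per respondent
```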
One minor technical issue is that a true cluster analysis can only work on a complete sample. It functions by a process of repeatedly sorting, comparing groups and re-sorting until it reaches a statistically optimal result. This is an important point for those trying to use the online Typology Test — it will not produce precisely the same result as the true cluster analysis because the online version classifies respondents one at a time. As I understand it, the Pew researchers designed the online version to provide a very close approximation of how individuals get assigned, but it may not yield exactly the same classification as on the full survey.
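Continuing the sketch above, one plausible approximation (my assumption, not Pew's documented method) is to assign a single new respondent to the nearest of the already-fixed cluster centers:

```python
# Score one new respondent against the fixed cluster centers. km.predict()
# simply picks the nearest existing centroid, so it can occasionally disagree
# with the label the same person would get inside a fresh run of the full
# iterative clustering.
new_respondent = rng.normal(size=(1, 11))
print(km.predict(scaler.transform(new_respondent)))
```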
A much bigger issue is that cluster analysis leaves a lot of room for subjective judgment by the researcher. The statistical procedure produces no magically right or wrong result. Instead, the typology "works" if, in the view of the researcher, it yields interesting, useful or persuasive results. The Pew authors acknowledge this issue when they say they aimed for a balance of "analytical practicality" and "substantive meaning" and looked at "several different cluster solutions" before settling on the one they liked best.
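One common, though still judgment-laden, way to compare "several different cluster solutions" is to fit a range of cluster counts and inspect a cohesion statistic such as the silhouette score, continuing the sketch above:

```python
from sklearn.metrics import silhouette_score

# Fit several candidate solutions and compare a cohesion/separation statistic.
for k in range(5, 13):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
    print(k, round(silhouette_score(X_std, labels), 3))
# The statistically "best" k can still be rejected if the resulting groups are
# too small to analyze or lack a coherent substantive interpretation.
```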
The key point: The Pew Typology is not meant as a way to keep score, to tell us who is ahead or behind in the political wars. We have measures of vote preference, job approval and party identification that perform that task admirably. Rather, the Typology is more useful in understanding the undercurrents of opinion at work beneath the current party ID or vote preference numbers. Armed with this knowledge, we can then speculate endlessly about what the political score might be in the future. If the Pew Typology provides useful data for that debate, it succeeds.
What I find most intriguing about the Pew Typology is the way it has been received and dissected in the blogosphere. At the AAPOR conference, I wrote about plenary speaker Prof. Robert Groves, who noted that traditional measures of survey quality put great emphasis on minimizing various sources of error but do not take into account the perspective of the user. Perhaps, he suggested, we need to think more about the perceived credibility and relevance of the data.
Turning to the blogosphere, I note two things. First, as of today, when I enter "Pew typology" into Google, seven of the top ten sites (including the top two) are blogs. The other three are for the Pew site itself. Thus, a side point: If blogs have gained influence, Google is a big reason why.
Second, as I sift through the commentary by bloggers on the left and right, most of the debate is through the Pew typology rather than about it. Yes, some are critical, but the debate mostly centers on the appropriate meaning and interpretation of the results, not whether they are worth considering. Others may disagree (that’s the blogosphere after all), but from my perspective the Pew typology has achieved considerable relevance and credibility on all sides of the blogosphere.
That says a lot.
[Typos corrected]
Thanks for the excellent discussion, MP. In particular, I agree with your take on the forced choice items–some bloggers seem to have missed that Pew’s primary goal was to sort among various belief systems.
Part, at least, of the blogosphere’s acceptance is based on Pew’s reputation. Not all pollsters are treated the same, fairly or not (and it ain’t always fair).
Conceptually at least, factor analysis and cluster analysis are mirror images of each other. As many of you know, a data file is basically a spreadsheet, with respondents comprising the rows and variables, the columns.
A factor analysis is based on correlations between the columns (variables). For any two variables, you can move down the columns, examining whether the two variables are receiving similar responses from each respondent.
A cluster analysis is based on correlations between the rows (respondents). For any two respondents, you can move horizontally across the spreadsheet, seeing if the respondents are giving similar answers on each variable.
In the Pew project, factor analysis — with its emphasis on affinity among VARIABLES — was used as a “data reduction” device to collapse a large number of individual items into coherent subscales of items.
Cluster analysis — with its emphasis on affinity among RESPONDENTS — was used in the final step to create clusters of people.
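A tiny sketch of that mirror-image point, using random placeholder data: the same correlation routine applied to the data matrix relates the variables (columns), while applied across rows it relates the respondents:

```python
import numpy as np

# 200 respondents (rows) answering 10 items (columns); random placeholder data.
data = np.random.default_rng(1).normal(size=(200, 10))

var_corr = np.corrcoef(data, rowvar=False)   # 10 x 10: item-by-item (factor analysis input)
resp_corr = np.corrcoef(data)                # 200 x 200: person-by-person (clustering input)
print(var_corr.shape, resp_corr.shape)       # (10, 10) (200, 200)
```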
The issue of subjectivity comes into play with factor analysis, just as with cluster analysis. Because factor analysis and cluster analysis involve similar procedures and merely flip-flop the role of columns and rows, it seems inevitable that both would implicate similar issues.
Determining how many factors to retain is subjective, just as is determining how many clusters to retain (computer programs can provide multiple solutions). Also, readers would be interested in the interpretability of either a factor or cluster solution.
Some guidelines exist for determining the number of factors to retain, involving a mathematical entity known as an “eigenvalue” (e.g., retaining as many factors as have eigenvalues greater than 1, or cutting off the number of factors at a precipitous drop in the magnitude of the eigenvalues).
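A sketch of the “eigenvalue greater than 1” guideline, applied to the item correlation matrix from the sketch above:

```python
# Kaiser's "eigenvalue > 1" rule applied to the item correlation matrix;
# plotting the sorted eigenvalues (a scree plot) shows the "precipitous drop"
# criterion mentioned above.
eigenvalues = np.linalg.eigvalsh(var_corr)[::-1]     # sorted largest first
n_factors_to_retain = int((eigenvalues > 1).sum())
print(eigenvalues.round(2), n_factors_to_retain)
```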
All of what I’ve said above about factor analysis holds only for “exploratory” factor analysis, in which the researcher simply feeds the variables into the computer and lets it generate solutions. With another type of factor analysis, the “confirmatory” variety, the researcher specifies in advance which subsets of variables should “hang together” as separate factors. The computer then returns a “fit statistic” indicating whether the a priori assignment of items to factors is tenable.