Errors, margins and samples: What does the latest poll say?

The most recent ABC News/Washington Post poll reports a margin of error of plus or minus 4 percentage points, which is fairly standard for a serious national poll. That poll has Mitt Romney leading Barack Obama by 2 percentage points. What should that mean to you?

Taken at face value, the margin of error means in this case that there is a 95 percent likelihood that the final vote tally would give Obama anywhere from 42 to 50 percent of the vote and Romney anywhere from 44 to 52 percent. Unfortunately, that's a gross oversimplification.
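To make the face-value reading concrete, here is a minimal Python sketch of the arithmetic. The point estimates of 46 percent for Obama and 48 percent for Romney are inferred from the ranges above (each range runs 4 points either side of its estimate); the poll's own unrounded figures may differ slightly, and the function name is just an illustration.

```python
# Face-value reading of a +/- 4-point margin of error.
# The 46 (Obama) and 48 (Romney) point estimates are inferred from the
# 42-50 and 44-52 ranges quoted in the text; they are an assumption.

MARGIN_OF_ERROR = 4  # percentage points, as reported by the poll


def face_value_interval(share, moe=MARGIN_OF_ERROR):
    """Return the naive (low, high) range for a candidate's vote share."""
    return (share - moe, share + moe)


obama = face_value_interval(46)
romney = face_value_interval(48)
print("Obama:", obama)    # Obama: (42, 50)
print("Romney:", romney)  # Romney: (44, 52)

# Range of vote shares that fall inside both candidates' intervals.
overlap = (max(obama[0], romney[0]), min(obama[1], romney[1]))
print("Overlap:", overlap)  # Overlap: (44, 50)
```

The two ranges share most of their width, which is one way to see why a 2-point lead inside a 4-point margin says less than the headline suggests.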

Quoted margins of error are large and significant. Here is the final difference in the national popular vote between the Democratic and Republican candidates in the last few presidential elections: 7 percentage points in 2008; 2.5 in 2004; 0.5 in 2000; 8.5 in 1996; and 5.5 in 1992. Some of those elections were close and others less so, yet the largest difference was 8.5 percentage points. That puts a margin of error of 4 percentage points into perspective: it is meaningful.

What's more, most people only really care about polling when a race is relatively close, and a race generally isn't close unless the candidates are separated by just a few percentage points, often within a 4-point margin of error.

In the ABC News/Washington Post survey mentioned above, the quoted margin of error applies only to questions asked of the poll's entire sample of 1,002 adults. The headline result, Romney's 2-point lead over Obama, comes from the subset of respondents who identified themselves as registered voters. Because that subset is smaller, the poll's main result carries a larger margin of error.
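The textbook formula behind a margin of error shows why a smaller subset carries a larger one: for a simple random sample of size n, the 95 percent margin on a proportion p is roughly 1.96 * sqrt(p * (1 - p) / n), which is largest at p = 0.5. The Python sketch below applies it. Note that it gives about 3.1 points for 1,002 adults, less than the 4 points the poll reports; pollsters often widen the textbook figure to account for weighting (a "design effect"), but whether that explains the gap here is an assumption, since the poll's method isn't spelled out. The subset sizes are hypothetical, because the number of registered voters isn't given above.

```python
import math

Z_95 = 1.96  # z-score for a 95 percent confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Textbook 95 percent margin of error, in percentage points, for a
    proportion p estimated from a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


full_sample = 1002
print(round(margin_of_error(full_sample), 1))  # 3.1

# Hypothetical registered-voter subset sizes (not given in the text):
# the margin grows as the subset shrinks.
for n in (900, 800, 700):
    print(n, round(margin_of_error(n), 1))
```

Because the margin scales with 1/sqrt(n), cutting the sample by a fifth raises the margin by roughly 12 percent, not 20.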

The margin of error measures only sampling error, which is just one of several possible sources of error. It assumes that the pollster has drawn a perfectly random selection of respondents from a sample that perfectly represents the population being measured.

But we don't really live in a randomly arranged universe, of political choices or anything else. There are numerous reasons why a selection becomes non-random. Once a polling company has chosen a person to contact at random, the poll may not reach the intended respondent, or that respondent may refuse to answer at all. The techniques pollsters use to decide who in a household should answer the questions, the time of day of contact, and similar choices can all, well, randomly backfire. That, in turn, can dilute the poll's sample with unrepresentative respondents.

A sample can also fail to be representative in a host of other ways. To quote the old shibboleth, the only poll that matters is on Election Day: the entire political process is in fact focused on the particular group of voters who actually turn out. If we are lucky, at our present vantage of 13 months out from Election Day, we can isolate self-identified registered voters who are somewhat representative of the eventual November 2012 turnout. What's more, by the standards of other Western democracies, the United States has a notoriously complicated set of requirements and obstacles to casting a ballot, and recent changes to election laws in many states have made them even more restrictive. Figuring out who will actually vote on Election Day is neither obvious nor easy.

Finally, there is an odd quirk in most of the standard voter-intention questions that poll respondents typically answer. Pollsters ask their respondents how they would vote if "the 2012 presidential election were being held today." Of course, the election will be held 13 months or so after this particular poll, and the voters, the candidates, and the overall dynamic of the election will shift dramatically between now and then.

So it's not at all surprising to see enormous day-to-day fluctuations in polls, even when the surveys come from the most reliable sources. The chart below shows John McCain's two-party vote share, relative to Obama, for the 2008 presidential election.

Yet, despite the margin of error and the daily fluctuations, polls are a very meaningful source of information for understanding and predicting an election. You just need to be clear on what a poll means and how best to use it.

You can, for example, profitably aggregate a new poll with other, earlier polls. Poll aggregation sites such as Pollster and RealClearPolitics report the latest polls, but they focus more on an aggregated presentation of the fuller range of polls. RealClearPolitics has Romney up by 0.5 points over Obama in its aggregate of head-to-head polls, and compared with the sampling and margin-of-error issues that can crop up in any individual poll, this approach provides a relatively stable metric.
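A rough Python sketch of why an aggregate is steadier: under the idealized assumption that the polls are independent simple random samples of the same population (no house effects, no change in opinion over time), the sampling error of an unweighted average of several polls shrinks with the total number of respondents. The poll margins and sample sizes below are made up for illustration, and this plain average is only similar in spirit to what RealClearPolitics reports; it is not either site's actual method.

```python
import math
from statistics import mean


def poll_average(margins):
    """Unweighted mean of poll margins (Romney minus Obama, in points)."""
    return mean(margins)


def average_margin_of_error(sample_sizes, p=0.5, z=1.96):
    """95 percent sampling margin (in points) on one candidate's share for an
    unweighted average of independent simple random samples.
    Ignores house effects, timing, and all non-sampling error."""
    k = len(sample_sizes)
    # Variance of a mean of k independent estimates, each with variance p(1-p)/n.
    variance = sum(p * (1 - p) / n for n in sample_sizes) / k ** 2
    return 100 * z * math.sqrt(variance)


# Hypothetical inputs for illustration only; these are not real polls.
margins = [2, -1, 0, 1, 0]
sizes = [1000, 1000, 1000, 1000, 1000]

print(poll_average(margins))                      # 0.4
print(round(average_margin_of_error(sizes), 1))   # 1.4
print(round(average_margin_of_error([1000]), 1))  # 3.1 for a single poll
```

The shrinking margin covers sampling error only; the non-random contact and turnout problems described earlier do not average away so neatly.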

You can also aggregate a new poll with earlier polls and with additional information about the election. In prediction markets such as Betfair and Intrade, users do just that as they trade contracts on who will win upcoming elections. The prediction markets do not have head-to-head match-ups prior to the nominations, but real-time data give Obama a 48.7 percent likelihood of winning the election and Mitt Romney a 29.2 percent likelihood of winning the election, along with a 58.9 percent likelihood of winning the Republican nomination.
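The market numbers can be combined with a little arithmetic. Assuming Romney can win the presidency only as the Republican nominee, P(win election) = P(win nomination) * P(win election | nominated), so dividing the two quoted Romney figures gives the market's implied chance that he wins if nominated. The sketch below also treats the quoted prices as a coherent set of probabilities, which real market prices only approximate.

```python
# Market-implied probabilities quoted in the text.
p_obama_wins_election = 0.487
p_romney_wins_election = 0.292
p_romney_wins_nomination = 0.589

# Assumption: Romney can win the general election only if nominated,
# so P(win) = P(nominated) * P(win | nominated).
p_win_given_nominated = p_romney_wins_election / p_romney_wins_nomination
print(f"{p_win_given_nominated:.1%}")  # 49.6%

# Assumption: the prices sum coherently, so whatever is left over is the
# implied chance that someone other than Obama or Romney wins.
p_someone_else = 1 - p_obama_wins_election - p_romney_wins_election
print(f"{p_someone_else:.1%}")  # 22.1%
```

Read this way, the market sees a Romney-Obama general election as close to a coin flip, broadly in line with the near-tied poll aggregate.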

Editor's Note: I changed the wording in the second paragraph to reflect the vote shares of the candidates rather than the differences between the two candidates to avoid confusion over details that cannot be encapsulated in this post. The implied differences between the two candidates depend on a further set of assumptions that are not declared by the polling companies. More details will follow in a later post. Thanks to Stefan Herzog for a spirited conversation on this topic.

David Rothschild is an economist at Yahoo! Research. He has a PhD in applied economics from the Wharton School of Business at the University of Pennsylvania. His dissertation is on creating aggregated forecasts from individual-level information. You can follow him on Twitter @DavMicRot and email him at PredictionBlogger@Yahoo.com.
