Nate Silver and the Ascendance of Expertise
I usually pepper my posts with links, but today I feel lazy, so I listed a bunch of links at the bottom - hours of fascinating reading you can have after you read my post!
Who is Nate Silver?
Nate Silver likes to play with numbers. He started out with sports, then burritos, then politics. Using statistics, he correctly predicted most (not all, but almost all) presidential and congressional races in 2008, 2010 and 2012. Back in 2010, he came to ScienceOnline and moderated a session (together with Arikia Millikan) on using math to study human behavior online - web science.
What does Nate Silver do?
Twenty years ago, there were only a few pollsters out there and they conducted relatively few polls. Today, there are many polling organizations and they, especially in the home stretch of an election, poll incessantly, every day. They do national polls, state-wide polls, even local polls. Over the years, they have refined their methodologies. Some predict outcomes better than others, for a variety of reasons.
Nate Silver averages all the polls, weights each poll according to the statistics of its past performance, and produces a daily-changing set of numbers predicting the outcomes of various electoral races. For the Presidential elections, unlike pundits focusing on national polls, he rightly focuses on state polls, especially in swing states, in order to predict the winner of the Electoral College - the only thing that really counts (we can discuss whether that is right or wrong, but that is how the game is played now, so that is what he measures).
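The core arithmetic of that kind of aggregation is simple. Here is a minimal sketch of a weighted poll average in Python - the poll numbers and pollster weights below are invented for illustration, and Silver's actual model is far more elaborate (it also adjusts for house effects, trends over time, and more):

```python
# A weighted average where each poll's weight reflects the pollster's
# (hypothetical) track record. All numbers here are made up.

def weighted_poll_average(polls):
    """polls: list of (candidate_share, pollster_weight) tuples."""
    total_weight = sum(w for _, w in polls)
    return sum(share * w for share, w in polls) / total_weight

polls = [
    (52.0, 0.9),   # pollster with a strong past record
    (49.5, 0.5),   # middling past record
    (55.0, 0.2),   # pollster with a known lean, weighted down
]

print(round(weighted_poll_average(polls), 2))  # 51.59
```

Note how the better-performing pollsters pull the average toward their numbers, while the outlier contributes little.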
What did Nate Silver not do?
As a couple of bloggers (see links at the bottom) pointed out, Nate Silver did not do Big Data. The data sets at his disposal are pretty small and limited. In aggregate, they are powerfully predictive, but that is not Big Data, though the motivations and methodologies are similar.
Because Silver started in sports statistics, as part of the Moneyball movement in baseball, people assume that what he is doing now is the same thing. But it is not. It is also not the same as what he did with burritos, though that comes closer.
In baseball (and later in basketball, though the horse racing and betting industry has been doing this for a century at least), there are hard data. A player hit the ball or did not. Caught the ball or did not. The ball landed in spot X or did not. It was a home run or it wasn't. Empirical data. Whether two players are good buddies does not matter much at that level - they are both professionals and will do their best regardless of interpersonal relationships, body language and other subjective parameters. Thus, stats in sports work well, as they are based on clearly measurable things. From such stats, one can rank players and teams, and predict with quite a high degree of accuracy which teams will win and which will lose. Or which horses have which odds of winning a race.
So again, What did Nate Silver do?
People focus on numbers, imagining they are hard data. But remember that the numbers come from polls. Polls are questionnaires. What Nate Silver did was social science.
Polls ask questions. People answer them differently. They may have conscious or unconscious biases. They have different backgrounds and different levels of being informed. Some will lie on purpose, to skew the polls, as a form of activism. Some will lie unconsciously because they are afraid to say what they really think. People respond one way if they are polled over their land-line phones (public), another if called on their cell phones (private), and differently again in online polls versus being asked in person, face to face (e.g., in exit polls). Some people put a lot of thought into their answers. Others want to get it done as fast as possible and go with gut feeling, or even near-random responses.
Different pollsters will ask similar questions, but with slightly different wording. And we know that wording affects the responses. The order of questions also affects responses.
Each pollster can reach only a limited number of people, so the small sample size results in a pretty large margin of error.
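For a simple random sample, that margin of error shrinks only with the square root of the sample size - which is why even a 1,000-person poll carries roughly a three-point margin. A quick sketch of the standard formula:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple
    random sample of size n, at 95% confidence (z = 1.96).
    p = 0.5 gives the worst case, the figure pollsters usually report."""
    return z * math.sqrt(p * (1 - p) / n)

# Doubling the sample only shrinks the margin by a factor of sqrt(2):
for n in (500, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1))
# 500  -> 4.4 points
# 1000 -> 3.1 points
# 2000 -> 2.2 points
```

Real polls are messier than this formula assumes (they are not simple random samples, and response rates are low), so the true uncertainty is, if anything, larger.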