Scientific hindsight shows that Google Flu Trends far overstated this year's flu season, raising questions about whether a search engine, which Google and the media hyped as an efficient public health tool, can accurately monitor the flu.
Nature's Declan Butler reported today on the huge discrepancy between Google Flu Trends' estimated peak flu levels and data collected by the U.S. Centers for Disease Control and Prevention (CDC) earlier this winter. Google bases its numbers on flu-related searches (the basic idea being that more people Googling terms like "flu symptoms" equals more people catching viruses). The CDC, on the other hand, uses traditional epidemiological surveillance methods. In past seasons, Google's estimates mirrored CDC flu charts fairly well. But this time, Google's algorithms produced estimates nearly double the CDC's (accurate) figures, overshooting the mark in some regions by an even higher margin.
There's no doubt that this year's flu season was severe. Outbreaks hit early and hard by any measure. CDC officials declared an influenza epidemic in early January, Boston's mayor called a public health emergency around the same time, and Chicago hospitals struggled to keep up with emergency room visits. Still, Google's alarming snapshot of over 10 percent of the U.S. population experiencing flu-like illness was nowhere near the actual peak of 6 percent incidence. So what threw off Google's calculations this year? It looks like the media is (at least partially) to blame:
This year's flu outsmarted Google, probably because the algorithms were thrown off by media hype. nature.com/news/when-goog…— NYT Bits Blog (@nytimesbits) February 13, 2013
Heavy media attention to this year's flu season skewed Google's search traffic. Some people searching flu terms may have had symptoms, but many healthy Googlers were simply looking for news stories about this anomalous season. If the view count on our own story about Google's flu predictions is any indication, the flu was a hot search topic in early 2013.
This wasn't supposed to happen. Fawning media coverage of Google Flu Trends led readers to believe that Google's numbers were just as accurate as the CDC's, with the added bonus that they came out a few days before official reports, getting closer to real-time flu tracking than government researchers could. Google's own description of the tool also assures flu-trackers that Flu Trends' numbers "are very closely matched to traditional flu activity indicators." But many public health experts have not been convinced. The National Association of Healthcare Access Management warned us about Google Flu Trends' shortcomings earlier this year:
The CDC model can control more factors than the Google model. For example, if the CDC sees an increase in people going to their doctors for flu-like illness without an increase in lab samples testing positive for flu, it can look for signs of other respiratory viruses. The CDC will also call the state labs or individual physicians to try to get to the bottom of the discrepancy. Google, without these resources, may be tracking other illnesses with symptoms similar to the flu.
This stumble doesn't necessarily make Google Flu Trends irrelevant. Lyn Finelli, the head of the CDC's Influenza Surveillance and Outbreak Response Team, says even she checks its data "all the time." But it does mean that Google needs to recalibrate the way it mines big data to track the spread of disease, accounting for searches that may not be linked with infections. "You need to be constantly adapting these models, they don’t work in a vacuum," Harvard Medical School epidemiologist John Brownstein tells Butler. So while Silicon Valley techies are certainly doing some fascinating epidemiological work with projects like Google Flu Trends, when you need the most reliable flu stats, it's still best to stick with government-funded reports.