Why Google Flu Trends Will Not Replace CDC Anytime Soon

Google Flu Trends is a useful tool in tracking the severity of a given flu season. But take some caution in drawing conclusions from it. Exhibit A: Look at the discrepancy between the following two charts.

The first is Google Flu Trends’ plot of national flu activity, compared with the last six years of flu activity. Google compiles this chart by sifting through the search terms that someone might type if they had the flu. In the aggregate, this information is used to create a chart of flu outbreaks that often matches the Centers for Disease Control and Prevention's data on outbreaks.

The bright blue line is the current year—and it looks pretty bad.

Now, look at a chart from CDC’s Outpatient Influenza-like Illness Surveillance Network. This data is compiled from 3,000 clinics across the country that report to CDC when they see patients with flu-like symptoms. This is one of several graphs included in CDC’s weekly flu reports. The agency also tracks mortality rates, runs a network of labs analyzing viral specimens, surveys hospitals for confirmed flu patients, and tracks the geographic spread of viral strains, among other measures.

The red line is the current flu season.

If you have a keen eye, you can see the difference. The first chart implies that this is the worst flu season in recent years, while the second shows a severe season but still below the H1N1 (swine flu) pandemic year of 2009-10. Granted, CDC's data runs about a week behind Google's, but even data points from a few weeks back show this discrepancy.

In a world where Target knows a woman is pregnant before she or her family does, the power of enormous digital data sets should not be ignored. We all leave huge tracks of data wherever we go: in stores, often, but especially on the Internet. And these swathes of data create a shadow of our real world that can be analyzed. But it’s still just a shadow. While Google’s Flu Trends has gained acceptance in academic research, the search giant admits that the project needs constant tweaking.

“We intend to update our model each year with the latest sentinel provider [influenza-like illness] data, obtaining a better fit and adjusting as online health-seeking behavior evolves over time,” Matt Mohebbi, a Google software engineer, wrote on Forbes recently. “With respect to the current flu season, it’s still too early to tell how the model is performing.”

Recently, I spoke with Lynnette Brammer, a flu epidemiologist with the CDC, about the role of technology in disease tracking and about how we can explain the discrepancy in the charts above. Below is an edited version of our interview.


Is it fair to compare CDC's flu tracking with Google's Flu Trends?

There’s pros and cons to both systems. Because [Google’s] system is real-time, that gives us a peek of what might be coming a week early. The downside is, you can’t track down those signals. If you see an increase in flu, you can’t find out what it really is, at least not directly. Whereas in our ILI system, if we see an increase in people going to the doctors for flu-like illness and we look at our laboratory data and we don’t see any increase in flu, we’ll check with our colleagues who look at other respiratory viruses. And [if] we don’t see anything there, we can always call the state, and the state calls the physician, and say, “The next few people who come in that look like they have flu, can you take some samples and send them to our lab?” We can directly follow up on those and investigate things that we think are unusual.

There can be things in Google Flu Trends that cause signals that maybe aren’t flu. They try very hard to filter out signals that could be caused purely by increased media interest. At the beginning of the pandemic in 2009, they probably had done too good of a job of that. They didn’t pick up the signal in the beginning. They modified, and they had said all along, “We’re going to do this, and if we learn more we’re going to modify our algorithms.” And they did, and it seems to be working, but they seem to be a little high. Changes in people’s [search] behaviors can change that.

It’s really hard, certainly for us at CDC, to understand what’s causing that change. They’re seeing pretty much record levels of influenza-like illness. And while ours are high, they’re not at historical limits by any means. We just have a lot more flexibility and ability to track down and ask additional questions and find the answers to those questions.

So if what we’re seeing out of Google is skewing high, how would you describe this season’s outbreak?

This season is within the range of what we see when this influenza A H3N2 virus, which is predominant right now, [circulates]. When that virus circulates, we tend to have more severe seasons. Not all H3N2 seasons are severe, but the severe seasons tend to be H3N2 seasons. H3N2 virus tends to affect the elderly, so it’s not a huge surprise to us that we are seeing more illness, more hospitalizations, and more deaths in the elderly this year.

It’s similar to what we saw in the 2007-2008 season, and probably a little bit less than what we saw in the 2003-2004 season. It’s certainly in the range of normal for this type of virus.

Five years from now, what do you predict will change in the way CDC tracks the flu? Or what do you hope for?

In five years from now, I would like to get an even larger proportion of my laboratory data electronically, using standard messaging. We have been working to get more and more of that doctor-office data electronically.

And as more and more of them get those systems, we want to be able to collect the data in a way that doesn’t take up a physician or somebody in his office staff, it doesn’t take up so much of their time. We want to make reporting easier, and we want to get more data, but we don’t want to put an additional burden on people.

I’m hoping that our mortality data is based more and more on electronic death certificates. A fair number of [our reporting sites] sit there and count how big of a stack, how many death certificates came into their office that week. And they’ll sit there and actually read those death certificates and look and see how many had pneumonia, or if there is influenza listed on here. And if death certificates become electronic, and they can get those transferred in electronically, those can get automatically coded and this data can be extracted without having someone sit there and read all these death certificates. Hopefully soon it will all be seamless.

So it sounds like the optimal flu tracking system is a combination of Google Flu Trends and CDC’s tracking. Instant data, but also very specific data.

We want the data transmission to be as easy for the people providing it to us as possible. But the thing we don’t want is to lose the connection we have with those people. Even if you have really good data coming in, you’re always going to have questions about what it means. And nothing replaces being able to call up a doctor’s office, when you are seeing something unusual, and just being able to say, “What do you think is going on?” And these guys in addition, when they see something that’s unusual or something they don’t understand, they have the phone number of somebody at the health department, so I think they are more likely to call. We don’t want to lose that.

I’m imagining that there are Purell dispensers everywhere at CDC. Is that a fair guess?

No. I mean there are, there are Purell dispensers by the elevator and I’ve got a bottle of Purell up in one of my drawers somewhere. We’re all conscious of trying to stay healthy, and good hand hygiene and things like that, but we’re not uber-paranoid or anything. We want people to stay healthy because we want them to work through flu season.

When did you realize this winter was going to be a severe flu season?

It’s not one of those things when you have an "a-ha" moment and say, “Oh my gosh, this is going to be bad.” It sort of builds gradually. The fact that we never got down to less than 1 percent [of our samples] testing positive for flu during the summer sort of gave me a not-so-good feeling. It usually gets down below 1 percent and I think the lowest we got was maybe 3 percent over the whole summer. And we were starting to see these H3N2 viruses and we were starting to hear from our colleagues in Australia and New Zealand that they had a fairly severe flu season, and those viruses are what circulated there.

What do you make of companies like Target who mine sales records for hints of people’s health states—like if they are pregnant or not? Are there any of these “big data” implications for national public health? For flu tracking?

There’s a lot of things that are surrogates for something else. If you tracked Kleenex sales, you’ll probably track the flu seasons. We joke that anything can really track the flu season if you really want to make it. While some of that stuff is interesting and there are implications financially for some things, we can only do so much. So we’re sticking to the stuff that’s closer to the actual source, which for us are the viruses.