Google Flu Trends: The Limits of Big Data
The New York Times - 03/28/2014
Google Flu Trends, once a poster child for the power of big-data analysis, seems to be under attack.
This month, in a Science magazine article, four quantitatively adept social scientists reported that Google’s flu-tracking service not only wildly overestimated the number of flu cases in the United States in the 2012-13 flu season, a well-known miss, but has also consistently overshot over the last few years. Google Flu Trends’ estimate for the 2011-12 flu season was more than 50 percent higher than the cases reported by the Centers for Disease Control and Prevention. And, they wrote, for a period of more than two years ending in September 2013, the Google estimates were too high in 100 of 108 weeks.
The article, “The Parable of Google Flu: Traps in Big Data Analysis,” declared that Google was guilty of “big data hubris,” which the authors defined as the implicit assumption that big data sets trump traditional data collection and analysis. And they were skeptical of Google Flu Trends’ algorithmic smarts. “The comparative value of the algorithm as a stand-alone flu monitor is questionable,” they wrote.
A follow-up analysis by the four authors tracked Google Flu Trends’ performance in the just-concluded 2013-14 flu season, after Google updated its algorithm last October. There was some improvement, but the service still overshot by about 30 percent, the authors wrote in their paper, which was posted online.