Google may be a master at data wrangling, but one of its products has been making bogus data-driven predictions. A study has found that Google’s much-hyped flu tracker has consistently overestimated flu cases in the US for years. It’s a failure that highlights the danger of relying on big data technologies.
Google Flu Trends, which launched in 2008, monitors web searches across the US to find terms associated with flu activity such as “cough” or “fever”. It uses those searches to predict up to nine weeks in advance the number of flu-related doctors’ visits that are likely to be made. The system has consistently overestimated flu-related visits over the past three years, and was especially inaccurate around the peak of flu season – when such data is most useful. In the 2012/2013 season, it predicted twice as many doctors’ visits as the US Centers for Disease Control and Prevention (CDC) eventually recorded. In 2011/2012 it overestimated by more than 50 per cent.
The study’s lead author, David Lazer, of Northeastern University, says the fixes for Google’s problems are relatively simple – much like recalibrating weighing scales. “It’s a bit of a puzzle, because it really wouldn’t have taken that much work to substantially improve the performance of Google Flu Trends,” he says. Merely projecting current CDC data three weeks into the future yields more accurate results than those compiled by Google Flu Trends. Combining the two produces the most accurate model of all. Lazer says Google Flu Trends does have promise, especially at predicting flu trends over smaller areas than the CDC takes into account, which could enable individual cities or states to prepare.
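For readers who want to see what that comparison involves, here is a minimal Python sketch. It is not the study's code, and the article does not say how the researchers combined the two sources: the function names, the single least-squares blending weight and the weekly figures are all assumptions made for illustration. The figures are invented and are neither CDC nor Google data.

```python
# Illustrative sketch only: hypothetical names and invented numbers.

def lagged_baseline(reported, lag):
    """Forecast each week as the value reported `lag` weeks earlier.

    Returns (forecasts, targets), aligned so that forecasts[i] is the
    prediction for targets[i].
    """
    if lag <= 0:
        raise ValueError("lag must be a positive number of weeks")
    return reported[:-lag], reported[lag:]


def mse(pred, actual):
    """Mean squared error between two equal-length series."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def blend_weight(a, b, actual):
    """Least-squares weight w for the blend w*a + (1 - w)*b."""
    num = sum((x - y) * (t - y) for x, y, t in zip(a, b, actual))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    return num / den if den else 0.5  # identical inputs: any weight works


def blend(a, b, w):
    """Weighted combination of two forecasts."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


# Invented weekly figures (share of doctors' visits that are flu-related, %).
reported = [1.0, 1.2, 1.5, 2.0, 2.6, 3.0, 2.8, 2.2, 1.6, 1.2]
# Invented search-based estimates that run high near the peak.
search_estimate = [1.4, 1.9, 2.4, 3.4, 4.6, 5.6, 5.0, 3.6, 2.4, 1.7]

LAG = 3  # weeks, matching the three-week projection described above
baseline, targets = lagged_baseline(reported, LAG)
search = search_estimate[LAG:]  # align with the weeks being forecast

w = blend_weight(baseline, search, targets)
combined = blend(baseline, search, w)

for name, series in [("lagged", baseline), ("search", search), ("blend", combined)]:
    print(f"{name:>6} MSE: {mse(series, targets):.3f}")
```

Because the weight is fitted on the same weeks it is scored on, the blend can never do worse than either input here. A fair test would fit the weight on earlier seasons and score it on later ones.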