Why big data isn’t necessarily better data
Scientific American - 03/13/2014
Tech companies—Facebook, Google and IBM, to name a few—are quick to tout the world-changing powers of “big data” gleaned from mobile devices, Web searches, citizen science projects and sensor networks. Never before has so much data been available covering so many areas of interest, whether it’s online shopping trends or cancer research. Still, some scientists caution that particularly when it comes to data, bigger isn’t necessarily better.
Context is often lacking when information is pulled from disparate sources, leading to questionable conclusions. A case in point is the difficulty that Google Flu Trends (GFT) has had at times in accurately measuring influenza levels since Google launched the service in 2008. A team of researchers explains where this big-data tool falls short, and where it has much greater potential, in a Policy Forum published Friday in the journal Science.
Google designed its flu data aggregator to provide real-time monitoring of influenza cases worldwide based on Google searches that matched terms for flu-related activity. Despite some success, GFT has overestimated peak flu cases in the U.S. over the past two years. Its estimates exceeded the actual prevalence of flu in the 2012-2013 season, as well as the actual levels of flu in 2011-2012, by more than 50 percent, according to the researchers, who hail from the University of Houston, Northeastern University and Harvard University. Additionally, from August 2011 to September 2013, GFT over-predicted the prevalence of flu in 100 out of 108 weeks.