One of the most persistent challenges in science is how to get the mainstream press, and by extension the general public, to pay attention to the most important research of the day.
A massive new study by Northeastern researchers uses more than 91,000 scientific papers published in 2016 to demonstrate that machine learning can be used to predict press coverage for future research.
Among their findings was that media coverage is often determined more by the subject matter of the research than by its scientific importance. For example, research that involves personal health or climate change consistently gets more coverage than studies involving cell biology or applied mathematics, according to Ansel MacLaughlin, a doctoral student in computer science who is the first author on the study.
“In some ways, this study is a warning shot across the bow,” said journalism professor John Wihbey, who co-authored the paper along with computer science professor David Smith. “Should scientists be working harder to make their work comprehensible? Should journalists take a broader view of what research topics are worth reporting? If a computer can predict what we are going to do, maybe we need to rethink our habits of how we determine what is a science story and what isn’t?”
The study found that the single biggest factor in determining press coverage is whether the research is publicized with a press release. The prestige of the journal in which the research is published also matters, although a journal's subject area plays a big role as well: six of the top 10 journals in terms of press coverage are dedicated to health research.
“There is significant demand among the general public for scientific news, but the original journal articles are written for an academic audience and not for the average person,” said Renata Nyul, vice president for communications at Northeastern University.
“University communications offices are in the unique position of turning that highly technical information from the scientific journals into stories—which we used to call ‘press releases’—to make it easy for reporters to see news value in them,” Nyul said.
From MacLaughlin’s perspective, the most significant element of the study was demonstrating that machine-learning models can be trained to predict how much press coverage a scientific paper will receive.
He said the next phase of his project will focus on refining his algorithm so it can analyze sentence structure, word choice, and phrasing techniques within a press release or abstract to optimize the chances of a study being reported in the media.
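For readers curious what predicting coverage from a paper's attributes might look like in practice, here is a minimal sketch in Python. It is not the researchers' model, which the study itself describes; it is a small logistic regression trained on invented toy data. The three yes/no features (press release, high-prestige journal, health topic) are loosely inspired by the factors named above, and every row in `PAPERS` is hypothetical.

```python
import math

# Hypothetical toy data, NOT taken from the study. Each row is
# (has_press_release, high_prestige_journal, health_topic) -> covered (1) or not (0).
PAPERS = [
    ((1, 1, 1), 1),
    ((1, 0, 1), 1),
    ((1, 1, 0), 1),
    ((1, 0, 0), 1),
    ((0, 1, 1), 1),
    ((0, 1, 0), 0),
    ((0, 0, 1), 0),
    ((0, 0, 0), 0),
]


def sigmoid(z):
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that a paper with these features gets covered."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def train(data, epochs=2000, learning_rate=0.5):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            # Gradient of the log loss with respect to the score.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(PAPERS)

# Compare two otherwise identical hypothetical papers.
with_release = predict(weights, bias, (1, 0, 0))
without_release = predict(weights, bias, (0, 0, 0))
print(f"with press release:    {with_release:.2f}")
print(f"without press release: {without_release:.2f}")
```

Because the toy data was built so that a press release always leads to coverage, the fitted model rates the paper with a release as more likely to be covered. A real system would need far richer inputs, such as the text of the abstract or press release, and the tens of thousands of papers the study drew on.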
“This paper is the first cut at the problem,” said Wihbey. “It takes a novel approach to answering an age-old question.”