Predicting discoveries: Enabling research or killing novelty?

Northeastern’s Roberta Sinatra turns to the “science of science,” an interdisciplinary field that uses today’s deluge of data, to understand what drives scientific discoveries.

Meteorologists strive to predict the weather. Network scientists develop complex algorithms to predict the spread of disease. Might it also be possible to predict the emergence of scientific discoveries? If the answer is “yes,” what are the benefits—and pitfalls—of the ability to do so?

Those are questions that Roberta Sinatra and her colleagues grapple with in a recent essay in the journal Science. For the answers, the researchers turn to the “science of science”: an interdisciplinary field that uses the deluge of data available today—everything from the number of citations research papers accrue to individuals’ career trajectories—to understand the social components driving scientific discoveries.

“Everyone who makes decisions regarding science would like to be able to predict discoveries, from researchers and funding agencies to journal editors and faculty hiring committees,” says Sinatra, visiting research assistant professor at Northeastern and assistant professor at Central European University in Budapest. “But there are downsides and upsides to that. In a time of limited resources, such predictions would enable us to use tax dollars for research more effectively. However, if we base predictions primarily on past success, we are biasing the system and perhaps killing novelty.”


Predictable or random?

How predictable are scientific discoveries in the first place? It depends on the nature of the research and what aspects of a discovery are being considered, note Sinatra and her coauthors, who work at the University of Colorado, Boulder, and the Santa Fe Institute.


Projects in which many researchers run experiments accumulate evidence over time, providing clues that a discovery is imminent. Finding the Higgs boson and determining the human genome sequence are two examples of such “expected” discoveries. “It’s like looking for a single missing piece to complete a puzzle,” says Sinatra. Then there are the discoveries that “come out of the blue,” she says, citing the discovery of penicillin, which was so unexpected that its significance was not recognized for 15 years. “Sir Alexander Fleming stumbled on a puzzle piece that had no context,” she says, referring to the microbiologist who discovered the antibiotic. “The rest of the puzzle had to grow around it.”

Earlier research led by Northeastern’s Albert-László Barabási, Robert Gray Dodge Professor and University Distinguished Professor of Physics, found that the timing of creative breakthroughs is likewise random rather than predictable. That finding overturned the conventional wisdom that major contributions diminish with age. “Scientists can achieve success at any point in their careers as long as they keep producing,” says Sinatra, who was first author on that study.


Still, mining today’s vast troves of data with sophisticated algorithms and other tools reveals patterns that provide insight into aspects of discovery that can indicate future success. “Researchers whose papers have a lot of citations will continue to produce papers that get lots of citations,” says Sinatra. “The combination of visibility, luck, and positive response forms a feedback loop. It’s the same mechanism driving the rich-get-richer dynamic.”

So which explains more about the drivers generating scientific discovery—randomness, as shown with penicillin and the timing of breakthroughs, or predictability, as shown with the citations?

“We have to keep the element of risk, because that is how research progresses,” says Sinatra. “On the other hand, we know that the patterns can help guide our understanding of the process of discovery. We must continue exploring both contributions.”


A wake-up call

The patterns revealed through data mining also sound a wake-up call, says Sinatra. They show a systematic bias against women and some minorities, whose papers are cited less frequently. And a select group of prestigious institutions dominates the training of discoverers, narrowing both the range of areas researched and the makeup of the entire scientific workforce.

“These biases lead to resources such as grants and faculty appointments being concentrated on particular groups, including white males living in North America,” says Sinatra. “This is a big issue. We and other researchers studying the science of science are continuing to collect data and quantify the magnitude of this bias.”

She and her coauthors conclude that the best way to ensure the continued generation of new discoveries may be to focus not on predicting individual breakthroughs but on encouraging “a healthy ecosystem of scientists.”

“As physicist Freeman Dyson notes in his essay Birds and Frogs, a discovery is not made by just one individual or via one project but by a community, an ecosystem,” says Sinatra. “So what we have to nurture is not just individuals but the entire ecosystem. That means emphasizing the work of the visionaries—the ‘birds’ of the ecosystem, who see the big picture—as well as that of the ‘frogs,’ who see the details and do the technical work fundamental to the whole. Indeed, if we destroy that balance, we might destroy the entire process of discovery.”