Northeastern network scientists are developing AI tools to predict — and prevent — the next epidemic

Director of AI and life sciences Samuel Scarpino is co-author on a new paper that describes how artificial intelligence can help model future infectious disease epidemics.

Artificial intelligence is the latest tool employed by network scientists in the race to model infectious disease epidemics — before they occur. Photo by Alyssa Stone/Northeastern University.

The next epidemic is coming — it’s simply a question of when.

Network science researchers at Northeastern University are developing artificial intelligence tools that could predict what will happen the next time an epidemic breaks out — tools that may even help prevent it in the first place.

“Our goal is to improve the human condition by improving our understanding of how sociotechnical systems function,” says Samuel Scarpino, the director of AI and life sciences at the Institute for Experiential AI.

Scarpino is a co-author of a new paper published in Nature that describes how artificial intelligence can be leveraged to model future infectious disease epidemics. 

Network science, Scarpino explains, provides “a common language” researchers can use to study systems that operate at multiple scales, from “cellular-level processes for things like cancer, all the way up to population-level processes like pandemics and epidemics.”

“If it sounds broad, it is,” Scarpino adds.

To model future epidemics, “we need new kinds of AI and new kinds of approaches,” says Samuel Scarpino, director of AI and life sciences at the Institute for Experiential AI. Photo by Matthew Modoono/Northeastern University.

Scarpino notes that modeling events as complex as epidemics poses huge contextual challenges. “AI models really struggle when there’s a huge amount of context required to make a prediction,” he says.

ChatGPT’s success is largely thanks to breakthroughs in how the AI handles context, the “transformer models that changed the way context was being brought in,” he says.

The contextual problems faced by language-oriented AI models “pale in comparison to the complexity of the context associated with living systems.”

Layers of information

“These systems,” he says, by which he means networks both organic and societal, from human immune systems to population mobility, “are adaptive both over very short timescales and also over evolutionary timescales, and as a result of that adaptation, if I perturb the system, I may not be able to make accurate predictions about what is going to happen — even if I have a really good understanding of how the system is behaving.”

“That means that we need new kinds of AI and new kinds of approaches.”

One of the biggest challenges facing Scarpino and his colleagues is the issue of interoperability. How do network scientists get radically different kinds of data to talk to each other?

The “siloed approach to studying living systems is doomed to fail,” Scarpino says. 

“If we really want to have good models of how a pandemic is going to unfold, [models] have to include information globally,” and include “representative data sets, interoperable, feeding into models from all over the globe.”

Scarpino points to the recent outbreaks of H5N1, colloquially known as “bird flu,” as an example of the many different kinds of data that need to work together for AI to make good predictions. For one, USDA reporting differs between poultry and cattle. Then there are wastewater data, human cases reported by the CDC, avian cases reported by the U.S. Fish and Wildlife Service, cattle mobility data, bird migration patterns and more.

And that’s all before looking at how the virus itself behaves. 

“If you don’t have all these layers together, we’re not going to have any idea how this thing is moving around,” he says.

Mechanisms and explanations

Scarpino is also careful to point out that what these AI models lack is a notion of mechanism: “So we wouldn’t be able to tell you,” for instance, “that the specific mechanism is this combination of farms plus birds plus whatever,” he says. AI can only make predictions.

Those predictions, however, “can be operationally quite useful” as researchers and policy makers look for the levers that would shift outbreaks away from becoming epidemics in the first place.

“The leading edge of AI work is on that explainability piece,” he says.

But, crucially, what this all means is that network scientists can’t model what they don’t know. The need for more data sets with greater and greater granularity is high.

Epidemics “are kind of like earthquakes,” Scarpino says, not in their predictability but in that “if you have one of this size” (a massive earthquake, or a global epidemic) “that must mean you have missed” other, smaller earthquakes and outbreaks in between the big ones.

“If that’s the case, the challenge of that surveillance at the very earliest stages is really immense,” he continues. “There’ll be lots of spillovers” from animal into human populations “that are never going to go anywhere.”

“It could be that AI can help us at least answer that question: How frequently are we actually seeing these novel viruses with pandemic potential spilling over into human populations?”

For now, Scarpino says that “we only have guesses, and until we have some solid answers to those questions, we don’t really know how to build the right response system for these threats.”

“Whatever comes along that’s really going to be a big problem is going to be something that we weren’t anticipating,” he says. 

But that very uncertainty can be anticipated, Scarpino notes, and even built into the researchers’ models.

Already, Scarpino says, the modeling of how an infectious disease will spread is more accurate than the weather forecast. Weather forecasts are notoriously unreliable, he continues, yet we nevertheless “live and breathe and make huge decisions with large economic implications associated with these weather forecasts.”

“For disease forecasting, we’re still in our infancy,” he says, but “we have accurate models two to three, sometimes four weeks out, that are operationally relevant for hospitals and other systems in New England.”