Northeastern researchers use machine learning to identify US patients with long COVID

A group of Northeastern researchers is tapping into the power of machine learning to develop new models for identifying patients who may have post-acute sequelae of SARS-CoV-2 infection, or so-called “long COVID.”

Using electronic health records from the National COVID Cohort Collaborative, a federal database that compiles medical information about COVID-19 patients, the researchers developed models that identify COVID long-haulers based on a range of features, from a past COVID diagnosis to the types of medications they’ve been prescribed, according to new research published in The Lancet Digital Health.

The data harmonization effort drew from a variety of information sources to construct a picture of what long COVID looks like in the U.S.—and who is most likely to have it. Those sources include demographic data, healthcare visit details, diagnoses and medications for 97,995 adults with COVID-19, the study says. 

The post-infection illness is estimated to affect between 10% and 30% of people who contract COVID-19. Patients who most likely have it are often characterized as having new or lingering symptoms 90 days after being diagnosed with the viral infection, a criterion the researchers also used to define the base population for their analysis.

“The real question at the heart of this is: Who gets long COVID, and what do they present with?” says Kristin Kostka, director of the Observational Health Data Sciences and Informatics Center at the Roux Institute and co-author of the study. “There’s really a lack of understanding by the clinical community of these fatigue-based illnesses that follow viral infection. It’s not just COVID.”

In analyzing the patient data, which also included 597 patients from long COVID clinics, researchers trained three machine learning models to spot potential long COVID in three groups: all patients with COVID-19, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized. Specific features emerged that Kostka says could help clinicians better identify existing and future long-haulers.
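For readers curious how one dataset can feed three models, the sketch below splits a handful of made-up patient records into the three populations described above. It is only an illustration, not the study's code: the `Patient` fields, the `split_cohorts` helper, and the use of a long COVID clinic visit as the training label are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    hospitalized: bool        # hospitalized with COVID-19
    long_covid_clinic: bool   # assumed label: seen at a long COVID clinic


def split_cohorts(patients):
    """Group records into the three populations, one per model."""
    return {
        "all": list(patients),
        "hospitalized": [p for p in patients if p.hospitalized],
        "non_hospitalized": [p for p in patients if not p.hospitalized],
    }


# Made-up records, for illustration only.
patients = [
    Patient("a", hospitalized=True, long_covid_clinic=True),
    Patient("b", hospitalized=False, long_covid_clinic=False),
    Patient("c", hospitalized=False, long_covid_clinic=True),
    Patient("d", hospitalized=True, long_covid_clinic=False),
]

cohorts = split_cohorts(patients)
for name, group in cohorts.items():
    positives = sum(p.long_covid_clinic for p in group)
    print(f"{name}: {len(group)} patients, {positives} labeled long COVID")
```

Each group would then be used to train its own classifier, so that a model for non-hospitalized patients, for example, learns from the features most relevant to that population.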

“The success rate of identification was above 90% using a specific model created for the research,” Kostka says. “The key markers of those most at risk of long COVID are: age, pulmonary symptoms and metabolic identifiers.”

The post-infection illness is still not well understood. Patients can have a wide range of symptoms, but those most commonly reported include fatigue—particularly after and during exertion or exercise—fever, difficulty breathing or shortness of breath, and a range of neurological problems, such as difficulty thinking or concentrating (or “brain fog”), headaches, difficulty sleeping, dizziness, depression and anxiety, according to the Centers for Disease Control and Prevention. 

Kostka says there’s still a tremendous need for more clinical awareness about long COVID as even physicians often overlook the symptoms that are consistent with a diagnosis.  

“This is really just the tip of the iceberg in acknowledging this burden,” Kostka says. “COVID isn’t just something you get through and you’re over it. For a subset of people, you’re never the same.”

Kostka says her team is refining these machine learning models in research it hopes to publish in the future.
