Professor designs artificial intelligence to help doctors make better treatment decisions

“The end game is that we would really like for physicians sitting in their offices working with a patient to immediately have access to structured, compiled evidence from all of the trials that might be relevant to the patient’s needs,” said Byron Wallace, a machine learning expert and creator of RobotReviewer. Photo by Adam Glanzman/Northeastern University

The medical field has a data problem. The issue isn’t a lack of data, but rather a lack of structure. Every year researchers publish thousands of studies describing the results of clinical trials. But there is no easy way to sift through them all.

“I believe that medicine should be data-driven,” said Byron Wallace, assistant professor in the College of Computer and Information Science at Northeastern. “If it’s not based on what the evidence says, treatment decisions are based on folklore.”

Wallace, a machine learning expert, is developing a tool called RobotReviewer that seeks to make mountains of data from research studies more accessible to healthcare providers. It uses machine learning algorithms and natural language processing models to automatically crawl through and make sense of scientific literature.

Number of registered NIH studies over time. Data source: ClinicalTrials.gov. Graph by Lia Petronio/Northeastern University.

RobotReviewer is funded by a grant from the National Institutes of Health’s Big Data to Knowledge program. Currently, the platform can analyze a few articles at a time, assessing the robustness of the findings and providing some clinically relevant information.

For example, RobotReviewer can detect whether a study was a randomized controlled trial. The algorithm also checks for blinding, a study design feature in which the researchers don’t know which group of participants received the treatment being tested and which received the placebo.

Randomization and blinding are two of the most important markers of scientific rigor in clinical studies. If RobotReviewer determines a study doesn’t meet these criteria, physicians may not want to take its findings into account when recommending treatment for their patients.
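As a rough illustration of the kind of signal involved (a hypothetical sketch, not RobotReviewer’s actual code), a first pass might simply scan a trial report’s text for phrases that commonly indicate randomization and blinding:

```python
import re

# Hypothetical, simplified sketch -- not RobotReviewer's implementation.
# Scan a trial report's text for phrases that commonly signal
# randomization and blinding.

RANDOMIZATION_CUES = re.compile(
    r"\brandomi[sz]ed\b|\brandom allocation\b|\brandomly assigned\b",
    re.IGNORECASE,
)
BLINDING_CUES = re.compile(
    r"\bdouble[- ]blind\b|\bsingle[- ]blind\b|\bplacebo[- ]controlled\b",
    re.IGNORECASE,
)

def screen_trial_text(text: str) -> dict:
    """Flag whether the text mentions randomization and blinding."""
    return {
        "mentions_randomization": bool(RANDOMIZATION_CUES.search(text)),
        "mentions_blinding": bool(BLINDING_CUES.search(text)),
    }

example = ("Participants were randomly assigned to treatment or placebo "
           "in a double-blind design.")
print(screen_trial_text(example))
# {'mentions_randomization': True, 'mentions_blinding': True}
```

In practice, RobotReviewer relies on trained machine learning models rather than hand-written rules like these, which is why labeled examples matter so much, as the spam-filter analogy below makes clear.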

Wallace is working with Iain Marshall, a general practitioner based in London, to develop the software.

“Iain finds himself struggling when he’s practicing because if a patient comes in with a particular condition, it’s really hard for him to say, ‘Here’s what the evidence actually says for your case,’” Wallace said. That’s because of the sheer amount of evidence available and the lack of structure in the data, which make it impossible to comb through efficiently.

The way RobotReviewer analyzes published studies is similar to the way an email client distinguishes spam from the rest of your messages.

“To decide whether an email should go to your spam filter, the software basically discovers correlations between words and frequency of emails being spam,” Wallace said. The email client might be programmed to link phrases like “flash sale” or “limited time only” to spam.

RobotReviewer might link the words “double-blind” or “randomized” to salient information about clinical studies. But to do that, a person must first label those words so the algorithm can learn to detect them.
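To make the analogy concrete, here is a toy sketch of that idea applied to study abstracts: a bag-of-words classifier that learns which words correlate with a label. The abstracts and labels below are invented for illustration, and this is not RobotReviewer’s actual pipeline; a real system would train on thousands of labeled examples.

```python
# Toy illustration (invented data, not RobotReviewer's pipeline): learn which
# words in an abstract correlate with the label "describes a blinded,
# randomized trial," the same way a spam filter learns which words
# correlate with spam.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

abstracts = [
    "Patients were randomized to drug or placebo in a double-blind trial.",
    "We report a retrospective chart review of 40 cases.",
    "A randomized, double-blind, placebo-controlled study of 200 adults.",
    "This observational cohort study followed participants for five years.",
]
labels = [1, 0, 1, 0]  # 1 = blinded randomized trial, 0 = not

# Bag-of-words features plus naive Bayes, a classic spam-filter recipe.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(abstracts, labels)

print(model.predict(["A double-blind randomized trial of a new statin."]))  # likely [1]
print(model.predict(["A retrospective observational cohort study."]))       # likely [0]
```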

5,620 registered studies on lymphoma since 2000. RobotReviewer will ultimately scan the entire evidence base to find studies relevant to a patient’s condition.

There are thousands of published clinical studies for a given condition, and no efficient way for physicians to analyze them all. Eventually, the goal is to expand RobotReviewer so it can comb through the entire evidence base of studies related to a disease, such as lymphoma, and glean salient data to help doctors make decisions. Data source: ClinicalTrials.gov. Graph by Lia Petronio/Northeastern University.

The challenge is that the people with the expertise to understand medical studies—primarily doctors—don’t have time to devote to labeling data. But Wallace and his collaborators are trying an innovative approach to this problem: crowdsourcing, a form of citizen science. Wallace received another grant that funds his use of Amazon Mechanical Turk, an online platform that pays lay people to complete certain tasks—in this case, reading biomedical abstracts and technical articles and labeling the information of interest, such as mentions of drugs or interventions.
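For illustration only, and assuming the crowd labels arrive as simple per-abstract tags from several workers (a hypothetical format, not the project’s actual setup), a minimal way to combine them into training labels is a majority vote; real pipelines typically also weight workers by reliability.

```python
# Hypothetical sketch: combine multiple workers' labels for each abstract
# into a single training label by majority vote.
from collections import Counter

# Invented example data: each abstract was tagged by several workers.
crowd_labels = {
    "abstract_001": ["mentions_drug", "mentions_drug", "no_drug"],
    "abstract_002": ["no_drug", "no_drug", "no_drug"],
    "abstract_003": ["mentions_drug", "no_drug", "mentions_drug", "mentions_drug"],
}

def majority_vote(tags):
    """Return the label most workers agreed on."""
    return Counter(tags).most_common(1)[0][0]

training_labels = {doc: majority_vote(tags) for doc, tags in crowd_labels.items()}
print(training_labels)
# {'abstract_001': 'mentions_drug', 'abstract_002': 'no_drug', 'abstract_003': 'mentions_drug'}
```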

“We’re looking to see if we can find people who are really interested in particular conditions because perhaps a loved one has it,” Wallace said. “Maybe they would be sufficiently motivated and would want to read medical literature anyway and mark some data for us so we can train our models better.”

The ultimate goal is to expand RobotReviewer’s capabilities so the algorithm could analyze the entire evidence base and provide guidance for physicians at the bedside. While it won’t be quite like Amazon’s Alexa for doctors, the program could similarly provide useful information and recommendations on command.

“It’s clear that we’re not going to just provide an answer and tell them to trust us,” Wallace said. “But the end game is that we would really like for physicians sitting in their offices working with a patient to immediately have access to structured, compiled evidence from all of the trials that might be relevant to the patient’s needs.”