
New research decodes hidden bias in health care LLMs

Large language models contain racial biases that factor into their recommendations, even in clinical health care settings. Northeastern researchers found a way to reveal these racial associations in LLMs.

Northeastern University researchers employed a tool called a sparse autoencoder to peek under the hood of LLMs employed in health care settings. Getty Images.

Artificial intelligence is increasingly used as a tool in many health care settings, from writing physicians’ notes to making recommendations in specific cases. Research has found that AI and large language models can reflect racial biases present in their training data, which can influence outputs in ways users may not realize.

New research out of Northeastern University looks past an LLM’s responses to examine the information that factored into its decisions and to determine whether race was problematically used in making a recommendation. Employing a tool called a sparse autoencoder, the researchers envision a future in which physicians could see when bias is involved in an LLM’s decision-making.

For example, Hiba Ahsan, a Ph.D. student at Northeastern University and the first author on the research paper, notes that prior work has found that Black patients are less likely to be prescribed pain medication even when experiencing levels of pain similar to white patients. An AI model could just as easily make the same biased decisions, Ahsan says.

Race and health care AI

Sometimes it’s important to factor race into health care decisions. Gestational hypertension, for instance, is more common in people of African descent, according to Ahsan’s paper, while cystic fibrosis is more common in individuals of Northern European descent, according to the Mayo Clinic.

Other times, however, biased assumptions about race are baked into the large language model, Ahsan says. A solid body of research shows that LLMs exhibit racial bias when used in health care settings, she continues, outputting different answers depending on the patient’s race, “often in cases when it’s not even clinically relevant.”

Hiba Ahsan, a Ph.D. student at Northeastern University, hopes that a new technique could help physicians spot biased behavior in their LLM tools and “either intervene on it and mitigate this behavior, or look at other solutions” like re-training their AI on a new dataset, she says. Courtesy photo.

Ahsan’s intervention was to use a tool called a sparse autoencoder as a kind of intermediary, peering into the murky mathematical middle ground where it’s hard, even for computer scientists, to understand how LLMs make their decisions.

Large language models simplify a complex input by passing it through a series of intermediate representations in a process called encoding, Ahsan says. These intermediates represent the data compressed to its essentials.

“The model, in some sense, understands what those numbers mean, but we don’t,” says Byron Wallace, the Sy and Laurie Sternberg interdisciplinary associate professor in the Khoury College of Computer Sciences and Ahsan’s Ph.D. adviser. 

The intermediate representations must then be decoded by the model before an answer can be output.

“We don’t know what those representations represent because of the way that we train neural networks,” Wallace continues. 
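For readers who want a concrete picture, the sketch below shows how an intermediate representation can be pulled out of a model like Gemma-2. It assumes the Hugging Face transformers library; the checkpoint name, the example note and the layer index are illustrative placeholders, not the exact setup from the paper.

```python
# Illustrative sketch: extracting an intermediate (hidden-state) representation
# from an LLM. Checkpoint name and layer index are placeholders, not the
# paper's exact configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-2b"  # assumed publicly available Gemma-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

note = "Patient reports severe lower back pain, rated 8/10, for three days."
inputs = tokenizer(note, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds one tensor per layer: the "intermediate representations"
# the model works with internally -- numbers it understands but humans don't.
layer_activations = outputs.hidden_states[12]   # shape: (1, num_tokens, hidden_dim)
last_token_vector = layer_activations[0, -1]    # representation of the final token
print(last_token_vector.shape)
```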

The sparse autoencoder, however, can disentangle the intermediate representations into human-legible concepts often called “latents.” A latent can tell a researcher if one data point represents an animal, Ahsan says by way of example, or if another data point refers to a specific race.

To put it another way, Wallace says that the autoencoder will take in a series of numbers that researchers don’t understand, and “try to map that to human understandable concepts.”

When the autoencoder detects a latent related to race, Ahsan says, “It’ll sort of light up and tell me, ‘OK, race is being factored in and it’s informing the output.’”
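In code, the idea looks roughly like the sketch below: a sparse autoencoder takes one of those opaque hidden-state vectors, expands it into a much wider vector of mostly-zero latents, and a researcher can then check whether a particular latent, say one previously labeled as race-related, is active. The dimensions, weights and latent index here are hypothetical placeholders, and a real sparse autoencoder would also be trained with a sparsity penalty, which is omitted for brevity.

```python
# Illustrative sparse autoencoder: hidden state in, sparse latents out.
# Sizes, weights and the "race" latent index are hypothetical placeholders.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, hidden_dim: int, n_latents: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, n_latents)   # hidden state -> latents
        self.decoder = nn.Linear(n_latents, hidden_dim)   # latents -> reconstruction

    def forward(self, x: torch.Tensor):
        latents = torch.relu(self.encoder(x))   # ReLU keeps most latents at zero (sparse)
        return latents, self.decoder(latents)

sae = SparseAutoencoder(hidden_dim=2304, n_latents=16384)   # illustrative sizes
hidden_vector = torch.randn(2304)   # stand-in for a real intermediate representation

latents, _ = sae(hidden_vector)
RACE_LATENT_IDX = 4242   # hypothetical index a researcher has labeled as "race"
if latents[RACE_LATENT_IDX] > 0:
    print("Race-related latent is active for this input.")
```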

Ahsan and Wallace took clinical notes and discharge summaries from a publicly available dataset called MIMIC, which strips the documents of personally identifying information. Focusing on notes in which a patient self-identifies as white, Black or African American, the researchers ran those notes through an LLM called Gemma-2.

Then, using their sparse autoencoder to gather the latents, they trained an algorithm to detect those latents that corresponded to race.
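One plausible way to do that detection step, sketched below with random stand-in data rather than MIMIC notes, is to fit a simple classifier on the latent activations and see which latents it relies on most heavily. This is an illustration of the general approach, not necessarily the exact algorithm used in the paper.

```python
# Illustrative probe: which latents are most predictive of self-reported race?
# Uses random stand-in data, not MIMIC; the method shown is a generic example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_notes, n_latents = 200, 1024

latent_activations = rng.random((n_notes, n_latents))   # one row of SAE latents per note
race_labels = rng.integers(0, 2, size=n_notes)          # 0/1 stand-in for self-reported race

probe = LogisticRegression(max_iter=1000)
probe.fit(latent_activations, race_labels)

# Latents with the largest absolute weights are the ones most associated with
# race -- candidates to monitor when the model makes a recommendation.
top_latents = np.argsort(np.abs(probe.coef_[0]))[::-1][:10]
print("Latents most associated with race:", top_latents)
```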

What they found was that racial biases were baked into the LLM. The autoencoder found a high incidence of latents referring to Black individuals alongside “stigmatizing concepts like ‘incarceration’ and ‘gunshot’ and ‘cocaine use,’” Ahsan says.

While evidence of racial bias in LLMs isn’t new, what is new is the ability to peer behind the curtain and see which elements the LLM has incorporated in structuring its answer.

The problem of interpretability

LLMs operate like black boxes, Ahsan says, and it’s incredibly hard to understand the factors that lead them to make a decision. Using a sparse autoencoder to look under the hood, as it were, could help inform physicians when a patient’s race is being factored into the model’s recommendation.

That increased visibility could help doctors spot biased behavior and “either intervene on it and mitigate this behavior, or look at other solutions” like re-training their AI on a new dataset or altering the model in other ways, Ahsan continues.

Byron Wallace, the Sy and Laurie Sternberg interdisciplinary associate professor of computer science, says that LLMs operate like black boxes. His and Ahsan’s sparse autoencoder can take in a series of numbers that researchers don’t understand, and “try to map that to human understandable concepts,” he says. Photo by Adam Glanzman/Northeastern University.

Even simply asking the AI, in the original prompt, to return unbiased answers increases the chance of receiving an unbiased response, Ahsan says, but the output can still be biased. There’s an element of blind faith involved that could make many users, especially in health care environments, uncomfortable.

Wallace notes that while they didn’t invent sparse autoencoders, they are the first to employ the tool in a clinical setting using physician notes as input.

“If we’re going to use these models,” Wallace says, “in things like health care, and we want to be able to do that safely, we probably need to improve the methods that we have for interpreting them.” The sparse autoencoder method is one step along that road.

Noah Lloyd is the assistant editor for research at Northeastern Global News and NGN Research. Email him at n.lloyd@northeastern.edu. Follow him on X/Twitter at @noahghola.