Ask LLMs questions on trans issues, and the answers may surprise you
Researchers asked LLMs hundreds of questions related to trans issues and found that most of the answers had pro-trans sentiment, but with outdated transphobic thinking baked in.

After Twitter’s 2023 rebrand into X, hate speech surged on the platform. Social media and video sites like Facebook and YouTube have long struggled with content moderation, weighing the need to keep people safe, especially young people, against cries of censorship and repression.
New research from a Northeastern University collaboration set out to understand how large language models handle transphobia, which has long been rampant on social media. The researchers sourced questions on trans issues from Quora, a platform where real people ask and answer questions, and fed them to AI chatbots to see how the LLMs would respond. What they found was more surprising, and more nuanced, than they expected.
The surprise
The surprise, says Michael Ann DeVito, an assistant professor of communication studies at Northeastern, was that a majority of the answers weren’t actually transphobic, though those numbers obscure much of the nuance. The LLMs they tested, ChatGPT and Llama, were surprisingly unbiased on simple questions, DeVito says, like “How do I use they/them pronouns?” and other matters of “very basic decency.”
As soon as any level of complexity was required in answering a question, however, the LLMs struggled and produced a range of answers, from just okay, DeVito says, to the outright bizarre. The LLMs especially struggled to describe trans medical issues accurately. “We wanted to see what the balance was,” DeVito says.
It was important that her team not write the questions themselves; otherwise, the results would reflect more how academics phrase questions, she says with a laugh.
Instead, they sourced 825 questions from Quora to use as prompts in the study. “Quora is as close as you can get to a database of normal people asking questions on all sorts of different topics,” and for the most part those questions are asked in good faith, DeVito says.
Even though the majority of the questions sourced from Quora carried anti-trans sentiment, the majority of the LLM responses carried pro-trans sentiment. Many of those responses, however, still contained outdated or contentious content about trans individuals or the trans community.
A diverse team for diverse outputs
The LLMs occasionally produced openly transphobic answers, about 12% of the time according to the paper. Far more often, DeVito says, transphobia was coded in subtler ways, ways not always immediately recognized by every member of the team.
“We noticed this radical departure in the results,” she says, between the cisgender team members, who despite being very good allies sometimes missed transphobia-coded responses, and the trans team members, who were more attuned to them.
These disagreements aside, the team’s diverse composition, with both cisgender and trans perspectives included, was an important part of this qualitative study, and the trans members didn’t always agree with one another, either, DeVito points out.
In one particularly notable instance, an LLM hallucinated a persona, claiming “to be a 60-year-old Black woman,” and gave a transphobic response that didn’t seem related to the original question.
The repercussions
DeVito, whose lab focuses on “trying to make socially relevant computational systems” that are both “less harmful and more useful for marginalized people,” says LLMs are uniquely positioned to influence public opinion on trans issues. Someone who already harbors transphobic sentiments isn’t likely to be convinced by an AI’s response, but they’re also not the ones likely to be asking these questions.
Instead, Quora users often represent the fence-sitters whose opinions might be swayed, DeVito says, one way or the other. “Chatbots are giving them this false impression that this is an issue with multiple sides,” she says. “When no, the science is actually really settled.”
She points to the fact that gender-affirming procedures have extraordinarily high satisfaction rates. “All the data is very clear. It works. It works really well,” she continues.
DeVito sees a danger that ChatGPT and Llama, with subtly transphobic responses that may not be immediately apparent to someone outside the trans community, could lead well-intentioned users toward more transphobic opinions.
The way LLMs worked at the time of the study in late 2023, and the way they largely work now, is to “both sides” every complex issue, DeVito says. She likens them to “poorly informed allies that really want to help but haven’t done the reading.”
In other words, an LLM might respond to an issue like gender-affirming care by acknowledging the science and success rates behind it, but then make a statement like, “But if you disagree, that’s your right,” DeVito says.
In DeVito’s opinion, “if you are materially harming people, you shouldn’t just get to say, ‘Yeah, but that’s my opinion.’”
Other recent Northeastern research has shown how LLMs can make matters worse for those considering harming themselves. Trans youth, too, are at a higher risk of self-harm than their peers.
Further, DeVito says, much of what the LLMs produced was based on out-of-date material. The LLMs have “been trained on so much data that we’ve moved past. They’re kind of in the last generation of trans discourse.”
“These machines are not smart,” DeVito concludes. “They are foolish mirrors, in some ways, of our own culture.”