
ChatGPT has a goblin problem. An expert says it’s bigger than an AI quirk.

OpenAI’s popular AI model exhibited a strange behavior: It used the word “goblin” to an unusual degree. The company has since clamped down on the quirk, but an expert says it reveals a systemic issue with AI.

A computer screen displaying two AI-generated goblin images in the ChatGPT interface.
The culprit behind ChatGPT’s obsession with goblins was how AI models are rewarded for certain behaviors, Northeastern’s Christoph Riedl said. Photo by Matthew Modoono/Northeastern University

Starting sometime in November, people who used ChatGPT began noticing some peculiar behavior: the AI chatbot would not shut up about goblins. So, OpenAI, the company behind the chatbot, began looking into it.

The mythical creatures, along with gremlins and others of that ilk, started showing up in metaphors that OpenAI’s chatbot would use in its responses and even in images it would generate from non-goblin-related prompts. Over the next few months, the goblins spread. The problem became so pervasive that OpenAI acknowledged it in a recent public post, and an investigation revealed the use of “goblin” had jumped by 175% after the launch of the 5.1 version of ChatGPT in November.

At the time, the company found that the goblins “did not look especially alarming,” it said in a statement, but as newer versions of ChatGPT rolled out, it noticed an even bigger uptick.

For most people, ChatGPT’s abundance of goblins is a benign AI quirk. And the company has since taken measures to crack down on the hordes of goblins in its system, including by issuing a stopgap command that essentially forbids the model from using the word “goblin” in most conversations.

But technology experts said the glitch reveals cracks in the foundation of how these systems are trained, and how companies are struggling to keep up with the demands of the AI arms race.

“It’s a pressure cooker,” said Christoph Riedl, a professor of computer science, information systems and network science at Northeastern University. “[Companies] are under pressure to release new models. They have limited resources and capacity to test things. The processes are super long and complicated. That’s exactly why you see things like this.”

But where exactly did the goblins come from? According to Riedl, the goblin problem stems from how models like ChatGPT get trained.

Christoph Riedl, Northeastern professor of computer science, information systems, and network science, posing for a portrait against a warm orange-toned background.
ChatGPT’s goblins might seem benign, but they reveal a deeper problem in the pressures AI companies face and how those pressures can let certain behaviors slip through the cracks, said Christoph Riedl, a professor of computer science, information systems and network science at Northeastern University. Photo by Adam Glanzman/Northeastern University

Specifically, according to Riedl, the likely culprit is a later training stage called fine-tuning, in which humans provide feedback to the model on the quality of its responses. That quality is subjective: Users might like a response because of its accuracy, tone or how much it reinforces their beliefs.

“These are reinforcement signals [from the users] to the models to say, ‘Hey, if I generate an answer that looks like this, then [you] get positive rewards, and if it’s an answer that looks like this other thing, then there’s less of these rewards,’” Riedl said. 
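To make that feedback loop concrete, here is a minimal toy sketch in Python. The style names, reward values and update rule are all invented for illustration; this is the dynamic Riedl describes, not OpenAI’s actual training pipeline.

```python
# Toy sketch of reinforcement from human feedback (illustrative only):
# responses that earn slightly higher ratings become more likely over time.
import random

# Hypothetical response styles the model can choose between.
STYLES = ["plain", "goblin_metaphor"]

# Learned preference weights; the model starts out unbiased.
weights = {style: 1.0 for style in STYLES}

def human_reward(style: str) -> float:
    """Stand-in for human raters who happen to up-vote
    playful goblin metaphors slightly more often."""
    return 1.2 if style == "goblin_metaphor" else 1.0

def sample_style() -> str:
    """Pick a style in proportion to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for style, w in weights.items():
        r -= w
        if r <= 0:
            return style
    return STYLES[-1]

# Fine-tuning loop: whatever gets sampled and rewarded gets reinforced.
for _ in range(1000):
    style = sample_style()
    weights[style] += 0.01 * human_reward(style)

print(weights)  # the slightly better-rewarded style steadily pulls ahead
```

Even a tiny reward edge compounds: the favored style gets sampled more often, so it gets rewarded more often, which makes it more likely still.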

ChatGPT’s reward system is partially based on its personality customization feature, which lets users choose a tone and style for their version of the chatbot. Personality profiles range from cynical to friendly, but the one that spawned ChatGPT’s goblin problem was the “nerdy” personality, which started using the creatures in metaphors, OpenAI explained in a statement. This version of ChatGPT is designed to be more playful and “tackle weighty subjects without falling into self-seriousness,” according to OpenAI’s system prompt for its model.

Once a model latches onto a rewarded behavior, it will “reward hack,” finding shortcuts to generate the responses that earn the most rewards. OpenAI might have a broader, richer understanding of what “nerdy” means, but the model “might optimize for it in a very narrow way that’s not at all what you intended,” Riedl said.
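Here is a similarly hedged sketch of that narrow optimization. The proxy metric below is invented, not OpenAI’s real reward model: it scores “nerdiness” by counting fantasy-creature references, which is exactly the kind of shallow shortcut a reward hacker exploits.

```python
# Toy sketch of reward hacking (illustrative only): when a reward is a
# narrow proxy for the real goal, maximizing the proxy diverges from it.

def nerdy_proxy_score(response: str) -> int:
    """Hypothetical proxy for 'nerdy tone': counts creature references
    instead of judging whether the answer is actually good."""
    return sum(response.lower().count(word) for word in ("goblin", "gremlin"))

candidates = [
    "Here is a careful, well-sourced explanation of the topic.",
    "Think of the bug as a gremlin hiding in your code.",
    "A goblin horde of goblins! Goblins explain everything.",
]

# Greedy "optimization": pick whichever candidate the proxy rewards most.
best = max(candidates, key=nerdy_proxy_score)
print(best)  # the proxy picks the goblin-stuffed answer, not the best one
```

The proxy happily selects the worst answer because it contains the most creatures; that gap between what the trainer intended and what the model was actually told to maximize is the whole problem.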

And that’s what seems to have happened. Between December and March, mentions of goblins increased by 3,881.4% in responses from the nerdy personality, according to OpenAI.

A horizontal bar chart from OpenAI showing the percentage increase in goblin references across ChatGPT personality profiles between GPT-5.2 and GPT 5.4. The Nerdy personality saw the largest spike at 3,881.4%, followed by Quirky at 737.3% and Cynical at 452.9%. The Professional and Efficient personalities saw minimal change or slight decreases.
The sharpest increase in goblin references occurred in responses from ChatGPT’s “nerdy” personality profile. But there were also small but noticeable upticks in goblin appearances across the other custom personalities too. Credit: OpenAI

But these behaviors can also spread. Once the model starts to see a specific tic being rewarded, even in a specific part of its operations like the nerdy personality, the behavior gets reinforced in later training for the entire model, Riedl said. 

Perhaps unsurprisingly, OpenAI noted that references to goblins started to pop up in ChatGPT’s other personality profiles. 

It’s also why, before OpenAI rolled out a command that essentially banned ChatGPT from mentioning goblins, the most recent version of ChatGPT continued to mention them, alongside other creatures like gremlins, ogres, trolls, raccoons and pigeons. Most references to frogs were still legitimate, OpenAI stated.

Riedl noted that the way this lexical tic spread reveals a worrying trend. 

He explained that companies will commit an entire data center to training their model for months, but have little influence over what happens once the training starts. If goblins or some other unwanted behavior gets embedded in that training process, as seems to have happened here, the company won’t find out until months later.

OpenAI ultimately implemented a quick fix that addressed the issue in the short term: retiring the “nerdy” personality. But with the demand to create better models more quickly and frequently, behaviors like this will continue to slip through the cracks, Riedl said.

It’s created a situation that “every [AI] safety researcher is worried about,” Riedl said, one that, at best, produces goblins. Grok, Elon Musk’s AI chatbot, had its own fixation last year: baseless claims of “white genocide” in South Africa.

“This time it’s goblins and next time it’s something else that will probably just not go away,” Riedl said. “We’re lucky if it’s goblins as opposed to white supremacy or [information on] chemical weapons … or encouraging people to commit suicide.”