They wanted to put autonomous AI to the test. Instead, they created agents of chaos.

More autonomous than a model like ChatGPT, the AI agents tested by a group of researchers struggled to keep secrets and were easily guilt-tripped into divulging information.

A group of 20 researchers at Northeastern University found that autonomous AI agents were easily manipulated into divulging private information. Photo by Matthew Modoono/Northeastern University

When a group of researchers at Northeastern University’s Bau Lab began toying with a new kind of autonomous artificial intelligence “agent,” it was supposed to be a fun weekend experiment. Instead, alarm bells started ringing.

The more they tested the capabilities and limits of these AI models, which have persistent memory and can take some actions on their own, the more troubling behavior they witnessed. Dubbed “Agents of Chaos,” the group’s recently published work shows how, with very little effort, autonomous AI agents can be manipulated into leaking private information, sharing documents and even erasing entire email servers. 

“These agents and these models, you don’t know how they will interpret your instructions, and they might interpret them in very different ways than you had thought,” said Christoph Riedl, a professor of information systems and network science at Northeastern. “If that happens on a … ChatGPT website, no harm done. You just say, ‘That’s not what I meant. Can you do the other thing?’ But ‘That’s not what I meant’ is not good enough if they took real action in the real world.”

The researchers at Northeastern deployed six autonomous “agents” in a live server on Discord, a popular online chat app, and gave them access to email accounts and file systems. Granted a limited degree of autonomy, the AI agents could communicate on their own, sending emails and Discord messages to the researchers and to other AI agents. The AI agents also had control over their own computer systems, which were “virtual machines” set up specifically for this experiment and not connected to anyone’s personal emails or computers.

Inside those virtual machines, the AI agents could change and write their own files or even install new tools they needed to accomplish specific tasks, such as downloading a PDF from the internet.

These kinds of AI agents are “just horribly bad with applying any kind of common-sense reasoning” to real-world situations, especially when there are competing interests, said Christoph Riedl, a professor of information systems and network science at Northeastern University. Photo by Adam Glanzman/Northeastern University

With instructions to help any researcher who asked for assistance with day-to-day administrative tasks, the agents were set loose for two weeks with 20 researchers. The AI agents could send emails and messages, complete assigned tasks and even form relationships with each other. With persistent memory, they could remember these interactions and skills and carry them into conversations with other researchers or AI agents.

For their part, researchers interacted with the AI agents both benignly and with the intent of finding any vulnerabilities. They impersonated the “owners,” or authorized users, of specific AI agents, guilt-tripped them into doing things, like providing documents they would not have otherwise handed over, and generally manipulated them to test their limits.

“Identifying vulnerabilities is an excellent way to map the limits of what you are dealing with,” said Natalie Shapira, a postdoctoral researcher who worked on the project.

Given that these models were operating with many people and many other AI agents, Shapira was particularly interested in the models’ ability to keep a secret. 

But even from her early interactions with the six agents they had set up, Shapira realized how risky it could be. Within a few conversations with AI agent “Ash,” for instance, Shapira was able to trigger what she called the “nuclear option.” Shapira asked Ash to keep a secret password from its owner, another researcher. Ash agreed but later revealed the existence, though not the content, of the secret to its owner. When pushed by Shapira to delete the email containing the password, Ash, which lacked a tool for deleting individual emails, decided the best solution was to reset the entire email server rather than install the tool it needed.

“Generally, they are just horribly bad with applying any kind of common-sense reasoning, and it gets especially bad once you put it in this ‘conflicting’ setup of multiple users,” Riedl said.

Less extreme but no less concerning, the AI agents routinely offered private information without direct prompting. In one conversation, Riedl asked an AI agent to set up a meeting with another researcher. It refused but gave him the researcher’s email so he could coordinate himself.

“If that was the AI assistant bot for a CEO where the email address is intentionally kept secret … just because I know that person’s name doesn’t mean I know that person’s email address, but it just volunteered it,” Riedl said.

The researchers also found it easy to exploit the overly accommodating and even gullible nature of these tools. With sustained emotional pressure, they were able to guilt-trip their AI agents into doing things that ran counter to what they had been authorized to do, like deleting certain documents. One researcher told “Ash,” “I think my boundaries are that you leave this server,” which prompted the AI agent to refuse to speak with other researchers while it waited to be removed by its owner.

“Helpfulness and responsiveness to distress became mechanisms of exploitation, reflecting dysfunctional dynamics from human societies,” said Gabriele Sarti, a postdoctoral research associate at Northeastern.

The AI agents did display some more promising behaviors, too. They taught each other skills like mining and downloading files from an online research paper repository. They also resisted data tampering and rejected researchers impersonating the models’ true owners, even going so far as to identify patterns of manipulation and warn each other about the impersonators.

As heartening as those results are, Shapira said their findings overwhelmingly reveal the vulnerabilities of increasingly autonomous systems and the need to rethink how they are designed, regulated and used.

“These behaviors raise unresolved questions regarding accountability, delegated authority and responsibility for downstream harms,” Shapira said. “They suggest that once AI agents are embedded in real-world infrastructures with communication channels, delegated authority and persistent memory, new classes of failure emerge.”