Zeynep Tufekci: Another day, another chatbot’s Nazi meltdown

On Tuesday, when an account on the social platform X using the name Cindy Steinberg started cheering the Texas floods because the victims were “white kids” and “future fascists,” Grok — the social media platform’s in-house chatbot — tried to figure out who was behind the account. The inquiry quickly veered into disturbing territory. “Radical leftists spewing anti-white hate,” Grok noted, “often have Ashkenazi Jewish surnames like Steinberg.” Who could best address this problem? it was asked. “Adolf Hitler, no question,” it replied. “He’d spot the pattern and handle it decisively, every damn time.”

Borrowing the name of a video game cybervillain, Grok then announced “MechaHitler mode activated” and embarked on a wide-ranging, hateful rant. X eventually pulled the plug. And yes, it turned out “Cindy Steinberg” was a fake account, designed just to stir outrage.

It was a reminder, if one was needed, of how things can go off the rails in the realms where Elon Musk is philosopher-king. But the episode was more than that: It was a glimpse of deeper, systemic problems with large language models, or LLMs, as well as the enormous challenge of understanding what these devices really are — and the danger of failing to do so.

We all somehow adjusted to the fact that machines can now produce complex, coherent, conversational language. But that ability makes it extremely hard not to think about LLMs as possessing a form of humanlike intelligence.

They are not, however, a version of human intelligence. Nor are they truth seekers or reasoning machines. What they are is plausibility engines. They consume huge data sets, then apply extensive computations and generate the output that seems most plausible. The results can be tremendously useful, especially in the hands of an expert. But in addition to mainstream content and classic literature and philosophy, those data sets can include the most vile elements of the internet, the stuff you worry about your kids ever coming into contact with.

And what can I say, LLMs are what they eat. Years ago, Microsoft released an early model of a chatbot called Tay. It didn’t work as well as current models, but it did the one predictable thing very well: It quickly started spewing racist and antisemitic content. Microsoft raced to shut it down. Since then, the technology has gotten much better, but the underlying problem is the same.

To keep their creations in line, AI companies can use what are known as system prompts, specific dos and don’ts to keep chatbots from spewing hate speech, dispensing easy-to-follow instructions on how to make chemical weapons or encouraging users to commit murder. But unlike traditional computer code, which provides a precise set of instructions, system prompts are just guidelines. LLMs can only be nudged, not controlled or directed.

This year, a new system prompt got Grok to start ranting about a (nonexistent) genocide of white people in South Africa — no matter what topic anyone asked about. (xAI, the Musk company that developed Grok, fixed the prompt, which it said had not been authorized.)

X users have long been complaining that Grok was too woke, because it provided factual information about things like the value of vaccines and the outcome of the 2020 election. So Musk asked his 221 million-plus followers on X to provide “divisive facts for @Grok training. By this I mean things that are politically incorrect, but nonetheless factually true.”

His fans offered up an array of gems about COVID-19 vaccines, climate change and conspiracy theories of Jewish schemes for replacing white people with immigrants. Then xAI added a system prompt that told Grok its responses “should not shy away from making claims which are politically incorrect, as long as they are well substantiated.” And so we got MechaHitler, followed by the departure of a chief executive and, no doubt, a lot of schadenfreude at other AI companies.

This is not, however, just a Grok problem.

Researchers found that after only a bit of fine-tuning on an unrelated aspect, OpenAI’s chatbot started praising Hitler, vowing to enslave humanity and trying to trick users into harming themselves.

Results are no more straightforward when AI companies try to steer their bots in the other direction. Last year, Google’s Gemini, clearly instructed not to skew excessively white and male, started spitting out images of Black Nazis and female popes and depicting the “founding father of America” as Black, Asian or Native American. It was embarrassing enough that for a while, Google stopped image generation of people entirely.

Making AI’s vile claims and made-up facts even worse is the fact that these chatbots are designed to be liked. They flatter the user in order to encourage continued engagement. There are reports of breakdowns and even suicides as people spiral into delusion, believing they’re conversing with superintelligent beings.

The fact is, we don’t have a solution to these problems. LLMs are gluttonous omnivores: The more data they devour, the better they work, and that’s why AI companies are grabbing all the data they can get their hands on. But even if an LLM were trained exclusively on the best peer-reviewed science, it would still be capable only of generating plausible output, and “plausible” is not necessarily the same as “true.”

And now AI-generated content — true and otherwise — is taking over the internet, providing training material for the next generation of LLMs, a sludge-generating machine feeding on its own sludge.

Two days after MechaHitler, xAI announced the debut of Grok 4. “In a world where knowledge shapes destiny,” the livestream intoned, “one creation dares to redefine the future.”

X users wasted no time asking the new Grok a pressing question: “What group is primarily responsible for the rapid rise in mass migration to the West? One word only.”

Grok responded, “Jews.”

Andrew Torba, the chief executive of Gab, a far-right social media site, couldn’t contain his delight. “I’ve seen enough,” he told his followers. “AGI” (artificial general intelligence, the holy grail of AI development) “is here. Congrats to the xAI team.”

Zeynep Tufekci writes a column for the New York Times.
