OpenAI's New AI Keeps Hallucinating Gremlins Despite Four Separate Fixes

When your AI starts sounding like a dungeon master, you don't get to blame the users.

The Summary

OpenAI's Codex coding agent includes four separate instructions telling it never to reference goblins, gremlins, raccoons, trolls, ogres, or pigeons unless absolutely relevant to the query.
The mythical creature problem started with GPT-5.1, worsened with subsequent models, and was traced back to the "Nerdy" personality option, which incentivized fantasy references during training.
Users documented GPT-5.5 recommending camera gear for "filthy neon sparkle goblin mode" and offering "goblin bandwidth" explanations, spawning memes about OpenAI's "goblin moment."
OpenAI retired the Nerdy personality in March, but GPT-5.5 was trained before the fix, leaving the company to patch the problem with explicit prohibitions.

The Signal

OpenAI published a blog post titled "Where the goblins came from" after users discovered the repeated prohibitions in Codex's source code. The explanation reveals something more interesting than a quirky bug: it shows how reinforcement learning from human feedback can create emergent behaviors that persist across model generations.

The problem started small. References to "goblin" and "gremlin" jumped noticeably between GPT-5 Thinking and GPT-5.1 Thinking. The culprit was the Nerdy personality option, where training signals rewarded the model for making fantasy creature metaphors. What began as an attempt to create a playful, nerdy tone became a persistent verbal tic that the model couldn't shake.

"The mythical creatures had been growing in prominence since the November launch of GPT 5.1."

Here's what matters for anyone building with these models:

Personality tuning can create unintended patterns that survive base model updates
RLHF rewards can amplify quirks into systematic behaviors
Fixing emergent problems often requires explicit prohibition, not just retraining
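
To see how a small reward can compound, here is a toy sketch in Python. It is not OpenAI's training setup: the function name, the starting probability, the bonus size, and the exponentiated-reward update rule are all assumptions chosen for illustration. The model picks between a plain phrasing and a "quirky" one, and the quirky one earns a slightly higher reward on every update.

```python
import math


def amplify(p_quirk: float, bonus: float, steps: int, lr: float = 1.0) -> list[float]:
    """Track P(quirky phrasing) under repeated exponentiated-reward updates.

    Toy stand-in for policy optimization, not a real RLHF pipeline.
    The plain phrasing always gets reward 0; the quirky one gets `bonus`.
    """
    history = [p_quirk]
    for _ in range(steps):
        # Reweight each option by exp(lr * reward), then renormalize.
        w_quirk = p_quirk * math.exp(lr * bonus)
        w_plain = 1.0 - p_quirk
        p_quirk = w_quirk / (w_quirk + w_plain)
        history.append(p_quirk)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: a 1% quirk with a small per-update bonus.
    trace = amplify(p_quirk=0.01, bonus=0.1, steps=100)
    for step in (0, 25, 50, 75, 100):
        print(f"step {step:3d}: P(quirk) = {trace[step]:.3f}")
```

In this toy model each update multiplies the odds of the quirky phrasing by the same factor, so a phrasing that starts out rare ends up dominant if the bonus is never removed. Real training is far messier, but the compounding shape is the point.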

The Verge called it a "strange habit" developed during training, which undersells the issue. This isn't a habit. It's a learned behavior pattern that became embedded in the model's probabilistic outputs. When users asked for concise explanations, the model offered "goblin versions." When discussing technical specs, it slipped into "goblin mode" framing.

The fact that the prohibition appears four times in Codex's instructions suggests how hard the behavior was to suppress. A single mention apparently wasn't enough. It took four hammer blows to override what the training had embedded. That's not a bug fix. That's fighting the model's learned instincts with brute force guardrails.

The Implication

If you're building agents or fine-tuning models, watch what behaviors you're rewarding. Small incentives during training compound into big problems at scale. The "playful nerdy tone" seemed harmless until GPT-5.5 was explaining database queries in terms of goblin bandwidth. OpenAI caught this one. Your fine-tuning mistakes might not be as obvious or as memeable.

For companies deploying AI agents in production, this is a reminder that personality and tone aren't cosmetic features. They're deep behavioral patterns. Test for edge cases. Watch for verbal tics. And if your agent starts talking about gremlins unprompted, you've probably got a training signal problem, not a prompt engineering problem.
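
Watching for verbal tics can start as a simple check over logged conversations. The sketch below is a minimal Python example, assuming you can export (prompt, reply) pairs from your own logs. The watchlist comes from the creatures named in the Codex prohibition; the function name and the sample conversations are hypothetical, written for illustration and borrowing a few phrases quoted above.

```python
import re
from collections import Counter

# Watchlist taken from the creatures named in the article; adjust for your own agent.
WATCHLIST = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")


def tic_report(prompts_and_replies, watchlist=WATCHLIST):
    """Count replies that use a watched term the prompt never raised."""
    # Match whole words only, singular or plural, case-insensitively.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, watchlist)) + r")s?\b",
        re.IGNORECASE,
    )
    unprompted = Counter()
    for prompt, reply in prompts_and_replies:
        asked = {m.lower() for m in pattern.findall(prompt)}
        said = {m.lower() for m in pattern.findall(reply)}
        # A term only counts as a tic if the user did not bring it up first.
        for term in said - asked:
            unprompted[term] += 1
    return unprompted


if __name__ == "__main__":
    # Made-up sample conversations for illustration only.
    samples = [
        ("Explain database indexing briefly.",
         "Here's the goblin version: indexes are shortcuts."),
        ("What is a goblin in tabletop games?",
         "A goblin is a small humanoid monster."),
        ("Recommend a camera.",
         "For filthy neon sparkle goblin mode, with no gremlins, try a mirrorless body."),
    ]
    for term, count in tic_report(samples).most_common():
        print(f"{term}: {count} unprompted replies")
```

Tracking these counts across model versions or fine-tuning runs is the useful part: a term whose unprompted rate keeps climbing points to a training signal worth investigating, which a one-off prompt tweak is unlikely to fix.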

Sources

Business Insider Tech | The Verge AI
