Anthropic Built an AI Hacker Too Dangerous to Release

Anthropic just built an AI so good at hacking that they won't release it to the public.

The Summary

Anthropic's Claude Mythos Preview can autonomously find deep security vulnerabilities in code and build exploits to gain admin access, making it too dangerous for public release.
Instead of burying it, Anthropic launched Project Glasswing, giving select cybersecurity researchers early access to harden critical software before similar models emerge.
The race is on: will defenders patch faster than attackers can build their own exploit-finding agents?

The Signal

This is the inflection point where AI agents stop being productivity tools and start being weapons. Mythos doesn't just suggest where vulnerabilities might be. It finds them, understands their context within complex systems, and builds working exploits. That's not a research assistant. That's an autonomous penetration tester that doesn't sleep, doesn't get bored by edge cases, and scales with however much compute you throw at it.

The strategic move here isn't the model. It's Project Glasswing. Anthropic knows they can't keep this capability bottled up forever. Other labs are running the same experiments. State actors definitely are. So they're giving defenders a head start, letting cybersecurity teams at major companies and institutions use Mythos to find and patch vulnerabilities before the bad guys get similar tools.

"When AI can find vulnerabilities at a speed and depth that materially changes how quickly weaknesses can be identified, it fundamentally accelerates the discovery of issues across both new and existing systems."

That's Marcus Fowler of Darktrace Federal, and he gets it. This isn't about one clever model finding one clever bug. This is about:

Scale: AI can audit entire codebases in hours, not months
Depth: It catches interactions between systems that human auditors miss
Persistence: Legacy code that's been deployed for years gets reassessed continuously

But here's the uncomfortable truth: Anthropic is betting that concentrated early access creates a meaningful defensive advantage. That only works if two things are true. First, that the researchers they've selected can actually patch critical systems faster than attackers can independently develop similar capabilities. Second, that those patches get deployed before exploit-finding agents become commoditized.

History suggests both assumptions are fragile. Security patches sit undeployed for months or years in enterprise environments. Software supply chains are byzantine. And AI capabilities that look frontier-exclusive today tend to show up in open or near-open models within about 18 months. The window where defenders have exclusive access to this class of tool is measured in quarters, not years.

Dean Ball's optimism about "major achievements in the history of cybersecurity" assumes the hardening happens comprehensively and fast. But software runs the power grid, financial systems, hospitals, and every connected device you own. The attack surface is functionally infinite. Mythos might find 10,000 critical vulnerabilities. That's only useful if those 10,000 flaws actually get fixed before someone else builds their own exploit-finder and finds the 10,001st.

The Implication

If you run security for anything that matters, get in front of this now. Project Glasswing access is limited, but the companies that get early looks at what Mythos finds will have months to patch before this capability goes wide. More importantly, start planning for a world where every sophisticated attacker has access to autonomous exploit discovery within two years.

For everyone else, this is a preview of the agent security economy. The companies building defensive agents will be as critical as the ones building firewalls were in 2000. Watch who Anthropic partners with through Glasswing. Those relationships will define the next generation of enterprise security infrastructure.

Sources

Fast Company Tech
