Anthropic just accidentally told us AI offense is about to outrun AI defense.

The Signal

This is the moment the AI safety conversation gets concrete. For two years, we've heard abstract warnings about model capabilities and alignment. Now a major lab has said the quiet part out loud, in a post it never meant to publish: its next model will make hacking easier than defending.

The specifics matter here. Better code generation and review means AI can now scan for vulnerabilities, write exploits, and iterate on attack vectors at machine speed. A human security researcher might find and weaponize one zero-day in a month of focused work. An AI agent could probe thousands of attack surfaces in the same timeframe. The asymmetry is brutal: an attacker needs to find only one hole, while a defender has to close them all. Offense scales. Defense doesn't.

Anthropic's response is telling. They're not holding back the model. They're trying to red-team it before release, bringing in outside researchers to stress-test the safeguards. It's the right move, but it exposes the core problem: no amount of pre-release hardening can stop a sufficiently motivated actor from jailbreaking the model or fine-tuning their own version once the underlying architecture is out. The genie doesn't go back in the bottle.

The leaked post itself is almost a Freudian slip. Anthropic has positioned itself as the safety-conscious lab, the one doing Constitutional AI and careful deployment. But the market wants capability. Every lab is racing to ship the model that can actually do complex technical work, and complex technical work includes exploiting systems. You can't build an AI that's great at finding bugs for defenders without also building one that's great at finding bugs for attackers.

The Implication

If you're running infrastructure, this is your warning shot. Expect a spike in AI-assisted intrusions over the next 18 months, and budget for it now. Automate your defenses where you can, because human-speed patching won't keep up with machine-speed exploitation.

For the rest of us, watch what Anthropic does after the red-teaming. If they ship Mythos with the same capabilities they're worried about, that tells you everything about how this industry prioritizes growth over safety when the pressure is on. The real test isn't what they say in blog posts. It's what they release.


Source: The Information