Claude Found Every Security Flaw In Its Own Code

The AI that finds the bugs is the same AI that could weaponize them — and Anthropic just proved both sides of that equation work.

The Summary

Anthropic's Claude Mythos Preview found thousands of high- and critical-severity vulnerabilities across every major OS and browser, including a 27-year-old OpenBSD bug, without being explicitly trained for security research.
The company launched Project Glasswing with AWS, Apple, Google, Microsoft, and Nvidia to use Mythos Preview for defensive scanning before attackers can exploit the same capabilities.
The core tension: the same AI reasoning that spots code flaws can also weaponize them, forcing a race between defenders who scan and patch versus attackers who scan and exploit.

The Signal

Anthropic's Frontier Red Team didn't set out to build a vulnerability scanner. They were stress-testing Claude Mythos Preview's reasoning capabilities when the model started finding critical security holes on its own. A 27-year-old bug in OpenBSD that enables remote machine crashes. Browser exploits that break domain isolation. Cryptography library weaknesses that could decrypt supposedly secure communications. Thousands of them.

The list reads like a penetration tester's dream journal. Every major operating system. Every major web browser. The model wasn't trained on security research papers or vulnerability databases. It just reasoned its way to the flaws by understanding how code should work and spotting where it doesn't.

"The model not being explicitly trained for this" is the entire story.

That capability gap just closed. For years, finding novel zero-days required deep expertise, time, and luck. Security researchers built careers on the hunt. Now an AI model with strong reasoning abilities can automate significant portions of that process. The skill ceiling dropped while the output scaled up.

The defensive play: Project Glasswing partners can now scan their code before shipping. Patch faster than the exposure window allows exploitation. In theory, this tilts the advantage toward defenders who control large codebases and can apply fixes at scale.

The offensive reality: Every security researcher and nation-state actor with access to frontier models now has the same capability. The window between "vulnerability discovered" and "vulnerability exploited in the wild" compresses. Instead of months, you get days. Instead of targeted attacks on high-value systems, you get automated scanning of everything exposed to the internet.

Key shifts this creates:

Traditional responsible disclosure timelines (90 days, 6 months) become obsolete when AI can discover and weaponize flaws in hours
Open-source repositories face systematic scanning by both defensive and offensive actors simultaneously
The value of novel zero-days drops as AI floods the market with discovered vulnerabilities

The cybersecurity community's response — "as long as layers of verification are built into the process, and human judgment remains essential" — sounds reasonable until you map it against the economics. Humans don't scale. AI agents do. A single security team can't manually verify thousands of critical vulnerabilities while also patching them before automated exploit generation catches up.

The Implication

Watch two things. First, how fast the Project Glasswing partners actually implement systematic scanning and patching workflows. If they're serious, you'll see security update cadences accelerate within months. If this is defensive PR, the vulnerability disclosure rate stays flat.

Second, monitor the exploit development timeline. If the time between CVE publication and working exploit code starts dropping from weeks to days, that's your signal that offensive AI automation is scaling faster than defensive human processes.

The companies building Web4 agent infrastructure need to assume their code is being scanned by adversarial AI right now. The old security model — obscurity buys time, disclosure buys patches — just broke. The new model is continuous automated defense versus continuous automated offense. Human expertise doesn't leave the loop, but it becomes the bottleneck that determines whether you're fast enough to matter.

Sources

IEEE Spectrum AI

Claude Found Every Security Flaw In Its Own Code — Then Exploited Them

The Summary

The Signal

The Implication

Sources

Keep Reading