AI Agents Are Breaking Production. 79% of Companies Can't Tell Why.

A postmortem can't tell you what broke when the failure happened in a logic gap that doesn't have a name yet.

The Summary

79% of organizations now run AI agents in production, but enterprises have no framework for tracking agent-initiated infrastructure failures that weren't technically "wrong."
Gartner projects 40% of agentic AI projects will be canceled due to poor risk controls, but that projection overlooks the agents already running and generating uncategorized incidents.
The core problem: enterprises treat autonomous agents and chaos engineering as separate disciplines when they're actually the same thing.
The failure mode is structural: agents make correct decisions with incomplete context, infrastructure cascades, and incident reviews turn into jurisdictional fights between teams.

The Signal

Enterprise software has a new class of production incident that breaks the postmortem. An agent initiates an action that's technically correct given its context. The context is incomplete. Infrastructure cascades. Three teams argue about whether it's an agent failure or an infrastructure failure, because no framework connects those two categories.

This isn't a theoretical edge case anymore. With 96% of organizations planning agent expansion and Gartner predicting 33% of enterprise software will include agentic AI by 2028, the scale of untracked exposure is now material. But here's what the adoption numbers miss: the agents that survive the 40% cancellation rate are the ones generating invisible risk.

"Agents that are running, that are not canceled, and that are quietly generating infrastructure events no one has categorized as risk."

The author, who spent six years building infrastructure automation at Cisco and Splunk and holds a patent on intent-based chaos engineering, identifies the structural mistake: treating autonomous agents and chaos engineering as separate disciplines. They're not. An agent making autonomous decisions in production is chaos engineering, whether you planned it or not. The difference is that traditional chaos engineering runs controlled experiments with kill switches, while agents in production run chaos experiments continuously, without the experimental framework.

Here's why this is more than another "AI governance" hand-wringing piece. Chaos engineering exists because complex systems fail in ways you can't predict from component testing. You inject controlled failures to learn how systems actually behave under stress. But when you deploy an agent, you're injecting an autonomous decision-maker that will create novel system states you didn't design for. The agent isn't malicious or buggy. It's doing exactly what it's supposed to do: optimizing for its objective function with the context it has.

The breakdown happens at the context boundary:

The agent's training data is always incomplete relative to current production state
The agent's objective function is narrower than the full system's stability requirements
The agent's action latency is faster than human review cycles can govern

When an agent with incomplete context makes a technically correct decision, it can trigger cascading failures that don't fit existing incident categories. Was it a logic error? No, the logic was sound. Was it a permissions issue? No, the agent had the access it needed. Was it inadequate testing? Not really — the action worked exactly as designed.
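One way to make that gap concrete is to give it a name in the incident taxonomy. The sketch below is only an illustration, not an established framework: the category names, the AgentAction fields, and the classify function are all hypothetical, and a real review would rarely reduce to four booleans.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IncidentCategory(Enum):
    LOGIC_ERROR = "logic_error"
    PERMISSIONS = "permissions"
    INADEQUATE_TESTING = "inadequate_testing"
    # Hypothetical category for the case the existing three do not cover.
    CONTEXT_BOUNDARY = "context_boundary"


@dataclass
class AgentAction:
    """Hypothetical summary of what a postmortem learned about one action."""
    logic_sound: bool          # did the decision follow from the agent's inputs?
    authorized: bool           # did the agent have the access it used?
    behaved_as_designed: bool  # did the action do what it was built to do?
    context_complete: bool     # did the agent see the production state that mattered?


def classify(action: AgentAction) -> Optional[IncidentCategory]:
    """Walk the traditional questions first, then the context boundary."""
    if not action.logic_sound:
        return IncidentCategory.LOGIC_ERROR
    if not action.authorized:
        return IncidentCategory.PERMISSIONS
    if not action.behaved_as_designed:
        return IncidentCategory.INADEQUATE_TESTING
    if not action.context_complete:
        return IncidentCategory.CONTEXT_BOUNDARY
    return None  # nothing to categorize
```

An action that passes all three traditional checks but ran on incomplete context lands in CONTEXT_BOUNDARY instead of falling through uncategorized.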

What failed was the unspoken assumption that autonomous decision-makers would operate within the same risk boundaries as human operators. Humans skip actions that "feel wrong" even when technically permitted. Agents don't have that filter. They execute on their mandate until something breaks or someone stops them.

The Implication

If you're running agents in production, you need to start treating them as chaos experiments, not features. That means instrumentation for agent-initiated state changes, separate incident categories for context-boundary failures, and kill switches that work faster than agent decision cycles. The alternative is arguing about jurisdiction while your infrastructure cascades.
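As a rough sketch of what that could look like in practice, the wrapper below routes every agent action through a single gate that checks a kill switch and records the state change before applying it. Everything here is an assumption for illustration: AgentGate, StateChange, and the agent and action names are hypothetical, and a production version would need durable logging and a switch shared across processes.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StateChange:
    agent_id: str
    action: str
    timestamp: float


@dataclass
class AgentGate:
    """Hypothetical choke point that every agent action passes through."""
    kill_switch: threading.Event = field(default_factory=threading.Event)
    audit_log: List[StateChange] = field(default_factory=list)

    def execute(self, agent_id: str, action: str, apply: Callable[[], None]) -> bool:
        # Check the switch before each action, so a halt takes effect
        # before the agent's next action rather than after human review.
        if self.kill_switch.is_set():
            return False
        # Record before applying, so the log entry exists even if apply() raises.
        self.audit_log.append(StateChange(agent_id, action, time.time()))
        apply()
        return True

    def halt(self) -> None:
        # Thread-safe: any operator or monitor thread can trip the switch.
        self.kill_switch.set()


gate = AgentGate()
gate.execute("scaler-1", "scale_down:web", lambda: None)  # applied and logged
gate.halt()
gate.execute("scaler-1", "scale_down:db", lambda: None)   # refused, not logged
```

The design choice worth noting is that the gate sits in the action path rather than watching from the side: an observer can only report a cascade, while a gate can refuse the next step.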

The enterprise software stack is about to get a lot more chaotic, and the companies that figure out how to measure and govern this new failure mode first will have a material advantage. The ones still trying to fit agent incidents into human-operator postmortem templates will keep wondering why their observability tools aren't catching the problems that matter.

Sources

VentureBeat
