The agents shipped in 2024 are already legacy code.

The Summary

  • Enterprises are hitting a wall with first-gen AI agents: workflows crash, state disappears, costs spiral, and recovery doesn't exist.
  • Temporal Technologies reports customers rebuilding agents from scratch because "they didn't take care of the plumbing."
  • The problem isn't model performance. It's that production agents run for hours or days, call multiple services, and fail in ways no one planned for.

The Signal

The first wave of enterprise AI agents was built on optimism and speed. Teams raced to ship proof-of-concepts, demo workflows, and show stakeholders that yes, an LLM could draft emails or summarize documents. What they didn't build: the infrastructure to keep those agents alive when things go wrong.

Now the bill is coming due. According to Preeti Somal, Senior VP of Engineering at Temporal Technologies, enterprises are coming back to build version 2.0 of the same agents. The issue isn't that the models got worse. It's that production is not a demo. Agents that worked fine in controlled environments collapse when they hit real workflows that span hours, chain multiple API calls, touch external systems, and run into inevitable failures.

The problem compounds because agentic workflows aren't simple request-response cycles. A single agent might:

  • Call three different LLMs for reasoning, classification, and generation
  • Query a vector database for retrieval
  • Trigger external APIs for calendar updates, CRM writes, or payment processing
  • Maintain state across all of this for hours or days

"People will write agents but haven't thought about what happens if the agent crashes."

When any piece fails, the entire workflow can become unrecoverable. No checkpoint. No state persistence. Just a silent failure and a confused user wondering why their agent never finished the task. Temporal's infrastructure, originally built for durable workflow orchestration before the current AI wave, is now handling the unsexy work that makes agents production-ready: state management, crash recovery, observability into multi-step processes, and governance over what agents can actually do.
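To make the checkpoint idea concrete, here is a minimal sketch in Python. It is not Temporal's API and not a production design. It assumes each step is a plain function that takes and returns a JSON-serializable dict, and that a local JSON file is an acceptable place to keep state. The names `run_workflow`, `load_checkpoint`, and `save_checkpoint` are hypothetical, chosen for illustration.

```python
import json
import os
import tempfile
from typing import Callable

# A step is a (name, function) pair; the function maps state data to new state data.
Step = tuple[str, Callable[[dict], dict]]


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"completed": [], "data": {}}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_workflow(steps: list[Step], path: str) -> dict:
    """Run steps in order, skipping any that an earlier run already finished."""
    state = load_checkpoint(path)
    for name, fn in steps:
        if name in state["completed"]:
            continue  # finished before the crash; do not repeat it
        state["data"] = fn(state["data"])
        state["completed"].append(name)
        save_checkpoint(path, state)  # persist progress after every step
    return state["data"]
```

If a step raises, the steps before it stay recorded, so calling `run_workflow` again with the same path resumes at the failed step instead of starting over. The sketch gives at-least-once behavior: a crash after a step's side effect but before the checkpoint write means that step runs again, so steps that write to a CRM or charge a card would need to be idempotent.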

This isn't a new engineering problem. Distributed systems have dealt with durability and failure recovery for decades. What's new is the scale and complexity introduced by agentic AI. Traditional workflows had predictable steps. Agent workflows are non-deterministic, context-dependent, and often span services the agent discovers mid-execution.

The rebuild era matters because it separates companies serious about agents from those treating them as novelties. First-gen implementations were about speed. Second-gen is about survival. The companies getting this right aren't asking "can we build an agent?" They're asking: can this agent run for three days without human intervention, survive a model API going down, resume from the exact step it failed at, and still cost less than hiring someone?
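"Survive a model API going down" usually starts with something as plain as retrying with backoff. The sketch below is one hedged illustration in Python, assuming the model call is wrapped in a zero-argument function and that any exception is worth retrying. The name `call_with_retry` and its default values are hypothetical, and a real system would likely retry only on specific error types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Delay doubles each attempt (base, 2x base, 4x base, ...) up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting. Retries alone only cover short outages. Surviving a longer one still depends on persisted state, so the workflow can stop and later pick up at the failed step.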

The Implication

If you shipped agents in the last 18 months, ask what happens when they crash. If the answer is "we restart them manually" or "we don't know," you're building version 1.0. The companies winning the agent economy in 2027 won't be the ones with the best prompts. They'll be the ones whose agents actually finish what they start.

Watch for orchestration and observability tools to become the picks and shovels of the agent era. Model performance is table stakes. Reliability is the moat.

Sources

VentureBeat