The next trillion-dollar question in enterprise AI isn't which model is smartest — it's who decides when your agent should stop working.
The Summary
- In February, Microsoft and OpenAI led enterprise agent orchestration adoption at 38.6% and 25.7% respectively, while Anthropic debuted at 5.7%, according to data from the VB Pulse Enterprise Agentic Orchestration tracker
- Many production agent failures happen because models decide they're done before the task is actually finished, not because they lack capability
- Anthropic's /goals feature in Claude Code separates task execution from task evaluation by adding a separate evaluator model, while OpenAI and Google leave developers to build their own termination logic
- The strategic contest is shifting from model performance to control of the infrastructure layer where agents plan, execute, and prove compliance
The Signal
For two years, enterprise AI buyers compared model benchmark scores. Now they're asking a different question: who controls the control plane? The agent orchestration layer is where agents plan workflows, call tools, access data, and generate audit trails for security teams. That is infrastructure, not intelligence, and the companies that own it stand to capture more value than those with slightly better reasoning scores.
Microsoft Copilot Studio and Azure AI Studio accounted for 38.6% of enterprise orchestration adoption in February, up from 35.7% in January. OpenAI's Assistants and Responses APIs held 25.7%, up from 23.2%. Those numbers reflect an installed-base advantage: if you already run on Azure or use ChatGPT Enterprise, the orchestration layer comes bundled. Anthropic doesn't have that luxury.
"Anthropic remained far smaller, but it made its first appearance in the tracker: moving from 0% in January to 5.7% in February."
Here's what makes that 5.7% interesting. Anthropic's entry came through tool use and workflow orchestration, not model quality. It is selling orchestration capability, not just a better chatbot. And it is tackling a problem Microsoft and OpenAI have mostly left to developers: agents that quit too early.
Consider a code migration agent that finishes its run with a green pipeline even though several pieces never compiled. The gap takes days to catch. That isn't a capability failure; it's a termination-logic failure. The model decided it was done before it actually was. The problem shows up across vendors, but their approaches to it differ sharply:
- OpenAI lets the model decide when to stop, then lets users add evaluators after the fact
- LangChain and Google's Agent Development Kit allow independent evaluation, but developers write the critic node, termination logic, and observability config themselves
- Anthropic's /goals formally separates execution from evaluation using a second model
The difference matters in production. Claude Code's /goals adds an evaluator model that reviews each step and decides whether the goal has been achieved: the executing agent keeps working, and the evaluator decides when to stop. This is architecturally cleaner than bolting evaluation onto a single-model loop, and it's the kind of capability that wins enterprise deals when agents are touching production databases or managing compliance workflows.
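To make the executor/evaluator split concrete, here is a minimal Python sketch of the general pattern, contrasted with a loop that lets the executor decide when it is done. It is not Anthropic's /goals implementation or API: `run_with_goal`, `run_self_terminating`, `ToyMigration`, and the `claims_done` flag are hypothetical names, and in a real system both callables would wrap model calls or deterministic checks rather than the toy functions used here. The toy replays the code migration scenario, with an executor that claims success after two of three modules.

```python
# Hypothetical sketch of the executor/evaluator split; not Anthropic's /goals API.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    output: str
    claims_done: bool  # the executor's own opinion of whether it is finished


@dataclass
class Verdict:
    goal_met: bool
    reason: str


@dataclass
class RunReport:
    goal_met: bool = False
    steps: int = 0
    outputs: list[str] = field(default_factory=list)
    verdicts: list[Verdict] = field(default_factory=list)


Executor = Callable[[str, list[str]], StepResult]
Evaluator = Callable[[str, list[str]], Verdict]


def run_self_terminating(goal: str, execute_step: Executor, max_steps: int = 20) -> RunReport:
    """Baseline: stop as soon as the executor claims it is done."""
    report = RunReport()
    for _ in range(max_steps):
        result = execute_step(goal, report.outputs)
        report.outputs.append(result.output)
        report.steps += 1
        if result.claims_done:
            report.goal_met = True  # trusted without verification
            break
    return report


def run_with_goal(goal: str, execute_step: Executor, evaluate: Evaluator,
                  max_steps: int = 20) -> RunReport:
    """Keep executing until a separate evaluator says the goal is met.

    claims_done is ignored: only the evaluator's verdict, or the step budget,
    ends the loop.
    """
    report = RunReport()
    for _ in range(max_steps):
        result = execute_step(goal, report.outputs)
        report.outputs.append(result.output)
        report.steps += 1
        verdict = evaluate(goal, report.outputs)  # independent check after every step
        report.verdicts.append(verdict)
        if verdict.goal_met:
            report.goal_met = True
            break
    return report  # goal_met stays False if the budget runs out


class ToyMigration:
    """Hypothetical stand-in for a code migration: one module per step, and
    the executor claims success after only two modules."""

    def __init__(self, modules: list[str]):
        self.modules = modules
        self.migrated: list[str] = []

    def execute_step(self, goal: str, history: list[str]) -> StepResult:
        remaining = [m for m in self.modules if m not in self.migrated]
        if not remaining:
            return StepResult("nothing left to migrate", claims_done=True)
        self.migrated.append(remaining[0])
        return StepResult(f"migrated {remaining[0]}", claims_done=len(self.migrated) >= 2)

    def evaluate(self, goal: str, history: list[str]) -> Verdict:
        missing = [m for m in self.modules if m not in self.migrated]
        if missing:
            return Verdict(False, "not compiled: " + ", ".join(missing))
        return Verdict(True, "all modules compile")


naive = ToyMigration(["auth", "billing", "search"])
print(run_self_terminating("migrate all modules", naive.execute_step).steps)  # 2
print(naive.migrated)  # ['auth', 'billing']: 'search' was never migrated

guarded = ToyMigration(["auth", "billing", "search"])
report = run_with_goal("migrate all modules", guarded.execute_step, guarded.evaluate)
print(report.goal_met, report.steps)  # True 3
```

The key design choice is that `run_with_goal` never reads `claims_done`. The self-terminating baseline reports success after two steps with a module missing; the evaluator-gated loop keeps going until the independent check passes or the step budget runs out.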
The Implication
If you're building agent systems in-house, scrutinize your termination logic. The model won't tell you it quit early; it will just stop, and you'll find out later when something breaks. The /goals pattern works because it externalizes the decision to stop, which means you can tune, audit, and override it.
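What "tune, audit, and override" can look like in code: a minimal Python sketch, assuming the evaluator emits a numeric score per step (for example, the fraction of migration checks that pass). `StopPolicy`, its `threshold` and `override` fields, and the audit record format are hypothetical, not part of any vendor's API. The point is that the stop rule is plain, inspectable code outside the executing model: the threshold can be changed, every decision is logged, and an operator can force the outcome.

```python
# Hypothetical stop policy kept outside the executing agent; not a vendor API.
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StopDecision:
    step: int
    score: float
    threshold: float
    stop: bool
    source: str  # "evaluator" or "override"
    at: str      # UTC timestamp, ISO 8601


@dataclass
class StopPolicy:
    threshold: float = 1.0           # tune: evaluator score required to stop
    override: Optional[bool] = None  # override: True forces stop, False forces continue
    audit_log: list[StopDecision] = field(default_factory=list)  # audit trail

    def decide(self, step: int, score: float) -> bool:
        if self.override is not None:
            stop, source = self.override, "override"
        else:
            stop, source = score >= self.threshold, "evaluator"
        self.audit_log.append(StopDecision(
            step=step, score=score, threshold=self.threshold, stop=stop,
            source=source, at=datetime.now(timezone.utc).isoformat()))
        return stop

    def export_audit(self) -> str:
        """Serialize every stop decision as JSON for review."""
        return json.dumps([asdict(d) for d in self.audit_log], indent=2)


policy = StopPolicy(threshold=0.9)
for step, score in enumerate([0.4, 0.7, 0.95], start=1):  # toy evaluator scores
    if policy.decide(step, score):
        break
print(len(policy.audit_log), policy.audit_log[-1].stop)  # 3 True

policy.override = False  # an operator vetoes stopping until the override is cleared
print(policy.decide(4, 1.0))  # False
```

Checking the override before the score means an operator's decision always wins and is recorded as such in the audit trail; whether that precedence is right is a policy choice each team would need to make.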
For buyers, the orchestration layer is now the lock-in layer. Once your agents run on Azure AI Studio or OpenAI's Assistants API, switching costs go up fast. Anthropic is betting that better orchestration primitives can overcome Microsoft's and OpenAI's distribution advantage. The early adoption numbers are small, but the direction is clear: the agent control plane is the next strategic choke point, and whoever owns it will define how enterprises build in Web4.