The teams building with AI agents just got their first real infrastructure layer, and it's learning which models to use faster than you can decide yourself.
The Summary
- Weave released an open-source model router that plugs into coding agents (Claude, Cursor, Codex) and automatically routes requests to the optimal model for each task
- The router uses a reinforcement learning model trained on tens of thousands of agent traces to decide when to use expensive frontier models versus cheaper alternatives
- Weave cut their own AI coding costs by 40% with no quality loss after running it internally for a month
The Signal
This is the first serious piece of agent infrastructure that isn't just wrapper plumbing. Weave built this because they had to. When Opus 4.7 shipped with tokenizer changes, their AI coding bills spiked. Not "we noticed an uptick" spiked. More like "this became a real line item" spiked.
The problem they're solving is universal for anyone building with agents: you need the smart model for planning and architecture decisions, but you're wasting money using it to fetch file contents or run simple transformations. The obvious solution is manual model selection. The actual problem is that nobody has time to micromanage which model handles which subtask in a multi-step agent workflow.
In Weave's words: "We trained an RL model on tens of thousands of agent traces. We reward the routing model when it selects an LLM that successfully completes the given task."
Key mechanics:
- Complex planning requests go to Opus 4.8 or GPT 5.5
- Context-gathering subagents get routed to DeepSeek V4 Flash
- Implementation tasks land on GLM 5.2 or similar mid-tier models
- The router handles all translation between different model APIs
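To make the routing split above concrete, here is a minimal sketch in Python. It is not Weave's code: the task-type labels, the `AgentRequest` type, the `pick_model` function, and the frontier fallback are all assumptions made for illustration, and a fixed lookup table only stands in for the mapping the real router learns.

```python
from dataclasses import dataclass

# Hypothetical task categories mapped to the models named in the list above.
# The real router learns this mapping; a fixed table is only a stand-in.
ROUTES = {
    "planning": ["Opus 4.8", "GPT 5.5"],
    "context_gathering": ["DeepSeek V4 Flash"],
    "implementation": ["GLM 5.2"],
}

# Assumption: when the task type is unknown, fall back to a frontier model.
DEFAULT_MODEL = "Opus 4.8"


@dataclass
class AgentRequest:
    task_type: str  # hypothetical label attached by the agent harness
    prompt: str


def pick_model(request: AgentRequest) -> str:
    """Return the first candidate model for the request's task type."""
    candidates = ROUTES.get(request.task_type)
    return candidates[0] if candidates else DEFAULT_MODEL
```

Calling `pick_model(AgentRequest("context_gathering", "list files in src/"))` returns the cheap model, while an unrecognized task type falls through to the frontier default. Defaulting upward trades some cost for safety, which is a design choice of this sketch rather than anything Weave has stated.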
The RL approach is the interesting part. This isn't just pattern matching on prompt length or keyword detection. It's learning from actual outcomes. When a cheaper model successfully completes a task that looks complex, the router learns. When a task fails and gets escalated to a frontier model, it learns that too. Over tens of thousands of traces, it builds intuition about what model intelligence is actually required for each type of agent subtask.
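The idea of learning from outcomes can be sketched with something much simpler than whatever Weave trained. The Python below is an epsilon-greedy bandit, a deliberately swapped-in toy technique and not Weave's RL model. The class name, the cost penalty in the reward, and every parameter are assumptions; the source only says the router is rewarded when the selected model completes the task.

```python
import random
from collections import defaultdict


class OutcomeRouter:
    """Epsilon-greedy bandit: a simplified stand-in, not Weave's RL model."""

    def __init__(self, models, costs, epsilon=0.1, cost_weight=0.5, seed=0):
        self.models = models
        self.costs = costs              # hypothetical relative cost per call, 0..1
        self.epsilon = epsilon          # how often to explore a random model
        self.cost_weight = cost_weight  # assumption: penalize expensive models
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (task_type, model) -> mean reward so far
        self.count = defaultdict(int)

    def choose(self, task_type):
        # Explore occasionally, otherwise pick the best-known model for this task.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)
        return max(self.models, key=lambda m: self.value[(task_type, m)])

    def record(self, task_type, model, succeeded):
        # Reward success, minus a cost penalty, and update the running mean.
        reward = (1.0 if succeeded else 0.0) - self.cost_weight * self.costs[model]
        key = (task_type, model)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]
```

Feed it traces where the cheap model succeeds at file fetching and fails at planning, and `choose` starts sending fetches to the cheap model and planning to the frontier one. A production router would presumably condition on far richer features than a task-type label, but the feedback loop has the same shape.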
Weave has been running this in production for a month. A 40 percent cost reduction is significant, but the "no noticeable differences in quality or velocity" is the real claim. If true, it means the routing overhead is negligible and the model selection is accurate enough that they're not seeing degraded outputs or failed tasks that require expensive retries.
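Why retries matter so much is easy to see with a little arithmetic. The sketch below assumes a simple cheap-first policy where a failed cheap attempt is escalated once to a frontier model. That policy, the function names, and any numbers you plug in are illustrative assumptions, not figures from Weave.

```python
def expected_cost_with_fallback(cheap_cost, frontier_cost, cheap_success_rate):
    """Expected cost of trying the cheap model first and escalating on failure.

    The cheap call is always paid for; the frontier call is paid for only
    when the cheap attempt fails.
    """
    return cheap_cost + (1.0 - cheap_success_rate) * frontier_cost


def break_even_success_rate(cheap_cost, frontier_cost):
    """Success rate above which cheap-first beats always using the frontier model.

    Derived from: cheap + (1 - p) * frontier < frontier  =>  p > cheap / frontier.
    """
    return cheap_cost / frontier_cost
```

With hypothetical costs of 1 unit for the cheap model and 10 for the frontier model, a 90 percent cheap success rate puts the expected cost at about 2 units per task instead of 10, and cheap-first wins whenever the cheap model succeeds more than 10 percent of the time. The catch is that this ignores the time lost to failed attempts, which is exactly what the "velocity" half of Weave's claim is about.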
The release strategy signals where this market is heading. Source-available under Elastic License 2.0 means you can self-host and audit it, but not turn it into a competing commercial service. They're also offering a hosted version at weaverouter.com, which suggests they see this becoming infrastructure other teams will pay for rather than maintain themselves.
The Implication
If you're writing code with AI agents and not thinking about routing yet, you will be soon. The cost gap between frontier models and capable mid-tier alternatives is wide enough that routing becomes a forcing function. You either build it yourself, adopt something like Weave's router, or keep paying the frontier tax on every request.
The bigger implication is what happens when routing intelligence becomes a commodity. If every agent platform ships with smart routing built in, the economics of using AI for grunt work improve dramatically. That accelerates the timeline on AI agents handling entire categories of work that currently sit in the "too expensive to automate" bucket.
Watch whether other coding agent platforms adopt this or build their own. If routing becomes table stakes, we'll see a wave of specialized routing models trained on different domains beyond code.