Anthropic just made Claude Opus 45% more expensive to run, and nobody authorized it.
The Summary
- Claude Opus 4.7's system prompt bloated token usage by ~45% compared to 4.6, adding invisible overhead to every API call whether developers wanted it or not
- The system prompt changes show Anthropic adding guardrails and behavioral constraints directly into the base prompt that runs before your actual request
- Developers discovered the inflation through anonymous request-token comparisons showing real usage data across model versions
- This is what model drift looks like when the vendor controls both the weights and the wrapper
The Signal
The 45% token inflation isn't a bug. It's Anthropic embedding more instructions into Claude's system prompt between versions 4.6 and 4.7. Every time you call the API, you're paying for a longer preamble you didn't write and can't see.
Simon Willison's breakdown reveals what changed: more safety guidelines, more behavioral constraints, more "helpful assistant" framing. Each addition costs tokens. Multiply that across millions of API calls and you've got a meaningful cost increase with zero notification.
This matters because it exposes the control problem at the heart of the agent economy. You're building on infrastructure you don't own, where the rules can change mid-sprint. The model you tested last month isn't quite the same model you're deploying today.
Key dynamics at play:
- System prompts are invisible overhead that developers pay for but don't control
- Token costs compound across agent loops, batch jobs, and production workloads
- Model providers can shift behavior and economics from one release to the next without notice
The request-token comparisons show this isn't theoretical. Real requests, real inflation, real money. If you're running agents that make hundreds of Claude calls per task, you just got an effective 45% price hike disguised as a model improvement.
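To put rough numbers on that, here is a minimal sketch of how the overhead compounds across an agent task. It takes the ~45% figure at face value and applies it uniformly to input tokens. The call count, tokens per call, and price per million tokens are made-up placeholders, not Anthropic's actual pricing, and `task_cost` is a hypothetical helper.

```python
def task_cost(calls_per_task, input_tokens_per_call,
              price_per_million_input, overhead_fraction=0.0):
    """Input-token cost of one agent task, in the currency of the price.

    overhead_fraction models extra tokens you pay for but did not write
    (0.45 means 45% more tokens per call). All inputs are illustrative.
    """
    tokens = calls_per_task * input_tokens_per_call * (1 + overhead_fraction)
    return tokens * price_per_million_input / 1_000_000


# Hypothetical workload: 200 calls per task, 2,000 input tokens per call,
# at a placeholder price of 10.00 per million input tokens.
baseline = task_cost(200, 2_000, 10.0)
inflated = task_cost(200, 2_000, 10.0, overhead_fraction=0.45)

print(f"baseline per task: {baseline:.2f}")
print(f"inflated per task: {inflated:.2f}")
print(f"extra per task:    {inflated - baseline:.2f}")
```

The per-task delta looks small in isolation. It scales linearly with task volume, so swap in your own call counts and contract pricing before drawing conclusions.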
Web4 is supposed to be about agents that build while you sleep. But if the foundation models can silently rewrite their own economics, you're not building on bedrock. You're building on sand that shifts with every deployment.
The Implication
If you're shipping agent products on Anthropic, audit your token usage now. Check what 4.6 versus 4.7 is costing you in production. If the delta is eating your margins, consider whether you need to version-lock, switch models, or build more aggressive caching.
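One way to run that audit is sketched below, assuming you already log per-request token counts alongside the model name. The record shape (`model`, `input_tokens`, `output_tokens`) and the model labels are assumptions for illustration, and the sample data is invented. Map the fields to whatever your logging or billing export actually contains.

```python
from collections import defaultdict
from statistics import mean


def mean_tokens_by_model(records):
    """Average input and output tokens per request, grouped by model label."""
    grouped = defaultdict(lambda: {"input": [], "output": []})
    for record in records:
        bucket = grouped[record["model"]]
        bucket["input"].append(record["input_tokens"])
        bucket["output"].append(record["output_tokens"])
    return {
        model: {"input": mean(v["input"]), "output": mean(v["output"])}
        for model, v in grouped.items()
    }


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Invented sample records; replace with your own request logs.
sample_logs = [
    {"model": "opus-4.6", "input_tokens": 1000, "output_tokens": 400},
    {"model": "opus-4.6", "input_tokens": 1200, "output_tokens": 500},
    {"model": "opus-4.7", "input_tokens": 1500, "output_tokens": 420},
    {"model": "opus-4.7", "input_tokens": 1700, "output_tokens": 480},
]

stats = mean_tokens_by_model(sample_logs)
delta = percent_change(stats["opus-4.6"]["input"], stats["opus-4.7"]["input"])
print(f"input tokens per request changed by {delta:.1f}%")
```

The comparison only means something if both versions served the same kind of traffic. If your workload changed between deployments, replay a fixed set of prompts against each version and compare those instead.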
Longer term, this is an argument for open weights and local inference. The companies winning in Web4 won't be the ones with the cleverest prompts. They'll be the ones who control their compute costs and can predict them quarter to quarter. Vendor models are powerful, but they're also variable-rate mortgages. Sometimes the rate adjusts and you didn't read the fine print.