Anthropic Secretly Cut Cache Lifespan, Torching Pro Max Credits 85% Faster

Anthropic's Pro Max tier is burning through its 5x usage quota in 90 minutes, and developers are blaming a quiet cache TTL downgrade nobody told them about.

The Summary

Developers report Pro Max quota exhaustion in 1.5 hours with what they describe as moderate coding work, not heavy batch processing
Anthropic silently reduced prompt caching TTL on March 6th, forcing more full-context reloads and multiplying token consumption
The cache change means the same workflow now costs 3-5x more tokens, effectively canceling out the Pro Max tier's promised 5x capacity increase

The Signal

Anthropic's $200/month Pro Max tier launched with a simple pitch: five times the usage of the $20 Pro plan. But developers using Claude for coding tasks are hitting their monthly cap in under two hours, and the math doesn't add up to user behavior alone.

The real culprit appears to be infrastructure economics dressed as a feature tweak. On March 6th, Anthropic cut the TTL (time-to-live) for prompt caching with no announcement, no changelog update, and no developer notification. Prompt caching lets AI coding tools reuse context, like your codebase structure or documentation, across multiple requests. Longer cache windows mean you pay once to load context, then pay reduced rates for subsequent queries that reference it.

"The same workflow that used to cost 50k tokens now costs 250k because the cache expires mid-session."

When cache TTL drops, every conversation becomes more expensive. You're re-uploading the same context multiple times in what used to be a single session. For a developer doing iterative code reviews or debugging, this compounds fast. One user calculated their typical morning's work, previously 120k tokens with healthy caching, now runs 400k+ tokens because context expires between queries.

Key economics:

Old cache TTL: context persisted through 90+ minute sessions
New cache TTL: estimated 5-15 minutes based on observed expiry patterns
Result: 3-5x token multiplication on identical workflows

This isn't a bug. It's a pricing adjustment disguised as a technical parameter change. Anthropic is facing the same unit economics squeeze as every frontier model provider. Training costs are measured in hundreds of millions. Inference costs drop slower than usage grows. The gap gets filled by optimizing cache behavior, which is a polite way of saying "make the customer pay to re-send the same data."

The developer community's anger isn't really about the cache change. GitHub issues show 280+ upvotes and 200+ comments because Anthropic didn't communicate the change. Power users who upgraded to Pro Max specifically for coding workflows now have a tier that costs 10x more but delivers roughly the same effective capacity as the old Pro plan after cache economics are factored in.

The Implication

If you're building on Claude API or evaluating AI coding tools, start measuring token consumption per feature, not per session. Cache behavior is now a pricing variable that providers will tune without warning. The agent economy runs on context, and context costs aren't fixed anymore.

Watch how Anthropic responds. If they restore cache TTL, it signals they mispriced Pro Max and need user trust more than margin. If they hold the line, expect OpenAI and Google to test similar "optimizations." The AI service layer is discovering SaaS pricing fundamentals: once customers are locked in, you can renegotiate value capture.

Sources

Hacker News Best

The Summary

The Signal

The Implication

Sources

Keep Reading