The Summary

AI got cheap and scarce at the same time. That's not a paradox; it's a bottleneck.

The Signal

We're in a weird moment. API costs for frontier models have plummeted, making it economically viable to route more workloads through AI. But try to actually do that at any meaningful scale and you hit rate limits, capacity walls, and "please try again later" errors. The price dropped, but the pipe didn't get wider.

This matters because the whole Web4 thesis depends on agents running continuously, making decisions, and handling tasks autonomously. When those agents can't get compute on demand, they're just expensive cron jobs. The economics say "build this", but the infrastructure says "not yet".

The companies feeling this most sit between the model providers and end users: the ones building agent platforms, workflow automation, and AI-native apps. They've priced their products assuming API costs keep falling and availability keeps rising. One of those assumptions is holding. The other isn't. That's a margin problem becoming a reliability problem.

The Implication

If you're building anything that depends on consistent AI inference, you need a Plan B. That might be smaller models you can host yourself, hybrid architectures, or just honest conversations with customers about what "always available" actually means right now. Watch for the companies that solve this by moving down the stack and running their own inference infrastructure. They'll have worse unit economics in the short term but better control. In a capacity-constrained market, control beats cost.
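To make that Plan B concrete, here is a minimal sketch of the hybrid pattern in Python: try the hosted frontier model with exponential backoff, then degrade to a smaller self-hosted model once the retry budget runs out. Everything named here (call_frontier_api, call_local_model, CapacityError) is a hypothetical stand-in, not any real provider's API.

```python
import random
import time


class CapacityError(Exception):
    """Stand-in for a 429 / "try again later" response from a hosted model."""


def call_frontier_api(prompt: str) -> str:
    # Hypothetical client for a hosted frontier model. The coin flip
    # simulates the intermittent capacity walls described above.
    if random.random() < 0.5:
        raise CapacityError("rate limited")
    return f"[frontier] {prompt}"


def call_local_model(prompt: str) -> str:
    # Hypothetical client for a smaller self-hosted model: worse output,
    # but capacity you control.
    return f"[local] {prompt}"


def complete(prompt: str, retries: int = 3, base_delay: float = 0.5) -> str:
    """Try the frontier API with backoff; fall back instead of failing."""
    for attempt in range(retries):
        try:
            return call_frontier_api(prompt)
        except CapacityError:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    # Capacity never came back within the retry budget: degrade gracefully.
    return call_local_model(prompt)


if __name__ == "__main__":
    print(complete("summarize today's signal"))
```

The design choice the sketch encodes is the paragraph's point: a guaranteed-but-worse answer from infrastructure you control beats a better answer you can't reliably get.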


Source: Exponential View