The all-you-can-eat AI buffet is closing, and whoever can keep the plates full longest might own the market.
The Summary
- AI companies are switching from generous token allowances to strict rationing as compute costs, chip shortages, and infrastructure bottlenecks bite into margins
- Meta employees burned through 60 trillion tokens in one month before the company killed its internal "Claudenomics" productivity leaderboard
- Companies face a brutal choice: subsidize inference to keep users hooked, or save compute for training the next model that keeps them competitive
- OpenAI just moved Codex to token-based pricing, the first domino in what looks like industry-wide rationing
The Signal
For three years, AI companies played a game of chicken with physics. They pretended compute was infinite, tokens were free, and users could prompt until their fingers cramped. Meta's 60 trillion tokens in a single month, the equivalent of 10,000 libraries' worth of text, show what happens when you give people unlimited access: they use it. All of it.
Now the bill is coming due. The global chip shortage isn't easing. The helium supply crunch (critical for GPU cooling systems) got worse with Middle East instability. Data centers take 18 months minimum to build. Meanwhile, every inference request costs real money in electricity and hardware depreciation. The subsidies that made ChatGPT feel free were never sustainable. They were customer acquisition costs dressed up as product strategy.
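The "real money" per request is easy to underestimate. A back-of-envelope sketch makes the scale concrete; every number here (GPU hourly cost, serving throughput) is an illustrative assumption, not a figure from the article:

```python
# Back-of-envelope inference cost per million tokens.
# Both constants below are illustrative assumptions, not reported figures.

GPU_HOURLY_COST = 2.50    # $/hr: amortized hardware plus electricity (assumed)
TOKENS_PER_SECOND = 1000  # aggregate serving throughput of one GPU (assumed)

tokens_per_hour = TOKENS_PER_SECOND * 3600
cost_per_million_tokens = GPU_HOURLY_COST / tokens_per_hour * 1_000_000

print(f"${cost_per_million_tokens:.2f} per million tokens")
```

At these assumed rates, a single month of 60 trillion tokens would run on the order of $40 million in inference cost alone, which is why unlimited access reads as a customer acquisition expense rather than a product.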
The strategic bind is elegant in its brutality. Train a new model and you might leapfrog competitors technically, but you need thousands of GPUs running hot for months. Serve existing customers and you keep revenue flowing, but you burn those same GPUs on inference while rivals train the model that makes yours obsolete. OpenAI's shift to token-based pricing on Codex signals they've chosen: ration now, survive later.
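The bind can be made concrete with a toy fleet-allocation model: every GPU assigned to training is a GPU not serving users, and vice versa. Fleet size, per-GPU serving capacity, and training budget below are all hypothetical:

```python
# Toy model of the train-vs-serve tradeoff: split a fixed GPU fleet
# between training the next model and serving the current one.
# All constants are hypothetical illustrations.

FLEET = 10_000                    # total GPUs in the fleet (assumed)
USERS_PER_GPU = 500               # concurrent users one inference GPU serves (assumed)
TRAIN_GPU_MONTHS_NEEDED = 50_000  # GPU-months to train the next model (assumed)

def tradeoff(training_share):
    """Return (users served now, months until the next model finishes training)."""
    train_gpus = FLEET * training_share
    serve_gpus = FLEET - train_gpus
    months = TRAIN_GPU_MONTHS_NEEDED / train_gpus if train_gpus else float("inf")
    return serve_gpus * USERS_PER_GPU, months

for share in (0.2, 0.5, 0.8):
    users, months = tradeoff(share)
    print(f"{share:.0%} on training: serve {users:,.0f} users, next model in {months:.1f} months")
```

Under these assumptions, shifting the fleet from 20% to 80% training cuts the next model's timeline from 25 months to about 6, but drops serving capacity by three quarters. Rationing inference is one way to move along this curve without buying more GPUs.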
This isn't just about margins. It's about who controls the compute to keep training while competitors get stuck serving yesterday's models to users who expect unlimited access. The company that can subsidize longest, either through cash reserves or more efficient infrastructure, gets to keep both training new models and serving users at scale. Everyone else gets squeezed until they're running rental businesses on deprecated technology.
The Implication
Watch who stops offering generous free tiers first. That's who's hurting. If you're building agents or products on top of these models, budget for token costs to double or triple in the next 12 months. The era of free tokens is over. Companies that got users hooked on unlimited access now have to teach them to ration, and some of those users will churn to whoever still has the subsidy budget. The endgame is concentration: two or three companies with enough compute to do both training and inference at scale, and everyone else becoming a reseller.
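For builders, the double-or-triple scenario above is worth stress-testing now. A minimal sketch, with placeholder usage and pricing rather than any provider's real rates:

```python
# Stress-test a product's monthly token bill against price hikes.
# Usage volume and today's rate are placeholder assumptions.

MONTHLY_TOKENS = 2_000_000_000  # tokens/month the product consumes (assumed)
PRICE_PER_MILLION = 3.00        # $ per million tokens today (assumed)

def monthly_cost(multiplier=1.0):
    """Monthly spend if the per-token price moves by `multiplier`."""
    return MONTHLY_TOKENS / 1_000_000 * PRICE_PER_MILLION * multiplier

for m in (1, 2, 3):  # today, doubled, tripled
    print(f"{m}x price: ${monthly_cost(m):,.0f}/month")
```

If the tripled number breaks your unit economics, the time to add caching, smaller fallback models, or per-user caps is before the repricing lands, not after.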
Source: Fast Company Tech