Companies Ditch $200M AI Models After Discovering Smaller Ones Work Better

The pendulum just swung from "use AI for everything" to "use the right AI for the right thing," and your budget is forcing the conversation.

The Summary

Companies are shifting from "tokenmaxxing" (maximizing AI usage) to "modelmaxxing" (routing each task to a suitable model) after seeing their AI bills spike in early 2026
CTOs now assign specific models to specific tasks: frontier models like GPT-5.5 for hard problems, older cheaper models for repetitive work
Coinbase CEO Brian Armstrong predicts 80% of workloads will run on 99% cheaper models within 12-18 months, with only 20% needing cutting-edge intelligence

The Signal

The AI gold rush just hit its first real constraint: the budget meeting. After six months of encouraging employees to use AI for everything, companies from Uber to Microsoft are now telling their teams to be more selective. The shift isn't about AI skepticism. It's about AI literacy.

Morgan Linton, CTO of AI startup Bold Metrics, runs engineering standups twice a week where he tells his 16 engineers exactly which models to use for which tasks. One team gets Claude Fable on low settings. Another gets GPT-5.5 on high. A third team using Cursor with Composer 2.5 is getting "totally perfect results." The specificity is the point. By routing tasks to the right model, Linton doesn't need hard token caps. His team uses the best tools, just more efficiently.

"My team is getting to use the best stuff, but they're using it a lot more efficiently."

This is modelmaxxing: match the model to the task, not the task to whatever model is newest. The practice assumes what was heresy six months ago: that not every prompt needs frontier intelligence. Summarizing meeting notes doesn't require the same cognitive horsepower as architecting a distributed system. Older, cheaper models handle the former just fine.
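The matching idea can be sketched as a simple lookup table. This is a minimal illustration, not any company's actual setup: the tier labels, the task names, and the `route` function are all hypothetical placeholders.

```python
# Hypothetical tier labels; substitute whatever models your team actually uses.
FRONTIER = "frontier-model"
BUDGET = "older-cheaper-model"

# Illustrative task-to-tier mapping (an assumption made up for this sketch).
ROUTES = {
    "summarize_meeting_notes": BUDGET,
    "boilerplate_code": BUDGET,
    "architect_distributed_system": FRONTIER,
}


def route(task_type: str, default: str = FRONTIER) -> str:
    """Return the model tier for a task type, falling back to the default tier."""
    return ROUTES.get(task_type, default)


print(route("summarize_meeting_notes"))
print(route("architect_distributed_system"))
print(route("something_uncatalogued"))
```

Unknown task types fall back to the frontier tier here, so quality never drops silently. A team under tighter budget pressure might reasonably flip that default.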

The economic logic is brutal:

Frontier models can cost roughly 100x what older models cost per token
Most AI workloads are repetitive, not creative
Companies that don't route strategically risk hitting budget caps that shut down AI access entirely

Brian Armstrong made the math explicit in a June 7 post: 80% of workloads will run on 99% cheaper models within 12-18 months. The other 20%, where "IQ maxxing is important," will stay on the latest models. This isn't about doing less with AI. It's about doing more with the right AI.
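Armstrong's 80/20 split can be turned into a rough blended-cost estimate. The sketch below assumes that "80% of workloads" means 80% of spend-weighted usage, which the post does not specify. The function name and parameters are illustrative.

```python
def blended_cost_fraction(cheap_share: float, cheap_discount: float) -> float:
    """Fraction of an all-frontier bill that remains after routing.

    cheap_share: fraction of spend-weighted workload moved to cheaper models.
    cheap_discount: how much cheaper those models are (0.99 means 99% cheaper).
    """
    if not (0.0 <= cheap_share <= 1.0 and 0.0 <= cheap_discount <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    frontier_part = 1.0 - cheap_share                 # still billed at full price
    cheap_part = cheap_share * (1.0 - cheap_discount)  # billed at the discounted rate
    return frontier_part + cheap_part


remaining = blended_cost_fraction(cheap_share=0.80, cheap_discount=0.99)
print(f"Remaining bill: {remaining:.1%} of baseline")  # Remaining bill: 20.8% of baseline
```

Under that assumption, the bill falls to about a fifth of its all-frontier level, and nearly all of what remains is the 20% of work that stays on the latest models.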

The infrastructure is already here. Model routers, prompt classifiers, and dynamic switching tools are proliferating. Engineers are building internal dashboards that track which models perform best on which task types. The companies that figure out this routing layer first will have a massive cost advantage over competitors still burning frontier tokens on boilerplate code.
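The kind of tracking behind such a dashboard can be sketched in a few lines. This assumes each model call is logged as a (task type, model, succeeded) record. The record format, the 0.9 threshold, the cost figures, and both function names are hypothetical.

```python
from collections import defaultdict


def success_rates(records):
    """Aggregate (task_type, model, succeeded) records into per-pair success rates."""
    counts = defaultdict(lambda: [0, 0])  # (task_type, model) -> [successes, total]
    for task_type, model, succeeded in records:
        pair = counts[(task_type, model)]
        pair[0] += int(succeeded)
        pair[1] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}


def cheapest_adequate(rates, task_type, cost_per_call, threshold=0.9):
    """Pick the cheapest model whose success rate on task_type meets the threshold."""
    candidates = [m for (t, m), r in rates.items() if t == task_type and r >= threshold]
    return min(candidates, key=lambda m: cost_per_call[m]) if candidates else None


# Made-up log entries and relative costs, for illustration only.
records = [
    ("boilerplate", "budget", True), ("boilerplate", "budget", True),
    ("boilerplate", "frontier", True),
    ("design", "budget", False), ("design", "budget", True),
    ("design", "frontier", True),
]
cost = {"budget": 0.01, "frontier": 1.0}
rates = success_rates(records)
print(cheapest_adequate(rates, "boilerplate", cost))  # budget
print(cheapest_adequate(rates, "design", cost))       # frontier
```

A real dashboard would need far more samples per pair and a clearer definition of "succeeded" before its rates could be trusted for routing.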

The Implication

If your company is still in tokenmaxxing mode, you're about to hit a wall. Start cataloging which tasks actually need frontier models and which don't. Build a routing strategy before your CFO builds a hard cap. The teams that master model switching now will have more AI budget, more flexibility, and better results than the teams still treating every model like it's one-size-fits-all.

The real prize isn't using AI more. It's using it smarter. Modelmaxxing is how you get there.

Sources

Business Insider Tech
