CEO of $26B Cognition: Stop Obsessing Over Token Spend Leaderboards

The $26 billion AI coding company is telling everyone to stop measuring the wrong thing.

The Summary

Cognition CEO Scott Wu says token spend leaderboards are "directionally correct," but companies are ranking engineers by the wrong metric: what matters is output, not how much API spend they rack up
The tokenmaxxing phenomenon may have been overblown from the start, raising questions about whether enterprises ever really adopted spend-ranking at scale
Wu argues that if engineers ship 3x more with AI, the compute cost is "clearly worth it," but rewards need to tie to actual delivery

The Signal

Cognition's Scott Wu just said the quiet part out loud: ranking engineers by token spend incentivizes the wrong behavior. Wu runs the company behind Devin, the autonomous AI software engineer that helped push Cognition to a $26 billion valuation in May 2025. He told the Founders podcast that while token leaderboards get the spirit right, the execution is broken. "People are like, 'We rank our engineers by how many tokens they're spending,'" Wu said. "Well, let's try and rank people by how much output they're actually producing."

The timing matters. Tech leaders have spent months critiquing tokenmaxxing as wasteful theater. Now one of the most valuable AI coding startups is confirming what many suspected: the leaderboard craze confused activity with results. Wu's not arguing against AI spend. He's arguing against measuring the wrong thing.

"If engineers can ship three times more than they would without AI, it is clearly worth it."

SemiAnalysis raises a bigger question: was tokenmaxxing ever really widespread? Its enterprise conversations suggest the phenomenon may have been more Twitter discourse than boardroom reality. If so, the backlash is landing harder than the original trend ever did.

Here's what Wu gets right: compute is expensive, but output compounds. An engineer who ships 3x more features with AI assistance delivers correspondingly more leverage for the business, and next to that, the token bill is a rounding error. But that holds only if the velocity produces working software, not just more code.

Key tensions emerging:

Token spend as a proxy for AI adoption versus actual productivity gains
How to measure "output" when AI changes what output looks like
Whether leaderboards ever scaled beyond a handful of loud adopters

The real issue is what Wu hints at but doesn't fully unpack: nobody has figured out how to measure AI-assisted output yet. Lines of code shipped is a garbage metric. Features deployed gets closer, but ignores quality. Customer value delivered is right but hard to attribute. So companies default to what's easy to track, which is API spend. And that's how you get engineers gaming leaderboards instead of building products.

The Implication

If you're running engineering teams, Wu just gave you permission to kill the token leaderboard. Replace it with something harder but real: velocity of shipped features, reduction in bug rates, time from commit to production, customer-facing improvements per sprint. The metric matters less than the principle, which is measuring outcomes, not inputs.

For AI tooling companies, this is a warning shot. Enterprises are getting smarter about what they pay for. Selling "AI adoption" as a volume game won't work much longer. The next wave of sales conversations will center on productivity evidence, not token counts. Build the analytics that prove output gains, or watch budget holders get skeptical.

Sources

SemiAnalysis | Business Insider Tech