The guy who built Keras just showed the world that AI can't think on its feet, and that might be the whole game.
The Summary
- François Chollet launched ARC-AGI-3, a benchmark of novel video games with no instructions, where humans excel and AI systems fail
- Current models are "reliant on memorization and retrieval" and collapse when facing truly novel problems
- The test reveals a hard limit: AI has more knowledge than humans but can't recombine it to handle the unexpected
The Signal
Chollet created Keras, the framework millions of developers use to build AI models. Now he's using a new benchmark to expose a fundamental weakness in those models. ARC-AGI-3 presents simple games that humans solve almost instantly. Most AI systems can't touch them. Not because they lack information. The opposite. They have more patterns, more data, more encoded abstractions than any human brain. They fail because they can't improvise.
"A human is generally intelligent. A human is never lost. A human figures it out on the fly because they have fluid intelligence," Chollet says. Current models have crystallized intelligence at massive scale. They pattern-match against their training data with superhuman speed. But show them something genuinely new, and the machinery stalls. They can't recombine what they know to navigate the unknown.
This matters more than you think. Every conversation about AI agents assumes they can handle edge cases, unexpected situations, the long tail of reality. Chollet is saying they fundamentally can't. Not yet. Maybe not with this architecture. The models absorb patterns better than human brains. But at test time, when the situation is novel, they have "very low ability to recombine that knowledge." It's not a bug. It's how the paradigm works.
The companies racing to deploy agents are building on a foundation that breaks under novelty. That's not a problem when you're summarizing text or generating code from patterns. It's a showstopper when you need an agent to actually navigate the world.
The Implication
If you're building with AI agents, stress-test them on truly novel scenarios. Not variants of training data. Not slightly different problems. Edge cases that require on-the-fly reasoning. The gap between "knows a lot" and "can figure it out" is wider than the demo videos suggest. Watch for architectures that solve this. Whoever cracks fluid intelligence in machines builds the actual agent economy, not the demo-grade one that only works in controlled environments.
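One minimal way to run that kind of stress test, sketched below in Python: score the agent only on procedurally generated episodes it has never seen, and compare it against a random-policy baseline. Everything in this sketch is an assumption for illustration; the hot/cold grid game, the `agent_act` interface, and the function names are hypothetical, not the ARC-AGI-3 API.

```python
# Hypothetical novelty stress test for an agent. `agent_act(obs) -> action`
# is an interface you supply; the hot/cold grid game is illustrative only.
import random

def make_novel_game(seed: int):
    """Procedurally generate a tiny game the agent has never seen:
    find a hidden target on an n x n grid using only a distance signal."""
    rng = random.Random(seed)
    n = rng.randint(4, 8)
    target = (rng.randrange(n), rng.randrange(n))
    return n, target

def run_episode(agent_act, seed: int, max_steps: int = 50) -> bool:
    """Run one unseen episode; True if the agent reaches the target."""
    n, target = make_novel_game(seed)
    pos = (0, 0)
    moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    for _ in range(max_steps):
        if pos == target:
            return True
        # Observation: grid size, position, and Manhattan distance to target.
        obs = {"n": n, "pos": pos,
               "dist": abs(pos[0] - target[0]) + abs(pos[1] - target[1])}
        dx, dy = moves.get(agent_act(obs), (0, 0))
        pos = (min(max(pos[0] + dx, 0), n - 1),
               min(max(pos[1] + dy, 0), n - 1))
    return pos == target

def solve_rate(agent_act, n_tasks: int = 100) -> float:
    """Fraction of procedurally generated episodes the agent solves."""
    return sum(run_episode(agent_act, seed) for seed in range(n_tasks)) / n_tasks

if __name__ == "__main__":
    # Baseline: a random policy. A deployable agent should beat this
    # by a wide margin on tasks it has never seen before.
    rng = random.Random(0)
    random_policy = lambda obs: rng.choice(["up", "down", "left", "right"])
    print(f"random baseline solve rate: {solve_rate(random_policy):.2f}")
```

The design choice that matters is the seed-driven generator: every episode is novel by construction, so a high solve rate can't come from memorization, only from figuring the game out on the fly.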
Source: Fast Company Tech