The Summary

Donald Knuth posed a problem to Claude, and now LLMs have closed the loop by solving it with proof assistants.

The Signal

In March 2026, Donald Knuth, the computer scientist who literally wrote the book on algorithms, published a problem he'd been exploring with Claude AI. He called it "Claude Cycles." The problem lived in that sweet spot where human intuition gets you close but rigorous proof requires grinding through possibilities that overwhelm even brilliant minds.

Now the problem is fully solved, not by humans, but by LLMs working with proof assistants. This is different from an LLM spitting out Python that happens to work. Proof assistants like Lean or Coq demand mathematical rigor. Every step must be justified. Every inference must be valid. You can't fake your way through formal verification the way you can sometimes slip past unit tests.

What makes this matter is the pairing. LLMs are good at intuition, at seeing patterns, at making educated guesses about what might work. Proof assistants are good at being absolutely certain. Together, they're becoming something neither is alone: a system that can explore mathematical space creatively and verify its findings mechanically. The LLM suggests, the proof assistant checks. When it works, you have something publishable.
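To make "every step must be justified" concrete, here is a minimal Lean 4 sketch. This is a toy illustration, not Knuth's "Claude Cycles" problem; the theorem names are invented for the example. The point is that the kernel checks each inference, so a plausible-looking but invalid proof is rejected outright:

```lean
-- Toy illustration only (not the actual "Claude Cycles" problem).
-- Lean's kernel verifies every step; an LLM can propose the
-- statement and the proof, but nothing is accepted on faith.

-- A simple arithmetic fact, discharged by the omega decision
-- procedure for linear arithmetic:
theorem two_mul_eq_add (n : Nat) : 2 * n = n + n := by
  omega

-- A term-mode proof citing an existing library lemma. If the
-- lemma didn't apply, Lean would reject the whole theorem
-- rather than letting it slide the way a weak unit test might:
theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

In the LLM-plus-assistant loop, the model drafts conjectures and tactic scripts like these, and the proof assistant either certifies them or returns an error the model can iterate on.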

The 362 comments on the original Hacker News thread show the technical community recognizing this isn't just a demo. When Knuth engages with an AI capability and the community dives in, you're watching a legitimacy threshold being crossed. We're moving from "can AI help with math?" to "here's a class of mathematical problems AI can now handle end-to-end."

The Implication

Watch the mathematics and formal methods communities over the next year. If LLMs paired with proof assistants can reliably crack problems that stump experts, we're looking at a new research paradigm. Not replacing mathematicians, but changing what problems are tractable. For anyone building AI systems that need to be provably correct, this is your stack. For anyone who thinks LLMs are just autocomplete, this is evidence you're measuring the wrong thing.


Source: Hacker News Best