The companies building godlike AI still don't know how their current models make decisions — and they're betting your future on figuring it out later.

The Signal

Daniel Kokotajlo spent two years inside OpenAI studying how quickly AI systems could improve and what risks could emerge. His job was forecasting the economic, political, and safety implications of more powerful models. He left in 2024 with a clear message: the people building superintelligence don't actually know how to control it yet.

The alignment problem is simple to state and brutal to solve. You need AI systems that reliably do what humans want, even after those systems become smarter than humans in most domains. The catch is that researchers don't fully understand how current advanced models make decisions internally. You can't align what you don't understand.

"It's a sort of open secret, but we don't really have a good plan for how to do this yet."

This isn't a theoretical problem for 2040. Kokotajlo points to AI agents as the potential turning point — the moment when AI systems start taking sustained action in the world without human oversight at each step. Agents don't just answer questions. They pursue goals over time, make plans, and execute them. If those goals drift even slightly from what you intended, and the agent is more capable than you at achieving goals, you have a problem you can't easily undo.

The AI race compounds the issue. Companies are sprinting toward AGI and superintelligence while the alignment work lags behind capability development. Kokotajlo's current work through the AI Futures Project focuses on what governments and companies can do now to reduce the risk of losing control. The implicit argument is that waiting until we have superintelligence to figure out alignment is waiting too long.

The Implication

If you're building with AI agents, you're working in the exact zone Kokotajlo is worried about. Every agent you ship that makes autonomous decisions is a small-scale version of the alignment problem. Test it hard. Understand its failure modes. Don't deploy it into critical systems assuming it will stay aligned with your intent just because it did during development.

For everyone else, this is your reminder that the people building the future don't have all the answers yet. They're not hiding a secret plan. The plan doesn't exist. Watch how companies approach AI safety and alignment in practice, not just in blog posts. The gap between capability and control is the story of the next decade.

Sources

Business Insider Tech