An AI just learned to drive desktop software without reading the manual.

The Summary

The Signal

Codex didn't use Adobe's API. It didn't plug into a browser extension or use some sanctioned integration pathway. It just worked out how to interact with Lightroom as a desktop application and executed a batch workflow that Gostev didn't even know how to do manually. That's the headline here.

This is different from the robotic process automation (RPA) tools companies have been trying to sell you for a decade. Those require mapping workflows, defining steps, and handling exceptions up front. Codex saw the task and invented the solution in real time. Gostev pointed it at a problem, denoising 50 photos one click at a time, and the agent figured out the batch operation without training data specific to that version of Lightroom.

"Not just assisting, but autonomously navigating and operating software like a human would, only faster."

What makes this work? A few things converging:

  • Vision models that can parse UI elements the same way you do when you hunt for a button
  • Reasoning models that can infer software behavior from context, not just documentation
  • Execution layers that can translate intent into clicks, keystrokes, menu navigation
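
To make those three layers concrete, here is a toy sketch of the loop they form: look at the screen, decide on the next step, act, repeat. Everything in it is hypothetical. FakeApp, perceive, plan and run_agent are illustrative names, the "app" is an in-memory stand-in rather than Lightroom, and none of this is how Codex is actually built. The point is only to show why a batch path ("Select All", then "Denoise") takes two actions where the one-photo-at-a-time route would take 50.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """Hypothetical stand-in for a desktop photo app (not Lightroom)."""
    photos: dict = field(
        default_factory=lambda: {f"img_{i:02d}": "noisy" for i in range(50)}
    )
    selected: set = field(default_factory=set)
    clicks: int = 0

    def click(self, label):
        # Each call models one UI action performed by the agent.
        self.clicks += 1
        if label == "Select All":
            self.selected = set(self.photos)
        elif label == "Denoise":
            for name in self.selected:
                self.photos[name] = "clean"


def perceive(app):
    """Vision layer (assumed): summarize what a screenshot would show."""
    return {
        "controls": ["Select All", "Denoise"],
        "total": len(app.photos),
        "selected": len(app.selected),
        "noisy": sum(1 for state in app.photos.values() if state == "noisy"),
    }


def plan(obs):
    """Reasoning layer (assumed): pick the next control, or None when done."""
    if obs["noisy"] == 0:
        return None
    if obs["selected"] < obs["total"] and "Select All" in obs["controls"]:
        return "Select All"
    if "Denoise" in obs["controls"]:
        return "Denoise"
    return None


def run_agent(app, max_steps=10):
    """Execution layer (assumed): observe, decide, act until the goal is met."""
    for _ in range(max_steps):
        action = plan(perceive(app))
        if action is None:
            break
        app.click(action)
    return app


if __name__ == "__main__":
    app = run_agent(FakeApp())
    print(app.clicks, "clicks to denoise", len(app.photos), "photos")
```

In a real agent, perceive would be a vision model reading pixels and plan would be a reasoning model choosing among far messier options, which is where the hard problems live. The max_steps cap is a design choice worth keeping in any version: an agent that misreads the screen should stop, not click forever.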

The catch: Gostev is technical. He's the AI capability lead at Arena.ai and created BullshitBench, a benchmark for cutting through AI hype. This isn't your average knowledge worker solving a problem with a chatbot. But the gap between what he can do today and what non-technical users will be able to do tomorrow is shrinking fast.

The Implication

If agents can teach themselves to operate software without official integrations, every GUI becomes automatable. Adobe doesn't need to build an API for every workflow. Microsoft doesn't need to document every Excel macro. The agent just watches, learns, and executes.

The winners here are twofold. First, companies building agent orchestration layers that can reliably interface with any desktop app. Second, workers who figure out how to direct these agents before their colleagues do. The losers are software companies banking on integration moats and workers who think "knowing the software" is job security.

Watch for this pattern to move from technical early adopters into productivity tools aimed at non-technical users. When someone's mom can automate photo editing workflows she doesn't understand, we're in a different economy.

Sources

Business Insider Tech