Nvidia just spent $20 billion on Groq's inference tech, and if you're still thinking about AI as a training problem, you're already behind.

The Signal

Jensen Huang stood on stage at GTC and declared that the "inflection point of inference has arrived." Coming from the man who built a $2 trillion empire on training chips, that's not hype; it's a signal. Nvidia licensed Groq's IP on Christmas Eve for $20 billion and baked it into the new Vera Rubin chip line, their first purpose-built inference silicon. The Groq 3 LPU isn't just another GPU iteration. It's Nvidia admitting that the game has changed.

Training and inference are different animals. Training is batch processing across weeks of compute. Inference is real-time, per-query, latency-sensitive work. When a reasoning model runs dozens of inference cycles before showing you an answer, every millisecond of per-token latency compounds. The economic model flips too. Training was capex-heavy, centralized, and done by labs with deep pockets. Inference is opex, distributed, and happening millions of times per second across every application that touches a language model.
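To make the compounding concrete, here's a back-of-the-envelope sketch in Python. Every number in it is a made-up assumption for illustration, not a measurement of any particular chip, but it shows why shaving per-token latency matters so much more once an agent chains a dozen sequential inference calls:

```python
# Back-of-the-envelope sketch: how per-token latency compounds across a
# multi-step reasoning chain. All numbers are illustrative assumptions.

def chain_latency_ms(steps: int, tokens_per_step: int,
                     ms_per_token: float, overhead_ms: float) -> float:
    """Total wall-clock time for a chain of `steps` sequential inference
    calls, each generating `tokens_per_step` tokens plus fixed overhead."""
    return steps * (overhead_ms + tokens_per_step * ms_per_token)

# Hypothetical agent: 12 sequential reasoning steps, 300 generated tokens
# per step, 20 ms of scheduling/network overhead per call.
for ms_per_token in (10.0, 2.0, 0.5):
    total = chain_latency_ms(steps=12, tokens_per_step=300,
                             ms_per_token=ms_per_token, overhead_ms=20.0)
    print(f"{ms_per_token:>4} ms/token -> {total / 1000:.1f} s end to end")
```

Under these assumed numbers, the same 12-step chain takes roughly 36 seconds at 10 ms per token but about 2 seconds at 0.5 ms per token, which is the difference between an agent you wait on and one that keeps pace with you.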

For the past two years, inference chip startups have been running a Cambrian experiment: d-Matrix with in-memory compute, Etched building transformer-specific ASICs, Rain AI going neuromorphic. Nvidia watched, waited, then wrote a check for the approach that clicked. Groq's architecture won because it solves the latency problem without exotic compute paradigms. Nvidia doesn't buy technology often. When they do, it's because they see the next decade clearly.

The Implication

If you're building on or investing in AI infrastructure, the money is moving from training clusters to inference at the edge. Watch who's optimizing for sub-100ms response times and who's still bragging about parameter counts. The companies that win Web4 will be the ones whose agents can think fast enough to work in real human time.


Source: IEEE Spectrum AI