Ten grand gets you a box that runs 70B models at 500 tokens per second on a standard wall outlet.

The Summary

  • Tenstorrent's QuietBox 2 packs four custom AI accelerators and 384GB total memory into a $9,999 desktop that draws 1,400 watts at full load
  • Runs Meta's Llama 3.1 70B several times faster than cloud-based GPT-5.2 or Claude 4.6, all on local hardware
  • Four Nvidia RTX 5090s would deliver similar performance but require 4,000+ watts (multiple circuits) and cost significantly more
  • Ships Q2 2026, targeting developers and companies tired of paying API fees or shipping data to the cloud

The Signal

The friction point isn't compute anymore. It's power and memory architecture. Tenstorrent found the gap: most PCs can barely load a 13B-parameter model, high-end workstations struggle past 70B, and the models people actually want to run locally (70B+) need memory configurations that standard GPU setups can't deliver without tripping your circuit breaker.
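For a sense of where those tiers come from, here's rough weight-only arithmetic (an illustrative sketch, not Tenstorrent's sizing; it ignores KV cache and activations, which add more on top):

```python
# Back-of-envelope memory needed just to hold model weights.
GiB = 1024**3

def weight_memory_gib(params_billions: float, bytes_per_param: float) -> float:
    """GiB to store the weights alone; KV cache and activations add more."""
    return params_billions * 1e9 * bytes_per_param / GiB

for params in (13, 70, 120):
    fp16 = weight_memory_gib(params, 2.0)  # 16-bit weights
    int8 = weight_memory_gib(params, 1.0)  # 8-bit quantized
    print(f"{params:>3}B: ~{fp16:.0f} GiB @ fp16, ~{int8:.0f} GiB @ int8")

# Approximate output:
#  13B: ~24 GiB @ fp16, ~12 GiB @ int8
#  70B: ~130 GiB @ fp16, ~65 GiB @ int8
# 120B: ~224 GiB @ fp16, ~112 GiB @ int8
```

A 24GB consumer card tops out around 13B at 16-bit precision; 70B and beyond is where single-GPU memory runs out and the architecture question begins.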

The QuietBox 2's 128GB of GDDR6 plus 256GB of DDR5 give you enough headroom to load GPT-OSS-120B and run it fast. Compare that to the Nvidia path: four RTX 5090s (32GB each) just to match the 128GB of fast memory, which means rewiring your office for multiple 20-amp circuits, a case the size of a dorm fridge, and a cost structure that makes the QuietBox look cheap.
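The circuit math behind that comparison is simple (a sketch; the two wattages are the figures from the summary above, and the 80% continuous-load rule is standard US wiring practice):

```python
# How many standard US outlets each build needs at sustained load.
import math

CIRCUIT_W = 120 * 15              # 15A outlet: 1,800W peak
CONTINUOUS_W = CIRCUIT_W * 0.8    # 80% rule for continuous loads: 1,440W

quietbox_w = 1400                 # Tenstorrent's claimed full-load draw
gpu_build_w = 4000                # the article's four-RTX-5090 estimate

print(f"Per-circuit continuous budget: {CONTINUOUS_W:.0f} W")
print(f"QuietBox 2 ({quietbox_w} W): "
      f"{math.ceil(quietbox_w / CONTINUOUS_W)} circuit(s)")
print(f"4x RTX 5090 (~{gpu_build_w} W): "
      f"{math.ceil(gpu_build_w / CONTINUOUS_W)} circuit(s)")
```

At 1,400W the QuietBox just squeezes under the 1,440W continuous budget of one ordinary outlet; the four-GPU build needs three dedicated circuits.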

This matters because local inference is shifting from hobby to requirement. Companies building agent systems don't want to send every thought through an API. Developers working on sensitive data can't. The "run it in the cloud" answer is losing ground to "run it in the room." But until now, the hardware to do that at useful scale meant either cloud bills or server racks. Tenstorrent is betting there's a market for "powerful enough, fits under your desk, doesn't require an electrician."

The real tell: they're not competing on raw speed. They're competing on the full system, power included. That's a different game than the one Nvidia's playing. It suggests a market forming around practical deployment, not benchmark wars.

The Implication

Watch who buys these. If it's just hobbyists and researchers, this is a niche play. If software companies and agent platforms start buying them by the dozen, it signals the agent economy is going local faster than the cloud giants want to admit. The constraint was never "can we build smart models?" It was "can we run them where we need them, on power we have, at cost that scales?"

The QuietBox 2 is an answer. Whether it's the right answer depends on whether 70B-120B models become the workhorses of practical AI, or if everything keeps climbing toward trillion-parameter monsters that only data centers can feed.


Source: IEEE Spectrum AI