Qwen 3.5 is now trivial to run locally, and that's the sound of another wall coming down between enterprise AI and everyone else.

The Signal

Alibaba's Qwen 3.5 models are now documented for local deployment via Unsloth, a toolkit that has become the de facto standard for running open-weight models without burning through cloud credits. This matters because Qwen 3.5 competes directly with GPT-4-class models on benchmarks, yet the 72B-parameter version runs on consumer hardware with proper quantization. The economics are stark: inference over a 32k-token context that costs around $3 per million tokens via API now costs you little more than the electricity to spin up your GPU.
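To make the economics concrete, here's a back-of-envelope comparison between metered API pricing and local electricity cost. All the local-side numbers (throughput, GPU power draw, electricity rate) are illustrative assumptions, not measurements from any specific setup:

```python
# Back-of-envelope break-even: API token pricing vs. local electricity.
# Throughput, wattage, and $/kWh below are assumed figures for illustration.

def api_cost(tokens: int, usd_per_million: float = 3.0) -> float:
    """Cost of pushing `tokens` through a metered API at $3/M tokens."""
    return tokens / 1_000_000 * usd_per_million

def local_cost(tokens: int,
               tokens_per_second: float = 30.0,  # assumed single-GPU throughput
               gpu_watts: float = 350.0,         # assumed draw under load
               usd_per_kwh: float = 0.15) -> float:
    """Electricity cost of generating the same tokens locally."""
    hours = tokens / tokens_per_second / 3600
    return hours * (gpu_watts / 1000) * usd_per_kwh

monthly_tokens = 100_000_000  # 100M tokens/month
print(f"API:   ${api_cost(monthly_tokens):,.2f}")    # ~$300
print(f"Local: ${local_cost(monthly_tokens):,.2f}")  # ~$49 under these assumptions
```

The gap widens with volume, and this ignores hardware amortization on the local side and rate limits on the API side; it's a sketch for sizing the decision, not a TCO model.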

The technical barrier to local AI deployment has collapsed in the past six months. Unsloth handles the quantization and memory optimization that used to require deep ML expertise. GGUF format support means you can run these models on everything from M-series Macs to gaming rigs with 24GB VRAM. The 228 points on Hacker News aren't hype; they're engineers recognizing that the moat around frontier models is drying up faster than anyone predicted.
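A rough rule of thumb for whether a quantized model fits your hardware: weight memory is roughly parameter count times bits per weight, plus headroom for the KV cache and activations. The overhead factor below is a guess, not a measured value, and GGUF runtimes like llama.cpp can offload layers to CPU RAM when weights exceed VRAM:

```python
def est_memory_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough memory estimate: weights at `bits` precision plus ~20%
    headroom for KV cache and activations (the 1.2 factor is a guess)."""
    weight_gb = params_billion * bits / 8  # 1B params at 8 bits ~= 1 GB
    return weight_gb * overhead

for bits in (16, 8, 4):
    print(f"72B @ {bits}-bit: ~{est_memory_gb(72, bits):.0f} GB")
```

At 4-bit, 72B weights land around 36GB before overhead, which is why unified-memory Macs and partial CPU offload matter: a single 24GB gaming GPU holds the whole model only at smaller parameter counts or more aggressive quants.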

What's buried in the comments: developers are already fine-tuning Qwen 3.5 for specialized tasks. Legal document analysis. Code review. Customer support routing. Tasks that would have required six-figure API contracts last year. The Chinese AI labs have figured out something Western companies are still processing: giving away the weights and winning on tooling and services beats trying to own the model layer.

The Implication

If you're building AI products on API calls alone, you're about to get undercut by someone running inference in-house for pennies. The strategic play is understanding which workloads stay cloud (latency-sensitive, bursty) and which migrate local (predictable, privacy-critical, high-volume). Start experimenting with local deployment now, even if you stay on APIs short-term. The next pricing negotiation with your AI provider gets a lot more interesting when you can credibly walk away.
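The cloud/local split above can be expressed as a simple routing heuristic. This is an illustrative sketch of the criteria in the paragraph, not a production policy; the break-even threshold is an assumed number you'd derive from your own cost comparison:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    latency_sensitive: bool   # needs burst capacity / low tail latency
    bursty: bool
    privacy_critical: bool    # data can't leave your infrastructure
    monthly_tokens: int

def route(w: Workload, local_breakeven_tokens: int = 50_000_000) -> str:
    """Illustrative routing: privacy pins local; latency/burstiness pins
    cloud; otherwise volume decides. Threshold is an assumption."""
    if w.privacy_critical:
        return "local"
    if w.latency_sensitive or w.bursty:
        return "cloud"
    return "local" if w.monthly_tokens >= local_breakeven_tokens else "cloud"

print(route(Workload("support-routing", False, False, False, 200_000_000)))  # high-volume
print(route(Workload("chat-widget", True, True, False, 5_000_000)))          # bursty
```

Even as a toy, writing the predicate down forces the useful conversation: which of your workloads actually needs the cloud, and which are just there by default.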


Source: Hacker News Best