Someone just built a production AI agent that runs on a server cheaper than a Netflix subscription and uses IRC, a chat protocol from 1988.

The Summary

  • Developer George Larson deployed two AI agents on a $7/month VPS using IRC as the transport layer, with the public-facing binary weighing just 678 KB
  • Tiered inference architecture: Claude Haiku 4.5 for fast conversation, Sonnet 4.6 only when tool use is needed, capped at $2/day in API costs
  • The system splits public (web-accessible chat) and private (email/scheduling over Tailscale) agents, both sharing a single API key through A2A passthrough
  • This is what lean agent infrastructure actually looks like: no Kubernetes, no microservices, just Unix principles and protocols that were solved decades ago

The Signal

While everyone else is spinning up agent frameworks that need 16 GB of RAM and a PhD to deploy, Larson built something you can run on hardware weaker than a Raspberry Pi. The public agent (nullclaw) uses 1 MB of memory. The entire binary is smaller than a single high-res photo. It handles real conversations with real people through a web interface, and you can connect to it with any IRC client from the last three decades.
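That "any IRC client" claim is the point: the protocol's framing is simple enough to speak from a raw socket. The sketch below is a minimal, generic IRC client in Python; the host, nick, and channel are hypothetical placeholders, not the actual deployment's values, and this is not the project's code.

```python
import socket

# Minimal IRC client sketch (RFC 1459 framing). Host, port, nick, and
# channel are hypothetical placeholders, not the real deployment's values.

def encode(cmd: str) -> bytes:
    """IRC messages are single CRLF-terminated lines."""
    return (cmd + "\r\n").encode()

def split_lines(buf: bytes):
    """Split a receive buffer into complete lines plus the leftover tail."""
    *lines, rest = buf.split(b"\r\n")
    return [l.decode("utf-8", errors="replace") for l in lines], rest

def run(host="irc.example.net", port=6667, nick="visitor", channel="#agent"):
    """Connect, register, join a channel, and echo messages. Call to use."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode(f"NICK {nick}"))
        sock.sendall(encode(f"USER {nick} 0 * :{nick}"))
        buf = b""
        while True:
            data = sock.recv(4096)
            if not data:
                break
            lines, buf = split_lines(buf + data)
            for line in lines:
                if line.startswith("PING"):        # mandatory keepalive
                    sock.sendall(encode("PONG" + line[4:]))
                elif " 001 " in line:              # RPL_WELCOME: registered
                    sock.sendall(encode(f"JOIN {channel}"))
                    sock.sendall(encode(f"PRIVMSG {channel} :hello, agent"))
                elif "PRIVMSG" in line:
                    print(line)                    # agent replies land here
```

Roughly thirty lines buys you connect, keepalive, join, and chat, which is why reusing a 1988 protocol beats inventing a new one.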

The architecture is smarter than it looks. IRC isn't just retro aesthetics. It's a solved problem for multi-client synchronous messaging with decades of battle-tested servers and clients. No WebSocket gymnastics. No polling. No inventing a new protocol when an old one works fine. The gateway pattern means one agent handles public traffic while another (ironclaw) sits behind Tailscale doing private work like email and calendar access. They share inference through Google's Agent-to-Agent (A2A) protocol, which means one API relationship, one billing line, regardless of which agent triggered the request.
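The passthrough idea can be sketched in a few lines: only the public agent holds the provider credential, and the private agent's inference requests route through it. The class and method names below are illustrative assumptions, not the project's real API or the A2A wire format.

```python
# Sketch of the gateway/passthrough pattern: one API key, one meter,
# regardless of which agent originated the request. Names are illustrative.

class PublicAgent:
    """Holds the single provider API key; serves its own and proxied calls."""
    def __init__(self, api_key: str):
        self._api_key = api_key      # never leaves this process
        self.spend_today = 0.0

    def infer(self, prompt: str, caller: str = "public"):
        # Every request, whichever agent triggered it, is metered here,
        # so billing shows up as a single line.
        cost = self._call_provider(prompt)
        self.spend_today += cost
        return f"[reply to {caller}]", cost

    def _call_provider(self, prompt: str) -> float:
        return 0.001                 # placeholder per-call cost, not real pricing

class PrivateAgent:
    """Runs behind the VPN; has no key, only a handle to the gateway."""
    def __init__(self, gateway: PublicAgent):
        self.gateway = gateway

    def infer(self, prompt: str):
        return self.gateway.infer(prompt, caller="private")
```

The design choice worth copying: the credential lives in exactly one process, so rotating the key or auditing spend never touches the private agent.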

The tiered inference is where this gets economically interesting. Most agent interactions don't need the full horsepower of frontier models. Haiku 4.5 is fast and cheap for conversational turns. Sonnet 4.6 only spins up when the agent needs to actually use tools. Hard cap at $2/day. That's $60/month maximum for inference, on top of $7 for hosting. Compare that to the typical agent deployment burning hundreds in serverless functions and managed services before it even talks to an LLM.
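The routing logic described above fits in a small sketch: pick the cheap model unless the turn needs tools, and refuse calls once the daily budget is spent. The model identifiers echo the article's tiers; the routing heuristic, reset logic, and costs are assumptions for illustration.

```python
from datetime import date

# Tiered inference with a hard daily budget, per the article's description.
# The heuristic and accounting details are assumptions, not the real code.
HAIKU, SONNET = "claude-haiku-4.5", "claude-sonnet-4.6"
DAILY_CAP_USD = 2.00

class Router:
    def __init__(self):
        self.day = date.today()
        self.spent = 0.0

    def pick_model(self, needs_tools: bool) -> str:
        # Cheap, fast model for plain conversation; the frontier model
        # spins up only when the turn actually requires tool use.
        return SONNET if needs_tools else HAIKU

    def charge(self, cost_usd: float) -> bool:
        """Record a call's cost; deny it once the cap would be exceeded."""
        today = date.today()
        if today != self.day:            # fresh budget each day
            self.day, self.spent = today, 0.0
        if self.spent + cost_usd > DAILY_CAP_USD:
            return False                 # over budget: refuse the call
        self.spent += cost_usd
        return True
```

Checking the call before incurring it is what makes the cap *hard*: a burst of expensive tool-use turns degrades to refusals, not a surprise bill.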

This isn't a toy. It's live, it's public, and it's handling real traffic; the write-up sits at 298 points on Hacker News with 85 comments. The system is written in Zig, which compiles to tight native code without a runtime. No garbage collection pauses. No container overhead. Just a binary that boots fast and stays small.

The Implication

If you're building agents, stop cargo-culting the enterprise playbook. You don't need orchestration layers and service meshes to ship something useful. The constraint here, $7/month hosting plus $2/day inference, forced real architectural thinking. That's the template: separate concerns (public gateway vs. private execution), use cheap models by default and expensive ones only when justified, pick boring reliable protocols over shiny new ones.

Watch for more builders taking this route. The agent economy doesn't need billion-dollar infrastructure. It needs clever people who remember that Linux, IRC, and careful model selection can do more than most SaaS platforms charging 100x as much.


Source: Hacker News Best