Grep Burns 50x More Tokens Than This AI Code Search Tool

The agent economy has a grep problem, and it's eating your token budget alive.

The Summary

Semble is an open-source code search tool built specifically for AI agents. It cuts token usage by 98% compared to the traditional grep-then-read-the-file approach while retaining 99% of the retrieval accuracy of transformer-based search.
It uses static Model2Vec embeddings combined with BM25 search, runs entirely on CPU, indexes a typical repo in ~250ms, and answers queries in ~1.5ms.
It ships with an MCP server for drop-in integration with Claude Code, Cursor, and other agent platforms, and requires zero configuration, no API keys, and no GPU.

The Signal

When AI coding agents can't find what they need directly, they fall back to grep. Then they read entire files. Then they spawn subagents to search through those files. Each step burns tokens, your API budget evaporates, and half the time the agent still misses the relevant code because grep wasn't designed to understand semantic similarity.

This is the hidden cost structure of the agent economy. Every failed search compounds. Every full file read multiplies your spend. The better agents get at coding, the more expensive their inability to search efficiently becomes.

"On ~1250 query/document pairs across 63 repos and 19 languages, Semble uses 98% fewer tokens than grep+read while reaching 0.854 NDCG@10."
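For readers unfamiliar with the metric in that quote, NDCG@10 scores how close to the top of the first ten results the relevant items appear, with 1.0 meaning a perfect ordering. The sketch below is a generic textbook version, not Semble's benchmark code. The relevance list is a made-up example, and for simplicity the ideal ordering is computed from the returned results only, whereas a full evaluation would use every relevant document known for the query.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results, in ranked order."""
    # Position 0 is rank 1, so the discount is log2(rank + 1) = log2(index + 2).
    return sum(rel / math.log2(index + 2) for index, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the best possible ordering of the same results."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: the only relevant chunk comes back at rank 3.
print(ndcg_at_k([0, 0, 1, 0, 0]))  # 1 / log2(4) = 0.5
```

A relevant result at rank 1 scores 1.0 and the same result at rank 3 scores 0.5, so an average of 0.854 suggests relevant code usually lands at or near the top of the list.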

Semble's approach is notable for what it doesn't use: transformers, GPUs, external APIs. Instead it combines static embeddings from Model2Vec with BM25 keyword search, fused through reciprocal rank fusion and reranked with code-aware signals. The key insight is that static embeddings work for code search specifically because code has more structural regularity than natural language. Function names, variable patterns, import statements: these repeat in predictable ways across codebases.
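Reciprocal rank fusion is simple enough to sketch in a few lines. This is a generic illustration of the technique, not Semble's actual implementation. The file names are invented, and the constant k=60 is the value commonly used in the literature, assumed here rather than taken from Semble.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents are then sorted by their summed score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two retrievers.
bm25_ranking = ["auth.py", "login.py", "utils.py"]
embedding_ranking = ["login.py", "session.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)  # ['login.py', 'auth.py', 'session.py', 'utils.py']
```

Because only ranks are used, the keyword scores and the embedding similarities never need to be put on a common scale, which is part of why this fusion step is cheap.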

The speed numbers matter: 250ms to index a repo, 1.5ms per query. That's fast enough to re-index on every agent session without the user noticing. No stale indexes. No "rebuild your search database" step that developers skip. It just works, every time, on whatever code the agent is touching right now.

The Model Context Protocol integration is the practical unlock. Drop it into Claude Code with one command and it becomes the agent's search layer. The agent never needs to know it's using a different tool. It just stops burning tokens on grep chains and full file reads.

Key technical wins:

99% of transformer accuracy at roughly 200x the speed
Works across 19 programming languages without language-specific tuning
No external dependencies means no API rate limits, no network latency, no vendor lock-in

What makes this a signal and not just another developer tool: it's infrastructure for the agent layer. As agents become the primary interface for interacting with code, the economics of agent operations become the economics of software development. Token efficiency stops being an optimization problem and becomes a cost structure problem. Tools that make agents 50x cheaper to run don't just save money. They expand the envelope of what's economically viable to automate.

The Implication

If you're building agent tooling or running agents at scale, token efficiency is your margin. Semble is open source and runs locally, which means you can fork it, measure its impact on your agent token spend, and decide if semantic code search belongs in your agent stack. The 98% token reduction compounds over every search, every session, every agent run.
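The headline "50x" and the "98%" figure are the same claim: if a search costs 2% of the tokens it used to, the old approach costs 1 / (1 - 0.98) = 50 times as much. A rough way to translate that into your own numbers is sketched below. The per-search token count and the number of searches are placeholder assumptions, not measurements, so substitute figures from your own agent logs.

```python
def cost_multiplier(token_reduction):
    """How many times more tokens the baseline uses than the reduced approach."""
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1.0 / (1.0 - token_reduction)


def tokens_saved(baseline_tokens_per_search, searches, token_reduction):
    """Total tokens avoided across a number of searches."""
    return baseline_tokens_per_search * searches * token_reduction


# 98% fewer tokens means the baseline is about 50x more expensive.
print(round(cost_multiplier(0.98)))  # 50

# Hypothetical workload: 20,000 tokens per grep+read search, 500 searches.
print(round(tokens_saved(20_000, 500, 0.98)))  # 9800000
```

The saving scales linearly with the number of searches, so the more search-heavy your agent runs are, the more the reduction matters.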

Watch for more specialized tools like this: agent-native infrastructure that optimizes for token budgets instead of human developer experience. The winning abstractions for Web4 won't be the ones that feel best to humans. They'll be the ones that let agents run cheaper and faster.

Sources

Hacker News Best
