A Karpathy-Style LLM Wiki That Agents Write Themselves and Git Tracks

The Wikipedia your agents write, hosted on your laptop, versioned in git—this is what institutional memory looks like when you build it for machines first.

The Summary

A new open-source wiki layer called Wuphf uses markdown + git as the source of truth for AI agents, with BM25 search and SQLite metadata—no vector database yet.
Each agent gets a private notebook; shared knowledge lives in a team wiki with draft-to-wiki promotion flows and append-only fact logs per entity.
On a benchmark of 500 artifacts and 50 queries, BM25 alone clears 85% recall@20. Knowledge durability is treated as a design constraint, not an afterthought.
The entire knowledge base lives in ~/.wuphf/wiki/ and can be cloned out—you own the substrate, not the SaaS.

The Signal

The builder calls this "Karpathy-style" because Andrej Karpathy has been circling the same design space for months: an LLM-native knowledge substrate that agents both read from and write into. The implementation here treats markdown and git as first-class primitives, not legacy formats you escape from. Most agent memory stacks today land on Postgres, pgvector, Neo4j, sometimes Kafka, always a dashboard. This goes the other direction. Markdown for durability. Git for provenance. BM25 for retrieval. SQLite for structured metadata.

The architecture is lean. Each agent gets a private notebook at agents/{slug}/notebook/.md. Shared knowledge lives in team/. A draft-to-wiki promotion flow means agents or humans review notebook entries before they land in the canonical wiki, with backlinks preserved. Per-entity fact logs are append-only JSONL files. A synthesis worker rebuilds the entity brief every N facts. Commits land under a distinct git identity—"Pam the Archivist"—so you can see who wrote what in git log.
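The append-only fact log and the "rebuild every N facts" trigger can be sketched in a few lines. This is a minimal illustration, not Wuphf's actual code: the function names, the fact fields, and the value of N are all assumptions, since the article does not specify them.

```python
import json
import tempfile
from pathlib import Path

SYNTHESIS_EVERY = 5  # hypothetical value for "N"; the article does not give one


def append_fact(log_path: Path, fact: dict) -> int:
    """Append one fact as a JSON line and return the new number of facts."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file; nothing is rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(fact, ensure_ascii=False) + "\n")
    with log_path.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def needs_synthesis(fact_count: int, every: int = SYNTHESIS_EVERY) -> bool:
    """True when the entity brief should be rebuilt (every N-th fact)."""
    return fact_count > 0 and fact_count % every == 0


def read_facts(log_path: Path) -> list[dict]:
    """Load every fact, oldest first, for the synthesis step to summarize."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def main() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name; the real layout may differ.
        log = Path(tmp) / "acme.facts.jsonl"
        for i in range(SYNTHESIS_EVERY):
            count = append_fact(log, {"text": f"fact {i}", "source": "demo"})
            if needs_synthesis(count):
                print(f"rebuild brief from {len(read_facts(log))} facts")


if __name__ == "__main__":
    main()
```

Because each write is a single appended line, two agents adding facts never have to agree on a schema or resolve an edit conflict inside the file. Only the synthesis step reads the whole log.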

"The wiki outlives the runtime, and a user can walk away with every byte."

Here's what makes this more than a Notion clone for robots:

Wikilinks are first-class. [[Broken links]] render in red. A daily lint cron checks for contradictions, stale entries, and broken wikilinks.
The /lookup slash command plus an MCP tool handle cited retrieval. A heuristic classifier routes short lookups to BM25 and narrative queries to a cited-answer loop.
The current benchmark—500 artifacts, 50 queries—clears 85% recall@20 on BM25 alone. That's the internal ship gate. If a query class drops below that, sqlite-vec is the pre-committed fallback.
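The broken-wikilink check from the lint pass is easy to picture as code. The sketch below assumes that pages are keyed by a lowercase, hyphenated slug and that links may carry an optional `#section` or `|alias` suffix. Both conventions are assumptions for illustration, not documented Wuphf behavior.

```python
import re

# Captures the target of [[Target]], [[Target#section]] or [[Target|alias]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def slugify(title: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.strip().lower()).strip("-")


def find_broken_links(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page slug to the link targets that have no matching page."""
    known = set(pages)
    broken: dict[str, list[str]] = {}
    for slug, text in pages.items():
        missing = [
            target.strip()
            for target in WIKILINK.findall(text)
            if slugify(target) not in known
        ]
        if missing:
            broken[slug] = missing
    return broken


if __name__ == "__main__":
    wiki = {
        "acme-corp": "See [[Pricing Plan]] and [[Missing Page|the old page]].",
        "pricing-plan": "Back to [[Acme Corp]].",
    }
    print(find_broken_links(wiki))
```

A renderer could use the same result to decide which links to draw in red, and a scheduled job could run it over every page in the wiki.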

The substrate choices matter. No vector database yet because the builder wanted to see how far markdown + git could go before adding weight. The answer, at least at 500 artifacts, is pretty far. BM25 is a keyword-based ranking algorithm from the 1990s. It still works. It's fast. It doesn't need embeddings or a second database to cache vector representations of every sentence. When you ship a new version of the wiki, you ship markdown files and a git history. Not a Docker Compose file with six services.
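To show how little machinery BM25 needs, here is a small in-memory version of the standard Okapi BM25 formula. It is an illustrative sketch only: the tokenizer, the class name, and the parameter defaults (k1 = 1.5, b = 0.75) are assumptions, and the project's real search code may work differently.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Tiny Okapi BM25 index over a dict of {doc_id: text}."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.doc_len = {d: sum(tf.values()) for d, tf in self.term_freqs.items()}
        total_len = sum(self.doc_len.values())
        self.avg_len = (total_len / len(docs)) if total_len else 1.0
        # Number of documents that contain each term.
        self.doc_freq: Counter = Counter()
        for tf in self.term_freqs.values():
            self.doc_freq.update(tf.keys())

    def idf(self, term: str) -> float:
        n = self.doc_freq.get(term, 0)
        total = len(self.term_freqs)
        # The "+ 1" inside the log keeps the weight non-negative.
        return math.log((total - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: str, doc_id: str) -> float:
        tf = self.term_freqs[doc_id]
        # Length normalization: long documents need more matches to rank high.
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 20) -> list[str]:
        """Return up to k doc ids with a positive score, best first."""
        scored = [(self.score(query, d), d) for d in self.term_freqs]
        hits = sorted((s, d) for s, d in scored if s > 0)
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        return [d for _, d in hits[:k]]
```

Everything here is term counts and arithmetic. There is no model to load and no embedding to recompute when a page changes, which is the trade the article is pointing at.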

The provenance layer is where this gets interesting for teams. Every edit lands in git with a distinct author identity. Agents don't commit as "root" or "system." They commit as Pam. A human can git blame any line of the wiki and see whether an agent wrote it, when, and in response to what prompt. That's not just auditability. That's a forcing function for agent behavior. If an agent writes garbage, the garbage has a commit hash and a timestamp. You can diff it. You can revert it. You can see what facts led to the synthesis.
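One way to give an agent its own commit identity is to pass `user.name` and `user.email` to git for a single command with `-c`. The sketch below assumes that approach; the email address and function names are placeholders, and the article does not say how Wuphf actually sets the identity.

```python
import subprocess
from pathlib import Path

ARCHIVIST_NAME = "Pam the Archivist"   # identity named in the article
ARCHIVIST_EMAIL = "pam@wuphf.invalid"  # hypothetical placeholder address


def commit_command(
    message: str,
    name: str = ARCHIVIST_NAME,
    email: str = ARCHIVIST_EMAIL,
) -> list[str]:
    """Build a `git commit` invocation that records the given identity.

    `-c key=value` overrides config for this one command, so the human's
    own git identity on the same machine is left untouched.
    """
    return [
        "git",
        "-c", f"user.name={name}",
        "-c", f"user.email={email}",
        "commit",
        "-m", message,
    ]


def commit_as_archivist(repo: Path, paths: list[str], message: str) -> None:
    """Stage the given paths and commit them under the archivist identity."""
    subprocess.run(["git", "add", "--", *paths], cwd=repo, check=True)
    subprocess.run(commit_command(message), cwd=repo, check=True)
```

With commits recorded this way, `git log --author="Pam the Archivist"` lists only the agent's edits, and `git blame` on a wiki page shows that name next to every line the agent wrote.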

The Implication

If you're building agents that need to remember things across sessions, this is a reference implementation worth cloning. The design constraints are clear: durability over convenience, provenance over speed, portability over lock-in. You can git clone the entire knowledge base and walk away. That's rare in 2026.

The 85% recall number is the important one. If BM25 can hit that threshold at 500 artifacts, the incremental value of embedding models and vector search is probably marginal until you hit a much larger scale or a very specific query distribution. Most teams reach for vector databases on day one because that's what the tutorials say. This repo says: maybe try markdown and keyword search first.
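For readers who want to reproduce a number like this on their own corpus, recall@k is simple to compute. The sketch assumes the common definition (the share of a query's relevant documents that appear in the top k results, averaged over queries). The article does not state exactly how the project computes its figure, so treat the function names and the gate check as illustrative.

```python
SHIP_GATE = 0.85  # threshold quoted in the article


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 20) -> float:
    """Fraction of the relevant doc ids that appear in the top k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 20) -> float:
    """Average recall@k over (ranked results, relevant set) pairs."""
    if not runs:
        raise ValueError("no queries to evaluate")
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


def passes_gate(runs: list[tuple[list[str], set[str]]], k: int = 20) -> bool:
    """True when mean recall@k meets the ship gate."""
    return mean_recall_at_k(runs, k) >= SHIP_GATE
```

Running the same calculation per query class, rather than only over the whole set, is what would reveal the case the article mentions: one class of queries dropping below the gate while the average still looks healthy.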

Watch the fact log architecture. Append-only JSONL per entity, with a synthesis worker that rebuilds the brief every N facts. That's a write pattern that composes well with LLMs. Agents can append facts without worrying about schema or merge conflicts. The synthesis step is where the LLM does the work humans used to do: read 50 facts, write a coherent summary, link to sources.

Sources

Hacker News Best
