Someone just open-sourced the operating system for research agents that work while you're offline.
The Summary
- ARIS (Auto-Research-In-Sleep) is a markdown-based framework that lets AI agents conduct autonomous ML research — finding papers, generating ideas, running experiments, and cross-checking their own work with reviewer loops
- No framework lock-in: works with Claude, GPT-4, or any LLM agent you point at it, including local models via LM Studio or Ollama
- Ships with 62+ reusable "skills" (markdown templates), a persistent research wiki that maintains paper/idea/experiment relationships, and self-evolution capabilities where the system analyzes its own logs and proposes improvements
The Signal
This isn't another wrapper around OpenAI's API. ARIS is a coordination layer for autonomous research workflows, built entirely on markdown files that any LLM can interpret. The architecture is surprisingly elegant: instead of hard-coding agent behaviors in Python classes, the project defines 62+ research "skills" as structured markdown templates. An agent reads the template, executes the task, writes results back to markdown. No vendor lock-in. No framework tax.
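To make the read-template, execute, write-back cycle concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the section names (`Inputs`, `Instructions`, `Output`), the `run_skill` helper, and the stub agent are hypothetical and do not reflect ARIS's actual skill schema or API.

```python
import re
from typing import Callable

# Hypothetical skill template. The headings are illustrative, not ARIS's real schema.
SKILL_MD = """# Skill: summarize-paper

## Inputs
- paper_title

## Instructions
Summarize the paper titled {paper_title} in three sentences.

## Output
"""


def parse_sections(markdown: str) -> dict[str, str]:
    """Split a markdown document into {heading: body} on '## ' headings."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^## (.+)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def run_skill(markdown: str, inputs: dict[str, str], agent: Callable[[str], str]) -> str:
    """Fill in the instructions, call the agent, and append its answer under Output."""
    sections = parse_sections(markdown)
    prompt = sections["Instructions"].format(**inputs)
    result = agent(prompt)
    # Assumes the Output section is the last one in the template.
    return markdown.rstrip() + "\n" + result + "\n"


def echo_agent(prompt: str) -> str:
    """Stand-in for any LLM call (hosted or local); it just echoes the prompt."""
    return f"[stub answer to: {prompt}]"


updated = run_skill(SKILL_MD, {"paper_title": "Attention Is All You Need"}, echo_agent)
print(updated)
```

Because the agent is just a callable that maps a prompt to text, swapping providers in this sketch means swapping one function, which is the point of keeping the protocol in markdown.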
The core loop is what makes this interesting for the agent economy. ARIS implements cross-model review cycles — one model generates research ideas or experimental designs, a second model (potentially from a different provider) reviews the work for logical gaps or methodological problems. This is closer to how actual research teams operate than the single-agent-does-everything pattern most people default to. The system maintains a "Research Wiki" — a persistent knowledge graph of papers, claims, experiments, and their relationships — that survives across sessions.
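The generate-then-review cycle can be sketched in a few lines. This is an illustration under stated assumptions, not ARIS's implementation: `review_loop`, the `Review` type, and both stub models are hypothetical names, and the real project may structure its loop differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str


def review_loop(
    task: str,
    generate: Callable[[str, str], str],  # (task, feedback) -> draft
    review: Callable[[str], Review],      # draft -> verdict
    max_rounds: int = 3,
) -> tuple[str, list[Review]]:
    """Alternate between a generator and a reviewer until approval or max_rounds."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    history: list[Review] = []
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        verdict = review(draft)
        history.append(verdict)
        if verdict.approved:
            break
        feedback = verdict.feedback
    return draft, history


# Stub models standing in for two different providers (hypothetical behavior).
def stub_generator(task: str, feedback: str) -> str:
    draft = f"Plan for {task}"
    if "baseline" in feedback:
        draft += ", compared against a baseline"
    return draft


def stub_reviewer(draft: str) -> Review:
    if "baseline" in draft:
        return Review(approved=True, feedback="")
    return Review(approved=False, feedback="Missing a baseline comparison.")


final_draft, reviews = review_loop("sparse attention ablation", stub_generator, stub_reviewer)
print(final_draft)
print(len(reviews), "review rounds")
```

The `max_rounds` cap is a design choice worth keeping in any real version: two models that never agree should stop and hand the disagreement to a human rather than loop overnight.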
"Auto-compaction corruption fix. Compaction summary preserved on OpenAI-compat executors."
The changelog tells the real story. Version 0.4.4 dropped three weeks ago with fixes for third-party Anthropic-compatible proxies, provider-aware routing, and state management across model switches. Version 0.3.5 added self-evolution: the agent analyzes its own execution logs and proposes patches to its skill definitions. This is meta-learning at the infrastructure level. The system gets better at research by doing research on itself.
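What "analyzing its own logs and proposing patches" might look like at its simplest: count recurring failures per skill and surface suggestions. The log format, the `propose_patches` helper, and the threshold below are all assumptions made for illustration; the source does not describe how ARIS actually implements self-evolution.

```python
from collections import Counter

# Hypothetical log format: "<skill> <status> <reason>". Not ARIS's real log schema.
LOG_LINES = [
    "run-experiment FAIL timeout",
    "run-experiment FAIL timeout",
    "run-experiment OK -",
    "find-papers OK -",
    "find-papers OK -",
]


def propose_patches(log_lines: list[str], min_failures: int = 2) -> list[str]:
    """Return human-readable suggestions for skills that fail repeatedly for the same reason."""
    failures: Counter = Counter()
    for line in log_lines:
        skill, status, reason = line.split(maxsplit=2)
        if status == "FAIL":
            failures[(skill, reason)] += 1
    return [
        f"Skill '{skill}': {count} failures with reason '{reason}'; consider revising its template."
        for (skill, reason), count in sorted(failures.items())
        if count >= min_failures
    ]


for suggestion in propose_patches(LOG_LINES):
    print(suggestion)
```

In this sketch the output is a proposal, not an applied change. Since the skills are markdown, a proposed patch is something a human can read and approve before it lands.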
What's genuinely novel here:
- Markdown as the coordination protocol means human researchers can read, edit, and audit the entire workflow
- Support for local models (LM Studio, Ollama) means you can run this without API costs or rate limits
- The "plan mode" and cooperative interrupt handling suggest someone actually used this for multi-hour research runs
- Cross-provider reviewer routing solves the "how do I use Claude and GPT-4 in the same pipeline" problem everyone hits
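The cross-provider routing in the last bullet boils down to one rule: the reviewer should not come from the same provider as the author when an alternative exists. A minimal sketch of that rule follows. The `Model` type, the registry, and `pick_reviewer` are hypothetical names for illustration, not ARIS's routing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str


# Hypothetical registry; the names are placeholders, not real model identifiers.
MODELS = [
    Model("claude", "anthropic"),
    Model("gpt-4", "openai"),
    Model("local-llama", "ollama"),
]


def pick_reviewer(author: Model, candidates: list[Model]) -> Model:
    """Prefer a reviewer from a different provider; fall back to any other model."""
    for candidate in candidates:
        if candidate.provider != author.provider:
            return candidate
    for candidate in candidates:
        if candidate != author:
            return candidate
    raise ValueError("no reviewer available other than the author")


reviewer = pick_reviewer(MODELS[0], MODELS)
print(f"{MODELS[0].name} is reviewed by {reviewer.name}")
```

The fallback matters for local-only setups: with a single provider you still get a second opinion from a different model, just not a different vendor.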
The project supports Windows experimentally, works with Cursor and other LLM-native editors, and maintains agent-specific documentation (AGENT_GUIDE.md) formatted for machine consumption. That last detail matters: they're designing for a world where other agents discover and learn to use ARIS autonomously.
The Implication
If you're building agent workflows for research, analysis, or any task that needs review loops and persistent memory, study this architecture. The markdown-as-protocol approach means you're not betting on any single LLM provider or agent framework. You're building on a coordination layer that will outlast whatever model is hot this quarter.
For teams already running Claude Code or similar tools: these skills are drop-in templates. You don't need to adopt the full CLI. Fork the repo, grab the markdown files that match your workflow, and adapt them. The real value isn't the code. It's the research coordination patterns encoded in those 62+ skills.
The self-evolution piece is the long-term signal. Agents that can analyze their own performance logs and propose infrastructure improvements are agents that compound in capability over time. That's different from just getting better prompts. That's actual learning at the system level.