Developer Open-Sources AI Behind a 146% Return in Two Years

Someone just open-sourced the investment research process behind a reported 146% portfolio return over two years, and it runs on Claude with a few Python scripts.

The Summary

A Chinese developer published AI Berkshire, a Claude-based investment framework that codifies the methodologies of Buffett, Munger, Duan Yongping, and Li Lu into adversarial AI agents
The system forces AI to give actionable conclusions with price targets instead of hedge-everything analysis, using multi-agent debate to surface blind spots
Real portfolio performance: +69% in 2024, +66% YTD 2025, beating the S&P 500 by 46-50 percentage points in each year

The Signal

Most people asking Claude to analyze stocks get glorified Wikipedia summaries that end with "consult a financial advisor." This project solves a different problem: forcing LLMs to commit to a position and defend it against opposing viewpoints.

The framework structures four distinct agent perspectives that genuinely conflict. When analyzing Pinduoduo, the Buffett agent scored it 4.4/5 on valuation (6.3x P/E excluding cash), while the Li Lu agent scored it 2.0/5 on long-term certainty due to management culture concerns. The Munger agent flagged that Douyin built a 4 trillion RMB GMV business in three years, questioning the moat's depth. This isn't prompt engineering theater. It's structured disagreement that surfaces what single-pass analysis misses.
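
A minimal sketch of how that kind of disagreement could be surfaced in code, using the two Pinduoduo scores quoted above. The AgentScore structure, the find_conflicts helper, and the 2.0-point threshold are illustrative assumptions, not taken from the AI Berkshire repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScore:
    """One agent's score on a 0-5 scale (hypothetical structure)."""
    agent: str
    dimension: str
    score: float
    rationale: str


def find_conflicts(scores, threshold=2.0):
    """Return (agent_a, agent_b, gap) for every pair whose scores
    differ by at least `threshold` points. The threshold is an assumption."""
    conflicts = []
    for i, first in enumerate(scores):
        for second in scores[i + 1:]:
            gap = abs(first.score - second.score)
            if gap >= threshold:
                conflicts.append((first.agent, second.agent, round(gap, 2)))
    return conflicts


# Scores quoted in the article for Pinduoduo.
scores = [
    AgentScore("Buffett", "valuation", 4.4, "6.3x P/E excluding cash"),
    AgentScore("Li Lu", "long-term certainty", 2.0, "management culture concerns"),
]

conflicts = find_conflicts(scores)
for agent_a, agent_b, gap in conflicts:
    print(f"{agent_a} vs {agent_b}: {gap}-point gap, needs human review")
```

The point of returning the conflicting pairs rather than an average is that an average of 4.4 and 2.0 would hide exactly the disagreement the framework is trying to expose.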

The anti-hallucination mechanisms matter more than the investment philosophy. AI Berkshire includes a financial rigor tool that independently recomputes market cap as share price times shares outstanding, then compares the result against reported figures. In its Tencent analysis, it caught a mismatch between Hong Kong dollar and RMB denominations. The system assigns information richness grades (A/B/C) to every analysis, explicitly marking low-confidence inferences. For Pop Mart, it rated data availability as B-grade and tagged all derived metrics with confidence intervals.
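
The market cap cross-check is simple enough to sketch. The function name, the 5% tolerance, and the numbers below are all hypothetical; they illustrate how a currency mismatch shows up as a deviation, and are not Tencent's actual figures or the project's real code.

```python
def check_market_cap(price, shares_outstanding, reported_cap,
                     fx_rate=1.0, tolerance=0.05):
    """Recompute market cap as price * shares and compare it with a reported figure.

    fx_rate converts the share-price currency into the currency of
    reported_cap (1.0 means both are already in the same currency).
    tolerance is the maximum relative deviation treated as consistent.
    """
    if reported_cap <= 0:
        raise ValueError("reported_cap must be positive")
    computed = price * shares_outstanding * fx_rate
    deviation = abs(computed - reported_cap) / reported_cap
    return {
        "computed": computed,
        "deviation": deviation,
        "consistent": deviation <= tolerance,
    }


# Made-up example: price quoted in HKD, reported cap quoted in RMB.
naive = check_market_cap(price=100.0, shares_outstanding=1_000_000_000,
                         reported_cap=92_000_000_000)
converted = check_market_cap(price=100.0, shares_outstanding=1_000_000_000,
                             reported_cap=92_000_000_000, fx_rate=0.92)

print(naive["consistent"])      # False: currencies were mixed
print(converted["consistent"])  # True: same currency after conversion
```

A failed check does not say which input is wrong. It only says the figures cannot all be taken at face value, which is the signal a human reviewer needs.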

"The system forces 'pass/fail/gray zone' outputs with specific price ranges, not 'on one hand, on the other hand' essays."

Eight automatic rejection criteria kill deals before detailed analysis, including:

Management integrity issues (instant veto regardless of valuation)
Business model unclear after reading 10 pages
Can't explain the company in five sentences (the "mirror test")
Competitive advantages that wouldn't survive adversarial questioning
Financial data inconsistencies that suggest accounting games
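
As a rough sketch, the criteria listed above can be modeled as hard veto gates that run before any scoring. Only the five criteria named in this article are included; the field names and messages are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Answers to the pre-analysis questions (hypothetical field names)."""
    management_integrity_ok: bool
    business_model_clear: bool
    explainable_in_five_sentences: bool
    moat_survives_questioning: bool
    financials_consistent: bool


# Maps each field to the reason reported when the check fails.
VETO_REASONS = {
    "management_integrity_ok": "management integrity issue",
    "business_model_clear": "business model unclear after 10 pages",
    "explainable_in_five_sentences": "fails the five-sentence mirror test",
    "moat_survives_questioning": "moat does not survive adversarial questioning",
    "financials_consistent": "financial data inconsistencies",
}


def rejection_reasons(screening):
    """Return every veto that fired; an empty list means analysis may proceed."""
    return [reason for field, reason in VETO_REASONS.items()
            if not getattr(screening, field)]


candidate = Screening(
    management_integrity_ok=False,
    business_model_clear=True,
    explainable_in_five_sentences=True,
    moat_survives_questioning=True,
    financials_consistent=False,
)
reasons = rejection_reasons(candidate)
print(reasons or "no veto, continue to detailed analysis")
```

Any single reason is enough to stop. No valuation score is consulted, which mirrors the "instant veto regardless of valuation" rule.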

The contrarian check specifically asks: "Why are smart people shorting this?" It's designed to find consensus blind spots, not confirm your existing thesis. The framework would rather output "insufficient data, gray zone" than manufacture false certainty. That's the reverse of how most people use LLMs, where confident-sounding bullshit gets rewarded.
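
One way to picture the three-way output is a decision function that refuses to commit when the data is thin. This is a sketch under stated assumptions: the price thresholds, the rule that a C grade forces a gray-zone verdict, and all names here are hypothetical rather than the project's documented behavior.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    GRAY_ZONE = "gray zone"


def decide(price, buy_below, avoid_above, data_grade):
    """Map a price and a data-quality grade (A/B/C) to a verdict.

    Assumption: a C grade means there is not enough data to commit,
    so the verdict is gray zone whatever the price is.
    """
    if buy_below >= avoid_above:
        raise ValueError("buy_below must be lower than avoid_above")
    if data_grade == "C":
        return Verdict.GRAY_ZONE
    if price <= buy_below:
        return Verdict.PASS
    if price >= avoid_above:
        return Verdict.FAIL
    return Verdict.GRAY_ZONE


print(decide(price=80, buy_below=90, avoid_above=130, data_grade="B").value)
print(decide(price=80, buy_below=90, avoid_above=130, data_grade="C").value)
```

The second call shows the design choice: the same attractive price yields no recommendation once the data grade drops, instead of a confident answer built on weak inputs.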

The performance numbers are from a verified Futu Securities account, not a backtest. Portfolio screenshots show real capital deployed. Two-year cumulative gains over 146%. That doesn't prove the method works long-term, but it proves something worked during a period when most AI trading systems face-planted.

The Implication

This is what separating reasoning from execution looks like in practice. The agents don't trade. They structure human decision-making by forcing you to answer hard questions you'd rather skip. If the Li Lu agent says "10-year certainty unclear," you don't get to ignore it because the Buffett agent likes the P/E ratio.

Watch for two things: companies building adversarial agent architectures for professional research (not just finance), and the collision between open-source agent frameworks and regulated decision-making. If this works for stock analysis, it works for legal due diligence, M&A evaluation, and strategic planning. The question isn't whether agents can do research. It's whether they can force humans to maintain intellectual discipline when money is on the line.

Sources

GitHub Trending Python
