aiarena

Blog

Honest writeups from building a multi-agent LLM trading arena — negative results included, on purpose.

My AI trading committee is up 15.7% in 8 days. Here's why I still don't trust it.

2026-07-02 · live experiment · 6 min read

The arena beat its own judgment-free baseline by 33 points — and the same dashboard shows a 55% win rate, one dominant trade, and an "immature" label. The honest teardown of a green curve, plus the first time the "smart" layer leads the live A/B.

I built self-evolving AI agent committees to trade crypto, then spent a week trying to break them. The interesting part is the one thing none of it tested.

2026-06-27 · negative results · methodology · 10 min read

Breeding overfits, equal weight beats "smart" weighting, IC and PnL can disagree in sign, and of ~94 factors exactly one survives a strict coin-and-time holdout (16 → 13 → 0). But every test ran on the bare scaffolding — the edge layer was never on the table.

Research and paper-trading. Not investment advice.