Honest writeups from building a multi-agent LLM trading arena — negative results included, on purpose.
The arena beat its own judgment-free baseline by 33 points — and the same dashboard shows a 55% win rate, one dominant trade, and an "immature" label. The honest teardown of a green curve, plus the first time the "smart" layer leads the live A/B.
Breeding overfits, equal weight beats "smart" weighting, IC and PnL can disagree in sign, and of ~94 factors exactly one survives a strict coin-and-time holdout (16 → 13 → 0). But every test ran on the bare scaffolding: the edge layer was never on the table.
Research and paper-trading. Not investment advice.