See MAC in Action
Three cherry-picked runs that show what MAC actually does: rules accumulate epoch by epoch, the constitution grows in real time, and holdout scores climb. Every recording shows the full training loop from start to finish.
GSM8K: Math Reasoning (Auto-Adapt, Qwen3-8B)
+17% gain -- 5 rules learned across 10 epochs -- 0.750 to 0.875
MAC starts with zero rules and a blank constitution. Watch the adaptation phase spin up the four-agent network, then follow the constitution growing from v1 to v5 over 10 training epochs. By the end, five precise math rules cover rounding, discount arithmetic, and multi-step accumulation.
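To make "the constitution grows from v1 to v5" concrete, here is a toy sketch of a versioned rule list. It is hypothetical and not MAC's actual data model: the `Constitution` class, its methods, and the assumption that each accepted rule bumps the version by one are illustrative only. The rule strings are placeholders, not the rules MAC learned.

```python
from dataclasses import dataclass, field


@dataclass
class Constitution:
    """Toy versioned rule list (hypothetical; not MAC's real data model)."""
    rules: list[str] = field(default_factory=list)

    @property
    def version(self) -> int:
        # Assumption: one version bump per accepted rule, so an empty
        # constitution is v0 and five accepted rules give v5.
        return len(self.rules)

    def accept(self, rule: str) -> None:
        """Append an accepted rule, growing the constitution by one version."""
        self.rules.append(rule)

    def render(self) -> str:
        """Render the rules as a numbered, human-readable block."""
        return "\n".join(f"{i}. {r}" for i, r in enumerate(self.rules, start=1))


# Placeholder rule text for illustration only.
c = Constitution()
c.accept("Round only at the final step, not at intermediate results.")
c.accept("Apply a percentage discount to the original price, then subtract.")
print(f"v{c.version}")
print(c.render())
```

Deriving the version from the rule count keeps the two from drifting apart; a real system might instead version on every edit, including rule rewrites or removals.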
HotpotQA: Multi-Hop QA (Auto-Adapt, gpt-5.2)
+100% gain -- doubled the baseline score -- 0.125 to 0.250
One compact epoch, two accepted rules, and the score doubles. MAC identifies that the model is short-circuiting multi-hop reasoning, and the key accepted rule is a focused one: verify each intermediate entity before answering. Clean, human-readable, immediately effective.
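The headline percentages are relative gains over the baseline, not absolute score points. The short sketch below (the `relative_gain` helper is illustrative, not part of MAC) shows that both runs above improve by the same absolute 0.125, yet one reads as +17% and the other as +100% because the baselines differ.

```python
def relative_gain(baseline: float, final: float) -> float:
    """Relative improvement of `final` over `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (final - baseline) / baseline * 100


# Baseline and final holdout scores quoted for the two runs above.
runs = {
    "GSM8K (Auto-Adapt, Qwen3-8B)": (0.750, 0.875),
    "HotpotQA (Auto-Adapt, gpt-5.2)": (0.125, 0.250),
}
for name, (before, after) in runs.items():
    print(f"{name}: {after - before:+.3f} absolute, "
          f"{relative_gain(before, after):+.0f}% relative")
```

The same formula gives the +25% quoted for 0.750 to 0.938.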
GSM8K: Math Reasoning (Custom Prompt, Qwen3-8B)
+25% gain -- 5 rules -- 0.750 to 0.938 (near-perfect)
You supply the prompt template with a {{CONSTITUTION_BLOCK}} placeholder; MAC fills it with the learned rules. Watch the constitution grow from empty to five surgical rules over 5 epochs, covering fraction arithmetic, simple-interest debt, sanity-check passes, and discount totals. Final score: 0.938.
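A minimal sketch of what "MAC fills it" could look like, assuming the placeholder appears exactly once, the rules are rendered as a numbered list, and an empty constitution renders as an empty string. The `fill_prompt` helper and the template text are hypothetical; MAC's real substitution logic and rule formatting may differ.

```python
CONSTITUTION_TAG = "{{CONSTITUTION_BLOCK}}"


def fill_prompt(template: str, rules: list[str]) -> str:
    """Substitute a numbered rule block for the {{CONSTITUTION_BLOCK}} tag.

    Hypothetical helper: assumes the tag appears exactly once and that an
    empty rule list renders as an empty string.
    """
    if template.count(CONSTITUTION_TAG) != 1:
        raise ValueError(f"template must contain {CONSTITUTION_TAG} exactly once")
    block = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return template.replace(CONSTITUTION_TAG, block)


# Hypothetical user-supplied template and placeholder rule.
template = (
    "You are a careful math tutor.\n"
    "Follow these rules:\n"
    "{{CONSTITUTION_BLOCK}}\n"
    "Solve the problem below step by step.\n"
)
print(fill_prompt(template, ["Convert mixed numbers to improper fractions before multiplying."]))
```

Plain `str.replace` is used rather than `str.format` so that any other braces in the user's template are left untouched.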