
See MAC in Action

Three cherry-picked runs that show what MAC actually does: rules accumulate epoch-by-epoch, the constitution grows in real time, and holdout scores climb. Every recording shows the full training loop from start to finish.

Hit play to start. Use the progress bar to scrub, and the speed control to fast-forward through long epochs.


GSM8K: Math Reasoning (Auto-Adapt, Qwen3-8B)

+17% gain -- 5 rules learned across 10 epochs -- 0.750 to 0.875
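The headline percentages on this page appear to be relative gains over the baseline holdout score, not absolute point differences. A minimal sketch of that arithmetic follows; the helper name `relative_gain` is made up for illustration and is not part of MAC, and the score pairs are the ones quoted in the three run summaries.

```python
def relative_gain(baseline: float, final: float) -> float:
    """Relative improvement of `final` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (final - baseline) / baseline * 100.0


# (baseline, final) holdout scores as quoted in the run summaries on this page.
runs = {
    "GSM8K / Auto-Adapt": (0.750, 0.875),
    "HotpotQA / Auto-Adapt": (0.125, 0.250),
    "GSM8K / Custom Prompt": (0.750, 0.938),
}

for name, (baseline, final) in runs.items():
    print(f"{name}: {relative_gain(baseline, final):+.0f}%")
```

Read this way, 0.750 to 0.875 is a 12.5-point absolute improvement, which is about 17% relative to the baseline.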

MAC starts with zero rules and a blank constitution. Watch the adaptation phase spin up the four-agent network, then follow the constitution growing from v1 to v5 over 10 training epochs. By the end, five precise math rules cover rounding, discount arithmetic, and multi-step accumulation.
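As a rough mental model of what the recording shows, the sketch below grows a versioned rule list one epoch at a time. It is a toy under stated assumptions, not MAC's implementation: the `Constitution` class, the `train` function, and the accept-only-if-the-holdout-score-improves criterion are all hypothetical, and the four-agent network is reduced to a pre-supplied list of candidate rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Constitution:
    """Hypothetical container: an ordered list of accepted rules."""
    rules: list[str] = field(default_factory=list)

    @property
    def version(self) -> int:
        # v0 is the blank constitution; each accepted rule bumps the version.
        return len(self.rules)


def train(
    candidates: list[Optional[str]],
    score_fn: Callable[[list[str]], float],
) -> tuple[Constitution, list[tuple[int, int, float]]]:
    """One candidate rule per epoch (None = no proposal that epoch).

    ASSUMPTION: a candidate is kept only if it raises the holdout score.
    Returns the final constitution and an (epoch, version, score) history.
    """
    constitution = Constitution()
    best = score_fn(constitution.rules)
    history = [(0, constitution.version, best)]
    for epoch, candidate in enumerate(candidates, start=1):
        if candidate is not None:
            trial = constitution.rules + [candidate]
            trial_score = score_fn(trial)
            if trial_score > best:
                constitution.rules = trial
                best = trial_score
        history.append((epoch, constitution.version, best))
    return constitution, history


# Toy scorer standing in for a real holdout evaluation.
def toy_score(rules: list[str]) -> float:
    return 0.75 + 0.025 * sum(rule.startswith("good") for rule in rules)


final, history = train(["good: rule a", "bad: rule b", None, "good: rule c"], toy_score)
for epoch, version, score in history:
    print(f"epoch {epoch}: constitution v{version}, holdout {score:.3f}")
```

With a scheme like this, the version number can lag the epoch count, which would be consistent with five rules emerging over 10 epochs.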


HotpotQA: Multi-Hop QA (Auto-Adapt, gpt-5.2)

+100% gain -- doubled the baseline score -- 0.125 to 0.250

One compact epoch, two accepted rules, and the score doubles. MAC identifies that the model is short-circuiting multi-hop reasoning and proposes a focused rule: verify each intermediate entity before answering. Clean, human-readable, immediately effective.


GSM8K: Math Reasoning (Custom Prompt, Qwen3-8B)

+25% gain -- 5 rules -- 0.750 to 0.938 (near-perfect)

You supply the prompt structure with a {{CONSTITUTION_BLOCK}} tag; MAC fills it. Watch the constitution grow from empty to five surgical rules across 5 epochs: fraction arithmetic, simple-interest debt, sanity-check passes, discount totals. Final score: 0.938.
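To make the custom-prompt idea concrete, here is a minimal sketch of filling a `{{CONSTITUTION_BLOCK}}` tag with the current rule list. Only the tag name comes from this page; the `fill_constitution` helper, the numbered-list formatting, and the example rule wording are assumptions for illustration and do not reflect MAC's actual API or the rules learned in the recording.

```python
CONSTITUTION_TAG = "{{CONSTITUTION_BLOCK}}"


def fill_constitution(template: str, rules: list[str]) -> str:
    """Replace the constitution tag with a numbered list of rules.

    Hypothetical helper: an empty rule list yields an empty block,
    mirroring a run that starts from a blank constitution.
    """
    if CONSTITUTION_TAG not in template:
        raise ValueError(f"template is missing the {CONSTITUTION_TAG} tag")
    block = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return template.replace(CONSTITUTION_TAG, block)


# Example template; the surrounding wording is made up.
template = (
    "You are a careful math solver.\n"
    "Follow these rules:\n"
    "{{CONSTITUTION_BLOCK}}\n"
    "Question: {question}"
)

# Illustrative rules only, loosely themed on the topics named above.
rules = [
    "Convert fractions to a common denominator before adding.",
    "Re-check the final total against the problem statement.",
]

print(fill_constitution(template, rules))
```

Using plain `str.replace` instead of `str.format` means the double braces need no escaping and other placeholders in the template, such as `{question}`, are left untouched.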
