# Model Configuration
MAC uses a three-tier model setup so each job runs on the right model: a cheap worker for annotation, strong MAC agents for learning, and an optional adaptation model for prompt rewriting.
## Three-Tier Setup
```python
compiler = MAC(
    # Tier 1: Worker (annotation) -- run cheaply or locally
    model="Qwen/Qwen3-8B",
    provider="openai",
    base_url="http://localhost:8000/v1",  # vLLM, Ollama, or any OpenAI-compatible server
    # Tier 2: MAC agents (decision / proposer / editor) -- use a strong model
    mac_model="gpt-4o",
    # mac_base_url defaults to None (cloud), NOT inherited from base_url
    # Tier 3: Prompt adaptation -- defaults to mac_model if not set
    # adapt_model="gpt-4o",
    task_description="Solve AIME competition math problems.",
    rule_type="math reasoning rules",
)
```
## Fallback Cascade
Each tier falls back to the one above when a parameter is unset:
```
adapt_model       → mac_model       → model
adapt_provider    → mac_provider    → provider
adapt_base_url    → mac_base_url    → (no fallback to base_url)
adapt_temperature → mac_temperature → temperature
```
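The cascade is simply "first explicitly set value wins." A minimal Python sketch of the resolution order (not MAC's actual internals; `resolve` is a hypothetical helper for illustration):

```python
def resolve(*values, default=None):
    """Return the first value in the cascade that was explicitly set."""
    for v in values:
        if v is not None:
            return v
    return default

# adapt_model -> mac_model -> model
print(resolve(None, "gpt-4o", "Qwen/Qwen3-8B"))  # gpt-4o

# adapt_base_url -> mac_base_url: base_url is deliberately NOT in this
# cascade, so an unset mac_base_url resolves to the default (cloud API).
print(resolve(None, None))  # None
```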
### `mac_base_url` does not fall back to `base_url`
If your worker runs on a local vLLM server (via `base_url`), the MAC agents will not inherit that URL; they default to the standard cloud API. This prevents accidentally routing GPT-class calls to your local server. If you want the MAC agents on a local server too, set `mac_base_url` explicitly.
## vLLM Port Configuration
Point base_url at any OpenAI-compatible endpoint. Common setups:
```python
# vLLM default port (8000)
MAC(model="Qwen/Qwen3-8B", base_url="http://localhost:8000/v1")

# Custom vLLM port
MAC(model="Qwen/Qwen3-8B", base_url="http://localhost:8080/v1")

# Remote vLLM server
MAC(model="Qwen/Qwen3-8B", base_url="http://192.168.1.100:8000/v1")

# Ollama
MAC(model="qwen3:8b", base_url="http://localhost:11434/v1")

# LM Studio
MAC(model="qwen3-8b", base_url="http://localhost:1234/v1")
```
To serve a model with vLLM:
```bash
vllm serve Qwen/Qwen3-8B --port 8000
```

Then point MAC at it in Python:

```python
MAC(model="Qwen/Qwen3-8B", base_url="http://localhost:8000/v1")
```
## Provider Examples
Local vLLM worker + cloud MAC agents:
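The code for this example appears to be missing from the source; a minimal sketch, assuming the same local vLLM worker and cloud `gpt-4o` agents as in the three-tier setup above:

```python
# Sketch: worker on a local vLLM server, MAC agents on the cloud API
MAC(
    model="Qwen/Qwen3-8B",
    base_url="http://localhost:8000/v1",  # applies to the worker only
    mac_model="gpt-4o",                   # mac_base_url stays None -> cloud API
)
```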
Fully local (both worker and MAC on vLLM):
```python
MAC(
    model="Qwen/Qwen3-8B",
    base_url="http://localhost:8000/v1",
    mac_model="Qwen/Qwen3-32B",
    mac_base_url="http://localhost:8001/v1",  # second vLLM server
)
```
OpenAI only:
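The code for this example also appears to be missing; a minimal sketch with illustrative model names (the specific models are an assumption, not from the source):

```python
# Sketch: both tiers on the default OpenAI provider (model names illustrative)
MAC(
    model="gpt-4o-mini",  # cheap worker for annotation
    mac_model="gpt-4o",   # strong model for the MAC agents
)
# uses OPENAI_API_KEY from environment
```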
OpenRouter:
```python
MAC(
    model="meta-llama/llama-3-8b-instruct",
    provider="openrouter",
    mac_model="anthropic/claude-3.5-sonnet",
    mac_provider="openrouter",
)
# uses OPENROUTER_API_KEY from environment
```
Cerebras:
```python
MAC(model="llama3.1-8b", provider="cerebras", mac_model="gpt-4o")
# uses CEREBRAS_API_KEY from environment
```
## Temperature and Sampling
```python
MAC(
    model="Qwen/Qwen3-8B",
    base_url="http://localhost:8000/v1",
    temperature=0.0,      # worker temperature (deterministic)
    mac_model="gpt-4o",
    mac_temperature=0.7,  # MAC agent temperature (creative rule proposals)
)
```
Lower worker temperature gives more consistent annotations. Higher MAC agent temperature encourages diverse rule proposals.
## Environment Variables
| Variable | Used by |
|---|---|
| `OPENAI_API_KEY` | OpenAI provider (default) |
| `OPENROUTER_API_KEY` | OpenRouter provider |
| `CEREBRAS_API_KEY` | Cerebras provider |
| `ANTHROPIC_API_KEY` | Anthropic provider |
Set these in `.env` or export them in your shell. See `.env.example` in the repo for a template.
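A sketch of what such a `.env` file might look like, with placeholder values (the key prefixes are illustrative; defer to `.env.example` in the repo):

```shell
# .env -- placeholder values, do not commit real keys
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-...
CEREBRAS_API_KEY=csk-...
ANTHROPIC_API_KEY=sk-ant-...
```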