Model Configuration

MAC uses a three-tier model setup so each job runs on the right model: a cheap worker for annotation, strong MAC agents for learning, and an optional adaptation model for prompt rewriting.


Three-Tier Setup

compiler = MAC(
    # Tier 1: Worker (annotation) -- run cheaply or locally
    model="Qwen/Qwen3-8B",
    provider="openai",
    base_url="http://localhost:8000/v1",  # vLLM, Ollama, or any OpenAI-compatible server

    # Tier 2: MAC agents (decision / proposer / editor) -- use a strong model
    mac_model="gpt-4o",
    # mac_base_url defaults to None (cloud), NOT inherited from base_url

    # Tier 3: Prompt adaptation -- defaults to mac_model if not set
    # adapt_model="gpt-4o",

    task_description="Solve AIME competition math problems.",
    rule_type="math reasoning rules",
)

Fallback Cascade

When a parameter is unset, it inherits from the next tier in the cascade:

adapt_model       → mac_model       → model
adapt_provider    → mac_provider    → provider
adapt_base_url    → mac_base_url    → (no fallback to base_url)
adapt_temperature → mac_temperature → temperature

mac_base_url does not fall back to base_url

If your worker runs on a local vLLM server (base_url), MAC agents will not inherit that URL. They default to the standard cloud API. This prevents accidentally routing GPT-class calls to your local server.
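The cascade can be sketched as a simple first-set-wins lookup. This is a hypothetical illustration of the resolution rules described above, not MAC's actual implementation; the function and dictionary keys are invented for the sketch.

```python
# Hypothetical sketch of the fallback cascade -- not MAC's real internals.
# Each adapt_* setting falls back to its mac_* counterpart, then to the
# worker setting; base_url is the deliberate exception.
def resolve_tiers(cfg: dict) -> dict:
    def pick(*keys):
        # return the first explicitly-set value among the given keys
        for k in keys:
            if cfg.get(k) is not None:
                return cfg[k]
        return None

    return {
        "adapt_model": pick("adapt_model", "mac_model", "model"),
        "adapt_provider": pick("adapt_provider", "mac_provider", "provider"),
        # base_url does NOT cascade: a local worker URL never leaks upward
        "adapt_base_url": pick("adapt_base_url", "mac_base_url"),
        "mac_base_url": pick("mac_base_url"),  # stays None -> cloud API
    }

resolved = resolve_tiers({
    "model": "Qwen/Qwen3-8B",
    "base_url": "http://localhost:8000/v1",
    "mac_model": "gpt-4o",
})
print(resolved["adapt_model"])   # "gpt-4o" -- inherited from mac_model
print(resolved["mac_base_url"])  # None -- does not inherit the local URL
```

Note that `mac_base_url` resolves to `None` even though the worker's `base_url` is set, matching the guard described above.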


vLLM Port Configuration

Point base_url at any OpenAI-compatible endpoint. Common setups:

# vLLM default port (8000)
MAC(model="Qwen/Qwen3-8B", base_url="http://localhost:8000/v1")

# Custom vLLM port
MAC(model="Qwen/Qwen3-8B", base_url="http://localhost:8080/v1")

# Remote vLLM server
MAC(model="Qwen/Qwen3-8B", base_url="http://192.168.1.100:8000/v1")

# Ollama
MAC(model="qwen3:8b", base_url="http://localhost:11434/v1")

# LM Studio
MAC(model="qwen3-8b", base_url="http://localhost:1234/v1")

To serve a model with vLLM:

vllm serve Qwen/Qwen3-8B --port 8000
# then in Python:
# MAC(model="Qwen/Qwen3-8B", base_url="http://localhost:8000/v1")

Provider Examples

Local vLLM worker + cloud MAC agents:

MAC(
    model="Qwen/Qwen3-8B",
    base_url="http://localhost:8000/v1",
    mac_model="gpt-4o",
)

Fully local (both worker and MAC on vLLM):

MAC(
    model="Qwen/Qwen3-8B",
    base_url="http://localhost:8000/v1",
    mac_model="Qwen/Qwen3-32B",
    mac_base_url="http://localhost:8001/v1",  # second vLLM server
)

OpenAI only:

MAC(model="gpt-4o-mini", mac_model="gpt-4o")
# uses OPENAI_API_KEY from environment

OpenRouter:

MAC(
    model="meta-llama/llama-3-8b-instruct",
    provider="openrouter",
    mac_model="anthropic/claude-3.5-sonnet",
    mac_provider="openrouter",
)
# uses OPENROUTER_API_KEY from environment

Cerebras:

MAC(model="llama3.1-8b", provider="cerebras", mac_model="gpt-4o")
# uses CEREBRAS_API_KEY from environment


Temperature and Sampling

MAC(
    model="Qwen/Qwen3-8B",
    base_url="http://localhost:8000/v1",
    temperature=0.0,          # worker temperature (deterministic)
    mac_model="gpt-4o",
    mac_temperature=0.7,      # MAC agent temperature (creative rule proposals)
)

Lower worker temperature gives more consistent annotations. Higher MAC agent temperature encourages diverse rule proposals.


Environment Variables

| Variable | Used by |
|---|---|
| `OPENAI_API_KEY` | OpenAI provider (default) |
| `OPENROUTER_API_KEY` | OpenRouter provider |
| `CEREBRAS_API_KEY` | Cerebras provider |
| `ANTHROPIC_API_KEY` | Anthropic provider |

Set these in .env or export them in your shell. See .env.example in the repo for a template.
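For example, exporting keys in a shell session (the values below are placeholders; only set the variables for providers you actually use):

```shell
# Placeholder values -- replace with your real API keys.
export OPENAI_API_KEY="sk-your-key-here"
export OPENROUTER_API_KEY="sk-or-your-key-here"
export CEREBRAS_API_KEY="csk-your-key-here"
```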