await agent(run), where the run is the live handle for one task: its prompt, its connection to the
environment, and the trace it fills.
Because an environment only exposes capabilities, the agent isn’t baked in: use a built-in agent for a
standard model, or bring your own harness for a custom loop.
Built-in agents
The SDK ships one agent per major provider, reached two ways:
- create_agent(model) - the preferred path. It selects the matching provider agent for a model id and routes every call through the HUD gateway.
- a provider agent directly (e.g. ClaudeAgent(ClaudeConfig(...))) - the same class constructed yourself, for full config control or to call the provider with your own key instead of the gateway.
The gateway is an endpoint (inference.hud.ai) that fronts every provider behind
your single HUD_API_KEY, so you switch between Claude, GPT, Gemini, or Grok by name alone, with
unified tracing. create_agent accepts any id the gateway knows (claude-..., gpt-..., gemini-...,
grok-...); extra kwargs pass through to the agent’s config.
The reason this is one line: built-in agents are catalog-driven. Each run they read the
environment’s manifest, open the capabilities they support, build the matching provider tools, and loop
against run.prompt_messages. Declaring a capability on the environment is enough; you never wire
tools.
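To make the catalog-driven idea concrete, here is a minimal sketch of the matching step: the agent reads the capabilities the environment declares and builds a tool only for the ones it supports. Everything in it is a hypothetical stand-in (the Manifest class, the SUPPORTED_TOOLS mapping, the capability and tool names); it is not the SDK's own types or logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical stand-in for an environment manifest."""
    capabilities: tuple[str, ...]


# Hypothetical mapping: capability name -> name of the provider tool built for it.
SUPPORTED_TOOLS = {
    "ssh": "shell_tool",
    "browser": "computer_use_tool",
}


def build_tools(manifest: Manifest) -> list[str]:
    """Return one tool per declared capability that this agent supports."""
    return [
        SUPPORTED_TOOLS[name]
        for name in manifest.capabilities
        if name in SUPPORTED_TOOLS  # capabilities the agent does not know are skipped
    ]


# The environment declares two capabilities; this agent only understands "ssh".
manifest = Manifest(capabilities=("ssh", "gpu"))
tools = build_tools(manifest)
print(tools)
```

Under this reading, adding a capability to the environment changes what the agent opens on the next run without any change to the agent's code.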
Provider agents
Each model maps to a provider agent - the class that speaks that provider’s API. Construct one directly to set its full config or use your own provider key:

| Agent | Config | Default model |
|---|---|---|
| ClaudeAgent | ClaudeConfig | claude-sonnet-4-6 |
| OpenAIAgent | OpenAIConfig | gpt-5.5 |
| GeminiAgent | GeminiConfig | gemini-3-pro-preview |
| OpenAIChatAgent | OpenAIChatConfig | gpt-5.4-mini |
| ClaudeSDKAgent | ClaudeSDKConfig | claude-sonnet-4-6 |

The config classes live in hud.agents.types. OpenAIChatAgent speaks the OpenAI Chat Completions API, so it
points at any compatible server (vLLM, a local model) via base_url; ClaudeSDKAgent runs the claude
CLI over an ssh capability, against the env’s filesystem. Every knob (model, max_steps,
system_prompt, citations_enabled) lives on the config; __call__(run) takes only the run.
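The split between config and call can be illustrated with a small self-contained sketch. SketchConfig and SketchAgent are hypothetical stand-ins, not the SDK's classes; only the field names model, max_steps, and system_prompt are taken from the text above, and the dict used as a run is a placeholder for the real run handle.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical config; the field names echo the knobs listed above."""
    model: str = "example-model"
    max_steps: int = 10
    system_prompt: str = ""


class SketchAgent:
    """Hypothetical agent: settings are fixed at construction time."""

    def __init__(self, config: SketchConfig) -> None:
        self.config = config

    async def __call__(self, run: dict) -> dict:
        # The call receives only the run; everything else is read from the config.
        return {
            "model": self.config.model,
            "step_budget": self.config.max_steps,
            "prompt": run["prompt"],
        }


run = {"prompt": "List the files in /tmp"}  # stand-in for a real run handle
careful = SketchAgent(SketchConfig(max_steps=3, system_prompt="Be brief."))
summary = asyncio.run(careful(run))
print(summary)
```

Keeping every setting on the config means two differently tuned agents can be handed the same run without the call site changing.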
Running an agent
Run a task with an agent in two ways:
- Programmatically - pass the agent to task.run / taskset.run with a runtime.
- From the CLI - hud eval takes a task source (.py, a directory, or .json/.jsonl) and an agent name (claude, openai, gemini, openai_compatible), runs each rollout in a fresh env subprocess, grades it, and prints the reward.

| Flag | Effect |
|---|---|
| --model, -m | Pin a specific model id. |
| --group N | Run each task N times, to see the reward spread. |
| --max-steps N | Cap agent steps per task. |
| --all / --full | Run the whole source (--full also auto-responds, 100 steps). |
| --gateway | Force calls through the gateway even when a provider key is set. |

With HUD_API_KEY set, calls route through the gateway; with a provider key present they go
straight to the provider. See the CLI reference for the full flag set and key
resolution.
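The routing rule described here can be read as a small decision function. The sketch below is one interpretation, not the CLI's actual resolution code: resolve_route and EXAMPLE_PROVIDER_KEY are hypothetical names, and the error cases (no key at all, or a forced gateway without HUD_API_KEY) are assumptions the text does not cover.

```python
from typing import Mapping


def resolve_route(
    env: Mapping[str, str],
    provider_key: str,
    force_gateway: bool = False,
) -> str:
    """Return "gateway" or "provider" for a set of environment variables."""
    has_hud_key = bool(env.get("HUD_API_KEY"))
    has_provider_key = bool(env.get(provider_key))

    if force_gateway:
        # Mirrors --gateway: go through the gateway even if a provider key exists.
        if not has_hud_key:
            raise RuntimeError("gateway forced but HUD_API_KEY is not set")  # assumed behavior
        return "gateway"
    if has_provider_key:
        return "provider"  # a provider key takes precedence by default
    if has_hud_key:
        return "gateway"
    raise RuntimeError("no usable API key found")  # assumed behavior


both = {"HUD_API_KEY": "hud-xxx", "EXAMPLE_PROVIDER_KEY": "sk-xxx"}
print(resolve_route(both, "EXAMPLE_PROVIDER_KEY"))
print(resolve_route(both, "EXAMPLE_PROVIDER_KEY", force_gateway=True))
```

The CLI reference remains the authority on the real precedence and on which environment variable names each provider uses.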
Bring your own harness
Any loop or framework can be an agent: subclass Agent, drive the environment off the run, and write
the final answer to run.trace.content (what gets graded). Since this is outside the standard workflow,
the seam, the Run object you work with, the step types you record, and worked examples live in
Extending HUD.
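As a rough picture of that shape, the sketch below uses local stand-ins rather than the SDK: SketchAgentBase, SketchRun, and SketchTrace are hypothetical classes that only mimic the pieces named above (an agent called with a run, and a final answer written to run.trace.content). The real base class and Run object are described in Extending HUD.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SketchTrace:
    """Hypothetical trace: only the graded final answer is modeled."""
    content: str = ""


@dataclass
class SketchRun:
    """Hypothetical run handle carrying a prompt and a trace."""
    prompt: str
    trace: SketchTrace = field(default_factory=SketchTrace)


class SketchAgentBase(ABC):
    """Hypothetical base class: an agent is awaited with a single run."""

    @abstractmethod
    async def __call__(self, run: SketchRun) -> None: ...


class UppercaseAgent(SketchAgentBase):
    """Toy harness: a real one would loop over model calls and environment actions."""

    async def __call__(self, run: SketchRun) -> None:
        answer = run.prompt.upper()  # placeholder for the harness's own loop
        run.trace.content = answer   # the final answer is what gets graded


run = SketchRun(prompt="say hello")
asyncio.run(UppercaseAgent()(run))
print(run.trace.content)
```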