From install to your first graded trace: you'll write a task, run it against a model through the HUD gateway, and read the reward.

Fastest path: hand the docs to your coding agent first. The HUD docs skill scaffolds correct v6 environments and flags weak task designs as you build:
npx skills add https://docs.hud.ai
The rest of this page walks the same path by hand.

1. Install

uv tool install hud-python --python 3.12

2. Set your API key

Get a key from hud.ai/project/api-keys. The same key routes model calls through the HUD gateway and traces every rollout.
hud set HUD_API_KEY=your-key-here

3. Write a task

Scaffold a complete, runnable example to start from:
hud init my-env
Or write tasks.py directly. A task is defined by a template, which is an async generator registered with @env.template. The generator yields a prompt, receives the agent's answer, and then yields a reward between 0.0 and 1.0. Calling the template mints a runnable Task:
tasks.py
from hud import Environment

env = Environment(name="letter-count")

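@env.template()
async def count_letter(word: str = "strawberry", letter: str = "r"):
    # First yield: send the prompt to the agent and receive its answer.
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    # Second yield: the reward, 1.0 if the correct count appears in the answer.
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0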

tasks = [count_letter(word=w) for w in ("strawberry", "raspberry", "blueberry")]
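
To make the yield-prompt, receive-answer, yield-reward flow concrete, here is a minimal sketch that drives the same generator body by hand using only the Python standard library. It does not use hud at all: the decorator is dropped, and run_once is a hypothetical helper, so this illustrates the async-generator protocol rather than how the hud runtime is actually implemented.

```python
import asyncio


async def count_letter(word: str = "strawberry", letter: str = "r"):
    # Same body as the template above, without the @env.template() decorator.
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0


async def run_once(answer: str, **params) -> tuple[str, float]:
    """Hypothetical driver: play the role of the agent with a canned answer."""
    gen = count_letter(**params)
    prompt = await gen.__anext__()    # run to the first yield, which is the prompt
    reward = await gen.asend(answer)  # resume with the answer; the next yield is the reward
    await gen.aclose()                # finish the generator cleanly
    return prompt, reward


if __name__ == "__main__":
    prompt, reward = asyncio.run(run_once("3"))
    print(prompt)
    print(reward)
```

Driving the generator this way is also a cheap check on the grader itself. Because the reward uses a substring test, an answer such as "13" would also score 1.0 for "strawberry", which may or may not be strict enough for your task.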

4. Run it

hud eval tasks.py claude --group 3
hud eval collects the tasks, spawns the environment on a local substrate, runs the claude agent, and grades it. --group 3 runs the task three times so you can see the reward variance across rollouts. It prints each reward and a trace link on hud.ai, where you can replay every step. Add --full to run every task in the dataset.
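
As a rough illustration of what reward variance across a group means, the sketch below summarizes a list of rewards with the standard library. The reward values are made up for the example, and summarize is a hypothetical helper; real rewards come from the hud eval output.

```python
from statistics import mean, pstdev


def summarize(rewards: list[float]) -> dict[str, float]:
    """Summarize the rewards from one group of rollouts of the same task."""
    return {
        "mean": mean(rewards),     # average reward across the group
        "stdev": pstdev(rewards),  # population standard deviation: spread across rollouts
    }


# Hypothetical rewards from three rollouts of one task (as with --group 3).
example_rewards = [1.0, 0.0, 1.0]

if __name__ == "__main__":
    print(summarize(example_rewards))
```

A standard deviation of zero means every rollout in the group got the same reward; a nonzero value means the model succeeds only some of the time on that task.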

Next

Package & deploy

Build a portable image and run it anywhere.

Add capabilities

Give the agent a shell, browser, GUI, or robot to act on.

Design tasks for signal

Make tasks that actually train, not just test.

Run on any model

Claude, OpenAI, Gemini, or your own endpoint.