1. Install
2. Set your API key
Get a key from hud.ai/project/api-keys. One key both routes models through the HUD gateway and traces every rollout.
3. Write a task
Scaffold a complete, runnable example to start from, or write tasks.py directly. A task is defined by a template: an async generator registered with @env.template that yields a prompt, receives the answer, and then yields a reward (0.0–1.0). Calling the template mints a runnable Task:
tasks.py
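As a rough illustration of the yield/receive/yield protocol described above, the sketch below uses a plain Python async generator and drives it by hand. It deliberately does not import the HUD SDK: add_two_numbers and run_once are hypothetical names, and in a real tasks.py the generator would be registered with @env.template and driven by hud eval rather than by this helper.

```python
import asyncio
from typing import AsyncGenerator, Tuple, Union


async def add_two_numbers(a: int, b: int) -> AsyncGenerator[Union[str, float], str]:
    """Hypothetical template body: prompt out, answer in, reward out."""
    # First yield hands the prompt to whoever drives the generator
    # and suspends until an answer is sent back in.
    answer = yield f"What is {a} + {b}? Reply with just the number."
    # Second yield reports the reward, kept within 0.0-1.0.
    yield 1.0 if answer.strip() == str(a + b) else 0.0


async def run_once(template, answer: str) -> Tuple[str, float]:
    """Stand-in driver (an assumption, not HUD's runner)."""
    prompt = await template.__anext__()    # advance to the first yield
    reward = await template.asend(answer)  # send the answer, get the reward
    await template.aclose()                # finish the generator cleanly
    return prompt, reward


prompt, reward = asyncio.run(run_once(add_two_numbers(2, 3), "5"))
print(prompt)
print(reward)
```

The point of the two-yield shape is that the grading logic lives next to the prompt it grades: everything after the first yield can inspect the answer with the task's own arguments still in scope.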
4. Run it
hud eval collects the tasks, spawns the environment on a local substrate, runs the claude agent, and grades the result. --group 3 runs the task three times so you can see the reward variance across rollouts. The command prints each reward and a trace link on hud.ai, where you can replay every step. Add --full to run every task in the dataset.
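To make "reward variance across rollouts" concrete, here is a small sketch that summarizes three rewards from one group. The reward values are made up for illustration and are not output from a real run; the summary itself uses only Python's standard statistics module.

```python
from statistics import mean, pstdev

# Hypothetical rewards from one task run three times (--group 3).
rewards = [1.0, 0.0, 1.0]

avg = mean(rewards)       # average reward across the group
spread = pstdev(rewards)  # population standard deviation across the group

print(f"mean={avg:.2f} stdev={spread:.2f}")
```

A spread near zero means the agent behaves consistently on the task, whether it always passes or always fails; a large spread means the outcome depends heavily on the individual rollout.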
Next
Package & deploy
Build a portable image and run it anywhere.
Add capabilities
Give the agent a shell, browser, GUI, or robot to act on.
Design tasks for signal
Make tasks that actually train, not just test.
Run on any model
Claude, OpenAI, Gemini, or your own endpoint.