HUD Documentation - Evaluations and RL Environments.

Fastest path - hand the docs to your coding agent first. The HUD docs skill scaffolds correct v6 environments and flags weak task designs as you build:

npx skills add https://docs.hud.ai

The rest of this page walks the setup path by hand.

1. Install

uv tool install hud-python --python 3.12

2. Set your API key

Get a key from hud.ai/project/api-keys. A single key both routes model calls through the HUD gateway and traces every rollout.

hud set HUD_API_KEY=your-key-here

3. Write a task

Scaffold a complete, runnable example to start from:

hud init my-env
cd my-env

The scaffold includes a ready-to-run task in tasks.py:

from hud import Environment

env = Environment(name="letter-count")

@env.template()
async def count_letter(word: str = "strawberry", letter: str = "r"):
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0

tasks = [count_letter(word=w) for w in ("strawberry", "raspberry", "blueberry")]
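
The task above follows a two-yield pattern: the first yield hands out the prompt and receives the agent's answer, and the second yield returns the score. The sketch below reproduces that pattern with a plain Python async generator so the control flow can be seen without HUD installed. It drops the `@env.template()` decorator, and the `drive` helper is hypothetical: how HUD itself steps the generator is not described on this page.

```python
import asyncio


async def count_letter(word: str = "strawberry", letter: str = "r"):
    # First yield: emit the prompt, then wait to be sent the agent's answer.
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    # Second yield: emit the score for that answer.
    yield 1.0 if answer and str(word.count(letter)) in answer else 0.0


async def drive(task, reply: str):
    """Hypothetical stand-in for a runner: fetch the prompt, send a reply, read the score."""
    prompt = await task.asend(None)   # run up to the first yield
    reward = await task.asend(reply)  # resume with the reply, run to the second yield
    await task.aclose()
    return prompt, reward


prompt, reward = asyncio.run(drive(count_letter(), "3"))
wrong_prompt, wrong_reward = asyncio.run(drive(count_letter(word="blueberry"), "3"))
print(prompt)
print(reward, wrong_reward)
```

Here "strawberry" contains three r's, so the reply "3" scores 1.0, while the same reply for "blueberry" (two r's) scores 0.0.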

4. Run it

hud eval tasks.py claude

hud eval spawns the environment locally, runs the claude agent on each task, and grades its answer. Every rollout generates a replayable trace on hud.ai.
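
Grading in this example comes down to the substring check in tasks.py. The sketch below isolates that check as a standalone function and compares it with a stricter variant. Both function names are hypothetical and neither is part of HUD. The comparison shows that a substring match also accepts a reply such as "13" when the expected count is 3.

```python
import re


def lenient_grade(answer, word: str, letter: str) -> float:
    """Same check as tasks.py: the expected count appears somewhere in the reply."""
    return 1.0 if answer and str(word.count(letter)) in answer else 0.0


def strict_grade(answer, word: str, letter: str) -> float:
    """Hypothetical stricter check: the reply contains exactly one integer, equal to the count."""
    numbers = re.findall(r"\d+", answer or "")
    return 1.0 if numbers == [str(word.count(letter))] else 0.0


for reply in ("3", "There are 3", "13", None):
    print(repr(reply), lenient_grade(reply, "strawberry", "r"), strict_grade(reply, "strawberry", "r"))
```

Which check is appropriate depends on how much noise the task can tolerate. The lenient form is forgiving of extra words but also of some wrong numbers.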

Next

Package & deploy

Build a portable image and run it anywhere.

Add capabilities

Give the agent a shell, browser, GUI, or robot to act on.

Design tasks for signal

Make tasks that actually train, not just test.

Run on any model

Claude, OpenAI, Gemini, or your own endpoint.
