HUD is a platform for building RL environments for AI agents. It gives you three things:
  1. Environment SDK — Create a harness with agent-callable tools. Define evaluation logic with scenarios that yield a prompt and a reward.
  2. Eval & Training Platform — Run evaluations at scale on hud.ai. Collect traces. Train models on successful runs.
  3. Model Gateway — One OpenAI-compatible endpoint at inference.hud.ai for every model: Claude, GPT, Gemini, Grok. One API key, swap the model string.
New to HUD? Read Core Concepts first — it defines the four building blocks (environments, tools, scenarios, tasks) and how they fit together.

Install

# Install CLI
uv tool install hud-python --python 3.12

# Set your API key
hud set HUD_API_KEY=your-key-here
Get your API key at hud.ai/project/api-keys.

1. Environments: Define Your Agent's Harness

An environment wraps your code as tools agents can call, and defines scenarios that evaluate what agents do. Each environment spins up fresh and isolated for every evaluation — no shared state, fully reproducible.
from hud import Environment

env = Environment("my-env")

@env.tool()
def search(query: str) -> str:
    """Search the knowledge base."""
    return db.search(query)  # db: your own data layer

@env.scenario("find-answer")
async def find_answer(question: str):
    answer = yield f"Find the answer to: {question}"
    yield 1.0 if "correct" in answer.lower() else 0.0
The scenario has two yields: the first sends a prompt to the agent and receives its answer. The second scores the result as a reward between 0.0 and 1.0. Learn more about scenarios.
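The control flow behind those two yields can be sketched with a plain async generator. This is an illustration of the prompt-in, reward-out handshake only, not HUD's actual driver; the `run_once` helper stands in for whatever harness resumes the scenario with the agent's answer:

```python
import asyncio

# Illustration: the same two-yield shape as a HUD scenario, driven by hand.
async def find_answer(question: str):
    answer = yield f"Find the answer to: {question}"
    yield 1.0 if "correct" in answer.lower() else 0.0

async def run_once(agent_answer: str):
    gen = find_answer("What is the capital of France?")
    prompt = await gen.__anext__()          # first yield: prompt for the agent
    reward = await gen.asend(agent_answer)  # send the answer back, get the reward
    return prompt, reward

prompt, reward = asyncio.run(run_once("Paris is correct."))
print(prompt, reward)
```

The key mechanic is `asend`: resuming the generator delivers the agent's answer as the value of the first `yield` expression, and execution continues to the scoring line.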

The Canonical Workflow

hud init my-env       # Scaffold environment
cd my-env
hud dev env:env -w env.py   # Run MCP server locally with hot-reload on watched paths
hud eval tasks.json claude  # Run an eval locally
hud deploy                  # Deploy to platform → run at scale
More on Environments · Deploy to Platform

2. Tasks & Training: Evaluate and Train

A task is a scenario with specific arguments. Group tasks into tasksets and run them across models. Evaluate and calibrate environments and tasks against specific models, then train a model on a taskset to improve performance on your use case.
import hud
from hud.agents import create_agent

task = env("find-answer", question="What is the capital of France?")
agent = create_agent("claude-sonnet-4-5")

async with hud.eval(task) as ctx:
    result = await agent.run(ctx)

print(f"Reward: {result.reward}")
Create tasks on hud.ai, run evaluations across models, and train through integrations with Tinker, OpenAI, and CoreWeave. More on Tasks & Training

3. Models: Any Model, One API

Out-of-the-box integrations with all major model providers. Point any OpenAI-compatible client at inference.hud.ai and use any model. Browse all available models at hud.ai/models.
from openai import AsyncOpenAI
import os

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

response = await client.chat.completions.create(
    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro, grok-4-1-fast...
    messages=[{"role": "user", "content": "Hello!"}]
)
Every call is traced. View them at hud.ai/home. More on Models

Next Steps

Core Concepts

Environments, tools, scenarios, tasks — defined in one place.

Environments

Tools, scenarios, and iteration.

Tasks & Training

Evaluate and train models.

Best Practices

Patterns for reliable environments and evals.

Community

GitHub

Star the repo and contribute

Discord

Join the community

Enterprise

Building agents at scale? We work with teams on custom environments, benchmarks, and training pipelines. 📅 Book a call · 📧 founders@hud.ai