- Evaluating model performance. Build a taskset, run it across models, compare scores. Find out which model handles your use case before you commit to one.
- Building RL environments. Create frontier-grade post-training data for capabilities such as coding, computer use, tool use, and deep research.
- Training specialized agents. Use RL training to produce a model that’s better at your specific tasks.
- Environment SDK — Define agent-callable tools and evaluation logic. Each environment spins up fresh and isolated for every run.
- Eval & Training Platform — Run evaluations at scale on hud.ai. Collect traces. Train models on successful runs.
- Model Gateway — One OpenAI-compatible endpoint at inference.hud.ai for Claude, GPT, Gemini, Grok, and more.
Install
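The SDK is distributed as a Python package; assuming the PyPI package name is `hud-python` (check the repository if this has changed), installation is:

```shell
# Install the HUD SDK (package name assumed to be hud-python)
pip install hud-python
```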
1. Environments: Define Your Agent’s Harness
An environment wraps your code as tools agents can call, and defines scenarios that evaluate what agents do. Each environment spins up fresh and isolated for every evaluation — no shared state, fully reproducible.
Example Workflow
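In sketch form, an environment pairs agent-callable tools with evaluation logic over the resulting state. The class and method names below are illustrative, not the actual SDK API:

```python
# Illustrative sketch only — class and method names are hypothetical,
# not the real HUD SDK API.

class Environment:
    """Holds per-run state; a fresh instance is created for every evaluation."""

    def __init__(self):
        self.files: dict[str, str] = {}  # state is isolated to this run

    # --- tools the agent can call ---
    def write_file(self, path: str, content: str) -> str:
        self.files[path] = content
        return f"wrote {path}"

    def read_file(self, path: str) -> str:
        return self.files.get(path, "")

    # --- evaluation logic: score the final state of the environment ---
    def evaluate(self, expected_path: str, expected_content: str) -> float:
        return 1.0 if self.files.get(expected_path) == expected_content else 0.0


# Each run gets a brand-new instance, so nothing leaks between evaluations.
env = Environment()
env.write_file("notes.txt", "hello")
score = env.evaluate("notes.txt", "hello")  # 1.0
```

Because every run starts from a clean instance, a score depends only on what the agent did during that run — which is what makes results reproducible and usable as training signal.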
2. Tasks & Training: Evaluate and Train
A task is a scenario with specific arguments. Group tasks into tasksets and run them across models. Train models on successful traces to produce a model that’s better at your specific use case.
3. Models: Any Model, One API
Integrations with Anthropic, OpenAI, Gemini, xAI, and more work out of the box. Point any OpenAI-compatible client at inference.hud.ai and use any model. Browse all available models at hud.ai/models.
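Because the gateway speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can target it by swapping the base URL. A stdlib-only sketch of the request shape — note that the `/v1` path and the `example-model` name are assumptions (real model names are listed at hud.ai/models), and you would supply a real API key:

```python
import json
import urllib.request

# Assumed gateway base URL; the /v1 path is an assumption, not confirmed docs.
BASE_URL = "https://inference.hud.ai/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but don't send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "example-model" is a placeholder; pick a real model from hud.ai/models.
req = build_chat_request("example-model", "Hello!", "YOUR_API_KEY")
# Sending is one call away: urllib.request.urlopen(req)
```

The same shape works with the official `openai` client by passing the gateway URL as `base_url`, which is what "one endpoint for every model" means in practice.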
Next Steps
- Scaffolding — Create environments, define tools and scenarios.
- Tasks & Evaluation — Define tasks, test locally, iterate.
- Deploy & Go Remote — Deploy and run evaluations at scale.
- Environments as Data — Design for useful training signal.
Community
- GitHub — Star the repo and contribute.
- Discord — Join the community.