HUD Documentation - Evaluations and RL Environments.

An environment is the world your agent acts in - a shell, a browser, a desktop, a robot simulator. In HUD the word covers two things, and keeping them apart makes everything else clearer:

the Environment object - a lightweight control handle. It doesn’t hold the world itself; it’s where you register what the environment exposes.
the env.py file - the whole environment you author, serve, and ship: the object plus everything registered on it.

The object is the handle, the file is the environment. You build one by constructing the object and registering three kinds of things against it: the access the agent gets (capabilities), optional setup and teardown (lifecycle hooks), and the work to be done (tasks). The rest of this page walks through declaring each, then the object and serving details underneath.

Declaring an environment

Everything lives in one file, conventionally env.py: an ordinary Python module that constructs the Environment and registers its capabilities, hooks, and tasks against it. A complete one is small - declare the object, give it the access the agent needs, optionally set up and tear down state, and define at least one task:

env.py

from hud import Environment
from hud.capabilities import Capability
from hud.graders import LLMJudgeGrader

env = Environment(name="my-env", capabilities=[          # the object + the access it exposes
    Capability.ssh(name="shell", url="<url>", host_pubkey="<key>"),
])

@env.initialize                                          # optional: set up state before serving
async def _up():
    ...

@env.template()                                          # one or more tasks: a prompt and a reward
async def my_task(...):
    answer = yield "<prompt>"
    result = await LLMJudgeGrader.grade(answer=answer, criteria=[...])
    yield result.value

When you serve, HUD imports the module, finds the Environment object in it, and runs everything registered on it. The only contract is “this module defines an Environment”, which is what makes the declaration portable: the same env.py runs locally, in a container, or on HUD with nothing changed but the runtime.
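This kind of discovery can be done with the standard library alone. The sketch below is an assumption about the general technique, not HUD's loader. It imports a file by path with `importlib` and returns the single top-level instance of an `Environment` class. The `Environment` class and the `toyhud` module here are hypothetical stand-ins, registered in `sys.modules` so that the generated env.py can import them.

```python
import importlib.util
import pathlib
import sys
import tempfile
import types


class Environment:
    """Hypothetical stand-in for hud.Environment; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


# Expose the stand-in as an importable module so env.py can `from toyhud import Environment`.
toyhud = types.ModuleType("toyhud")
toyhud.Environment = Environment
sys.modules["toyhud"] = toyhud


def load_module(path: pathlib.Path) -> types.ModuleType:
    """Import a Python file by path, without it being on sys.path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file top to bottom, registering everything
    return module


def find_environment(module: types.ModuleType) -> Environment:
    """Return the single Environment instance defined at module top level."""
    found = [v for v in vars(module).values() if isinstance(v, Environment)]
    if len(found) != 1:
        raise LookupError(f"expected exactly one Environment, found {len(found)}")
    return found[0]


with tempfile.TemporaryDirectory() as tmp:
    env_file = pathlib.Path(tmp) / "env.py"
    env_file.write_text('from toyhud import Environment\nenv = Environment(name="my-env")\n')
    env = find_environment(load_module(env_file))

print(env.name)  # my-env
```

Requiring exactly one instance keeps the contract unambiguous: a module with zero or two environments fails loudly instead of serving the wrong one.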

Capabilities: the access you expose

A capability is the agent’s way in: a connection over a protocol the system already speaks. A machine has a shell, so it speaks ssh; a web app has a browser, so it speaks cdp. You expose the connection the system already has, and the agent drives it natively with its own tools. Two things fall out for free: wrapping any system is trivial, and nothing about the agent is baked in, so the same environment keeps working with any model or harness, today’s or next year’s.

The most common capability is a shell: a Workspace is a sandboxed directory the agent works in over ssh, and env.workspace(root) brings it up, publishes its ssh capability, and tears it down with the env - one line, no hook:

env.py

from hud import Environment

env = Environment(name="coder")
env.workspace("workspace")

That alone is a complete environment: any harness that speaks ssh (Claude Code, another coding agent, or your own) can open a shell and edit files in the workspace. You register capabilities three ways: in the constructor for a service that already exists (the ssh capability in the scaffold above), with env.workspace(root) for the common shell case, or with env.add_capability(...) from a hook for a daemon the env runs itself (next section). Each is concrete wire data - the URL of something serving the protocol. The Capabilities reference has a copy-pasteable spin-up for every protocol, along with the library that backs it.
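Because a capability is only wire data, the harness side needs nothing environment-specific. It reads the advertised list and connects to whichever protocol it speaks. Below is a minimal sketch built on a made-up JSON shape (`protocol`, `name`, `url`) and a hypothetical `pick` helper. HUD's actual field names and wire format may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class CapabilityRecord:
    """Hypothetical wire shape: which protocol, what it is called, where it is served."""
    protocol: str  # e.g. "ssh" or "cdp"
    name: str
    url: str


def to_wire(caps: Iterable[CapabilityRecord]) -> str:
    """Environment side: advertise capabilities as plain JSON."""
    return json.dumps([asdict(c) for c in caps])


def pick(wire: str, speaks: Set[str]) -> Optional[CapabilityRecord]:
    """Harness side: first advertised capability whose protocol this harness speaks."""
    for raw in json.loads(wire):
        cap = CapabilityRecord(**raw)
        if cap.protocol in speaks:
            return cap
    return None


wire = to_wire([
    CapabilityRecord("cdp", "browser", "ws://127.0.0.1:9222"),
    CapabilityRecord("ssh", "shell", "ssh://127.0.0.1:2222"),
])
shell = pick(wire, speaks={"ssh"})
print(shell)  # CapabilityRecord(protocol='ssh', name='shell', url='ssh://127.0.0.1:2222')
```

A harness that speaks no advertised protocol simply gets `None`. Nothing in the environment had to know which harness would connect.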

Lifecycle hooks: set up and tear down

When a task needs state - seeded files, a running service, a browser - you bring it up in @env.initialize and release it in @env.shutdown. @env.initialize runs once before the env accepts any connection, so by the time an agent connects, everything it needs is already in place:

env.py

from hud.capabilities import Capability

# `env` is the Environment declared earlier in env.py. launch_chromium() stands in for
# however you start a browser; it is not defined in this snippet.
browser = None

@env.initialize
async def _up():
    global browser
    browser = await launch_chromium()        # bring up whatever the tasks need
    env.add_capability(Capability.cdp(name="browser", url=f"ws://127.0.0.1:{browser.port}"))

@env.shutdown
async def _down():
    if browser is not None:
        await browser.close()

This is also how you publish a daemon you run yourself: start it, then env.add_capability(...). It replaces any same-named entry, so re-serving overwrites a stale address rather than duplicating it. The full pattern - starting a server task and blocking until it binds - lives in Capabilities.
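The replace-by-name behaviour is easiest to see with a toy registry. The sketch below is hypothetical, and `ToyEnv` and `Cap` are not HUD classes. Its hook decorators just record coroutine functions, and `start()` and `stop()` await them in registration order. Capabilities live in a dict keyed by name, so publishing a second address under the same name overwrites the first. Re-serving is simulated by calling `start()` twice.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List

Hook = Callable[[], Awaitable[None]]


@dataclass(frozen=True)
class Cap:
    """Hypothetical capability record: a name and the URL serving the protocol."""
    name: str
    url: str


class ToyEnv:
    """Hypothetical stand-in for the hook and capability parts of an Environment."""

    def __init__(self) -> None:
        self._on_start: List[Hook] = []
        self._on_stop: List[Hook] = []
        self.capabilities: Dict[str, Cap] = {}  # keyed by name

    def initialize(self, fn: Hook) -> Hook:
        self._on_start.append(fn)
        return fn

    def shutdown(self, fn: Hook) -> Hook:
        self._on_stop.append(fn)
        return fn

    def add_capability(self, cap: Cap) -> None:
        self.capabilities[cap.name] = cap  # same name: replace, never duplicate

    async def start(self) -> None:
        for hook in self._on_start:
            await hook()

    async def stop(self) -> None:
        for hook in self._on_stop:
            await hook()


env = ToyEnv()
ports = iter([9222, 9333])  # a fresh port on each (re)start
stopped = []


@env.initialize
async def _up() -> None:
    env.add_capability(Cap("browser", f"ws://127.0.0.1:{next(ports)}"))


@env.shutdown
async def _down() -> None:
    stopped.append("browser")


async def main() -> None:
    await env.start()  # first serve publishes :9222
    await env.start()  # re-serve publishes :9333 under the same name
    await env.stop()


asyncio.run(main())
print(env.capabilities)  # {'browser': Cap(name='browser', url='ws://127.0.0.1:9333')}
```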

Tasks: a prompt and a reward

A task is what the agent actually does in the environment, and you register it the same way you register everything else - on the object, with a decorator. @env.template() turns an async generator into a task template: it yields a prompt, receives the agent’s answer back, and yields a reward (0.0-1.0). Everything the agent does in the environment happens between those two yields.

env.py

@env.template()
async def count_letter(word: str = "strawberry", letter: str = "r"):
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
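    expected = str(word.count(letter))
    # exact match: a substring test would also accept "13" when the count is 3
    yield 1.0 if answer and answer.strip() == expected else 0.0   # the reward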
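Outside HUD, the two-yield protocol is plain Python. An async generator is primed with `asend(None)` to get the prompt, then resumed with the answer to get the reward. The sketch below uses a copy of the template body without the decorator, grading by exact match. A hypothetical `run_episode` driver stands in for the harness and orchestrator. This is not how HUD runs tasks internally.

```python
import asyncio
from typing import AsyncGenerator, Callable


async def count_letter(word: str = "strawberry", letter: str = "r") -> AsyncGenerator[object, object]:
    # the template body without @env.template(), so it runs without HUD
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    yield 1.0 if answer and str(answer).strip() == str(word.count(letter)) else 0.0


async def run_episode(task: AsyncGenerator[object, object], agent: Callable[[str], str]) -> float:
    """Hypothetical driver: prompt out, answer in, reward out."""
    prompt = await task.asend(None)    # run to the first yield: the prompt
    answer = agent(str(prompt))        # everything the agent does happens here
    reward = await task.asend(answer)  # resume with the answer; the second yield is the reward
    await task.aclose()                # finish the generator cleanly
    return float(reward)


right = asyncio.run(run_episode(count_letter(), lambda prompt: "3"))
wrong = asyncio.run(run_episode(count_letter(word="raspberry"), lambda prompt: "13"))
print(right, wrong)  # 1.0 0.0
```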

A template is generative, not a single task: its parameters (word, letter) fill in per run, so one definition describes a whole space of tasks. Calling it - count_letter(word="raspberry") - binds those arguments and mints one concrete Task you can run. Register as many templates as you like on one environment; each is advertised in the manifest, so a harness can discover what the environment offers and the orchestrator can pick which to run. @env.template() takes a few optional arguments:

Parameter	Type	Description
`id`	`str | None`	Task id (defaults to the function name).
`description`	`str`	Human-readable description, surfaced in the manifest.
`input`	`Any`	Optional type for the agent’s input (JSON schema in the manifest).
`returns`	`Any`	Optional type the agent must produce; the answer arrives as an `Answer[T]`. See Types.
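The template-to-task step can be pictured with `inspect.signature`. Binding keyword arguments against the function's signature rejects unknown parameters and fills in defaults, and the same signature yields a manifest entry. This is a hypothetical sketch, not HUD's implementation. `template`, `Template`, and `Task` are illustrative names, and the manifest shape is made up.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Task:
    """One concrete task: which template, with which arguments."""
    template_id: str
    args: Dict[str, Any]


@dataclass
class Template:
    fn: Callable[..., Any]
    id: str
    description: str = ""

    def __call__(self, **kwargs: Any) -> Task:
        bound = inspect.signature(self.fn).bind(**kwargs)  # TypeError on unknown params
        bound.apply_defaults()                              # unset params take their defaults
        return Task(self.id, dict(bound.arguments))

    def manifest_entry(self) -> Dict[str, Any]:
        params = {
            name: (None if p.default is inspect.Parameter.empty else p.default)
            for name, p in inspect.signature(self.fn).parameters.items()
        }
        return {"id": self.id, "description": self.description, "params": params}


def template(id: Optional[str] = None, description: str = "") -> Callable[[Callable[..., Any]], Template]:
    """Hypothetical decorator: the id defaults to the function name."""
    def wrap(fn: Callable[..., Any]) -> Template:
        return Template(fn, id or fn.__name__, description)
    return wrap


@template(description="Count one letter in a word.")
async def count_letter(word: str = "strawberry", letter: str = "r"):
    answer = yield f"How many '{letter}'s are in '{word}'? Reply with just the number."
    yield 1.0 if answer and str(answer).strip() == str(word.count(letter)) else 0.0


task = count_letter(word="raspberry")
print(task)  # Task(template_id='count_letter', args={'word': 'raspberry', 'letter': 'r'})
print(count_letter.manifest_entry()["params"])  # {'word': 'strawberry', 'letter': 'r'}
```

Binding eagerly means a typo in a parameter name fails when the task is minted, not halfway through a run.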

Grading, parameterizing at scale, and collecting tasks into tasksets are their own topic - see Tasks & Tasksets.

The `Environment` object

hud.environment.Environment is the lightweight control object the whole file hangs off. When served, it acts as the server an agent harness connects to over the protocol: it answers the harness’s hello with its capabilities and runs its tasks on request.

from hud import Environment

env = Environment(name="environment", version="0.0.1", capabilities=None)

Parameter	Type	Default	Description
`name`	`str`	`"environment"`	Environment identity (used as the env-ref name).
`version`	`str`	`"0.0.1"`	Version string surfaced in the manifest.
`capabilities`	`list[Capability] | None`	`None`	Wire data for services that already exist; see Capabilities.

Passing v5-only keyword arguments emits a DeprecationWarning, and those arguments are ignored. See Migrate to v6.
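Warn-and-ignore is a common pattern for retired keywords. The minimal sketch below uses a hypothetical `ToyEnvironment` that mirrors the documented defaults and absorbs unknown keywords through `**legacy`. `old_option` is a made-up stand-in for a v5 keyword, and the real class may detect v5 arguments differently.

```python
import warnings
from typing import Any, List, Optional


class ToyEnvironment:
    """Hypothetical stand-in with the documented defaults for name, version, capabilities."""

    def __init__(
        self,
        name: str = "environment",
        version: str = "0.0.1",
        capabilities: Optional[List[Any]] = None,
        **legacy: Any,
    ) -> None:
        if legacy:
            # Retired keywords: warn, then drop them without storing anything.
            warnings.warn(
                f"ignoring v5-only keyword(s): {', '.join(sorted(legacy))}",
                DeprecationWarning,
                stacklevel=2,
            )
        self.name = name
        self.version = version
        self.capabilities = list(capabilities or [])


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is hidden by default outside __main__
    env = ToyEnvironment(name="my-env", old_option=True)  # old_option: made-up v5 keyword

print(caught[0].category.__name__, env.version)  # DeprecationWarning 0.0.1
```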

Serving

You rarely serve by hand - hud eval, task.run(), and Taskset.run() bring the environment up for you, and the runtime you pass decides where. Serving itself belongs to hud.environment.server, the same entry point a container CMD runs (python -m hud.environment.server <source>):

Function	Description
`await serve(env, host="127.0.0.1", port=0)`	Start daemons and accept control-channel connections (blocks).
`await bind(env, host="127.0.0.1", port=0)`	Bind the socket and return an `asyncio.Server` without serving.
`await env.start()` / `await env.stop()`	Run `@env.initialize` / `@env.shutdown` hooks directly.

hud serve env.py     # serve locally on tcp://127.0.0.1:8765 while you iterate
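The `bind` versus `serve` split in the table maps onto plain asyncio. A server can be bound to a port without accepting connections and started later. Port 0 lets the OS choose. The sketch below is hypothetical throughout: `toy_bind` is not HUD's `bind`, and the JSON-lines `hello` exchange only illustrates an environment answering hello with its capabilities. HUD's real control-channel messages are not shown here.

```python
import asyncio
import json

# Hypothetical capability list this toy environment advertises.
CAPABILITIES = [{"protocol": "ssh", "name": "shell", "url": "ssh://127.0.0.1:2222"}]


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Answer one JSON line: a hello gets the capability list back."""
    msg = json.loads(await reader.readline())
    if msg.get("type") == "hello":
        reply = {"type": "hello", "capabilities": CAPABILITIES}
    else:
        reply = {"type": "error", "reason": "expected hello"}
    writer.write((json.dumps(reply) + "\n").encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def toy_bind(host: str = "127.0.0.1", port: int = 0) -> asyncio.AbstractServer:
    # Bound (port 0 = OS picks a free port) but not yet accepting connections.
    return await asyncio.start_server(handle, host, port, start_serving=False)


async def demo() -> dict:
    server = await toy_bind()
    port = server.sockets[0].getsockname()[1]  # the port the OS actually assigned
    await server.start_serving()               # now accept connections
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b'{"type": "hello"}\n')
    await writer.drain()
    reply = json.loads(await reader.readline())
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply


reply = asyncio.run(demo())
print(reply["capabilities"][0]["name"])  # shell
```

Binding first and serving second lets a caller learn the real port before any client connects, which is what port 0 requires.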

A dependency that must own the process main thread (e.g. Isaac Sim / Omniverse) can’t run under hud serve, which runs the asyncio event loop on the main thread. Instead, run serve(env, host, port) in its own event loop on a worker thread (for example by calling asyncio.run(serve(env, host, port)) inside a threading.Thread) and keep the main thread for the dependency - see Robots.
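Below is a minimal sketch of that arrangement using only the standard library. The worker thread calls `asyncio.run(...)`, so it owns its own event loop. The main thread stays free; here it just acts as a client. Shutdown is requested across threads with `loop.call_soon_threadsafe`. `ServerThread` and the echo handler are hypothetical. In a real env, the worker would presumably run `serve(env, host, port)` instead of the toy server.

```python
import asyncio
import socket
import threading
from typing import Optional


async def echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Toy handler: send one line straight back."""
    writer.write(await reader.readline())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


class ServerThread:
    """Runs an asyncio server in its own event loop on a worker thread."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self.host, self.port = host, port
        self._ready = threading.Event()
        self._loop: Optional[asyncio.AbstractEventLoop] = None
        self._stop: Optional[asyncio.Event] = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        asyncio.run(self._main())  # this worker thread owns the loop, not the main thread

    async def _main(self) -> None:
        self._loop = asyncio.get_running_loop()
        self._stop = asyncio.Event()
        server = await asyncio.start_server(echo, self.host, self.port)
        self.port = server.sockets[0].getsockname()[1]  # real port when 0 was requested
        self._ready.set()
        async with server:              # closes the server on exit
            await self._stop.wait()     # stand-in for a blocking serve(env, host, port)

    def start(self, timeout: float = 5.0) -> None:
        self._thread.start()
        if not self._ready.wait(timeout):
            raise RuntimeError("server thread did not come up")

    def stop(self, timeout: float = 5.0) -> None:
        if self._loop is None or self._stop is None:
            raise RuntimeError("server thread was never started")
        self._loop.call_soon_threadsafe(self._stop.set)  # thread-safe wake-up of the loop
        self._thread.join(timeout)


srv = ServerThread()
srv.start()
# The main thread is free here; a main-thread-bound dependency would run at this point.
with socket.create_connection((srv.host, srv.port), timeout=5) as sock:
    sock.sendall(b"ping\n")
    with sock.makefile("rb") as stream:
        echoed = stream.readline()
srv.stop()
print(echoed)  # b'ping\n'
```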

Learn from real environments

The fastest way to internalize the patterns is to read complete ones. Each cookbook walks an env.py end to end:

Coding agent

A shell + files env that grades a test suite.

Ops diagnostics

Seed state in @env.initialize, grade by inspection.

Robot benchmark

A simulator env over the robot capability.

More on GitHub

Full, runnable environments in the SDK repo.

See also

Capabilities

Every protocol factory, its params, and how to spin it up.

Tasks & Tasksets

Add tasks that prompt and grade against this environment.

Runtime

Point a harness at your environment and run it anywhere.

Composing richer environments

Multi-capability envs, stateful daemons, and custom setups.
