HUD Documentation — Evaluations and RL Environments.

An environment is where the agent acts. All an agent needs from an environment is access, meaning a way to act on the system, so that is all an environment exposes: a capability, which is a connection over a protocol the system already speaks.

Capability	What it exposes
`ssh`	Shell + files (bash, SFTP) in a sandboxed workspace
`mcp`	Tools over the Model Context Protocol
`cdp`	Browser control over the Chrome DevTools Protocol
`rfb`	Full computer-use over VNC: screen + keyboard/mouse
`robot`	Schema-driven robot observation/action loop over WebSocket (beta)

A machine has a shell, so it speaks ssh; a web app has a browser, so it speaks cdp. You expose the connection the system already has — no action schema to invent — and the agent drives it natively with its own tools. Two things fall out for free: wrapping any system is trivial, and nothing about the agent is baked in, so the same environment keeps working with any model or harness, today’s or next year’s.
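One way to picture this decoupling: the environment publishes capability descriptors, and each harness picks the ones whose protocol it speaks. The sketch below is illustrative only. `PublishedCapability`, `usable_by`, and the example addresses are hypothetical names and values chosen for this example, not HUD's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedCapability:
    """Hypothetical descriptor for something an environment publishes."""
    name: str       # label chosen by the environment author, e.g. "shell"
    protocol: str   # one of the protocols in the table: ssh, mcp, cdp, rfb, robot
    url: str        # where the connection lives (made-up addresses below)


def usable_by(published, spoken_protocols):
    """Return the capabilities a harness can drive, given the protocols it speaks."""
    return [cap for cap in published if cap.protocol in spoken_protocols]


# The environment side knows nothing about which agent will connect.
published = [
    PublishedCapability("shell", "ssh", "ssh://127.0.0.1:2222"),
    PublishedCapability("browser", "cdp", "ws://127.0.0.1:9222"),
]

# Two different harnesses select from the same, unchanged environment.
coding_harness = usable_by(published, {"ssh"})
browser_harness = usable_by(published, {"cdp", "ssh"})
print([cap.name for cap in coding_harness])   # ['shell']
print([cap.name for cap in browser_harness])  # ['shell', 'browser']
```

Nothing in `published` changes when a new harness appears, which is the property the paragraph above describes.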

A shell environment

The most common capability is a shell. A Workspace is a sandboxed directory the agent works in over ssh; env.workspace(root) brings it up, publishes its ssh capability, and tears it down with the env — one line, no hook:

env.py

from hud.environment import Environment

env = Environment(name="coder")
env.workspace("workspace")

That’s a complete environment. Any harness that speaks ssh — Claude Code, a coding agent, your own — can now open a shell and edit files in the workspace.

Other capabilities

Every other protocol — mcp (your own tools), cdp (browser), rfb (computer-use), robot (robot policies) — is a daemon you run and publish. The Capabilities reference has a working, copy-pasteable spin-up for each, with the library that backs it.

Spin up any capability

Tested examples for ssh, mcp, cdp, rfb, and robot — each with the library it needs and the lifecycle wired up.

Lifecycle hooks

When the env runs a daemon itself, the daemon’s address is published when the env starts. Bring the daemon up in @env.initialize and publish it with env.add_capability(...); tear it down in @env.shutdown:

env.py

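from hud.capabilities import Capability

# `env` is the Environment instance (see the env.py example above).
# `launch_chromium` is a placeholder for your own launcher; it is not defined in this snippet.
browser = None  # handle to the running browser, set during initialize

@env.initialize
async def _up():
    global browser
    browser = await launch_chromium()        # bring up whatever your tasks need
    # Publish the browser's DevTools endpoint as a cdp capability named "browser".
    env.add_capability(Capability.cdp(name="browser", url=f"ws://127.0.0.1:{browser.port}"))

@env.shutdown
async def _down():
    # Only close the browser if initialize actually started it.
    if browser is not None:
        await browser.close()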

@env.initialize runs once before the env accepts connections; @env.shutdown runs on stop. env.add_capability replaces any same-named entry, so re-serving overwrites a stale address rather than duplicating it. For the full pattern — starting a server task and blocking until it binds — see Capabilities.
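The replace-by-name behaviour can be pictured as a registry keyed on the capability name. This is a minimal sketch of that idea, not HUD's implementation: `CapabilityRegistry` and its methods are hypothetical, and the two addresses are made up.

```python
class CapabilityRegistry:
    """Hypothetical stand-in for the set of capabilities an environment publishes."""

    def __init__(self):
        self._by_name = {}

    def add(self, name, url):
        # Keyed by name: adding the same name again overwrites the earlier entry.
        self._by_name[name] = url

    def urls(self):
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(self._by_name)


registry = CapabilityRegistry()
registry.add("browser", "ws://127.0.0.1:9222")  # first serve
registry.add("browser", "ws://127.0.0.1:9333")  # re-serve on a new port
print(registry.urls())  # {'browser': 'ws://127.0.0.1:9333'}
```

After the second call there is still one "browser" entry, now pointing at the new address, so a harness never sees the stale one.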

Serving the environment

An environment serves a TCP control channel. There are three ways to bring it up:

hud serve

hud serve env.py serves locally on tcp://127.0.0.1:8765 while you iterate.

hud deploy

Builds and publishes the environment to HUD infra in one step.

env.serve()

await env.serve("127.0.0.1", 8765) is the in-code equivalent.

You rarely call serve yourself — hud eval and task.run() bring the environment up for you (see Tasks).
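When you do serve the environment yourself, it can be useful to confirm that the control channel is accepting connections before pointing a harness at it. The helper below is a generic standard-library sketch, not part of HUD; `wait_for_port` is a hypothetical name, and it only checks that something is listening on the port, not that it is a HUD environment.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing listening yet (or connect timed out); retry after a pause.
            time.sleep(interval)
    return False
```

For the local address quoted above, the call would be `wait_for_port("127.0.0.1", 8765)`.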

Next steps

Tasks, tasksets & grading

Add tasks that prompt and grade against this environment.

Capabilities reference

Every protocol factory and its params.

Run on any model

Point a harness at the capabilities you declared.

Deploy & scale

Package once, run anywhere.