HUD Documentation - Evaluations and RL Environments.

A capability is a connection the environment exposes; a harness attaches its own tools to it. The same environment serves a one-shot Q&A or a full computer-use rollout, depending on which capabilities a harness opens.

| Protocol | Wire id | What it exposes | Spun up with |
| --- | --- | --- | --- |
| `ssh` | `ssh/2` | Shell + files (bash, SFTP) in a sandboxed workspace | `Workspace` (built in) |
| `mcp` | `mcp/2025-11-25` | Your own tools over the Model Context Protocol | `fastmcp` |
| `cdp` | `cdp/1.3` | Browser control over the Chrome DevTools Protocol | Chromium (`playwright`) |
| `rfb` | `rfb/3.8` | Full computer-use over VNC: screen + keyboard/mouse | `Xvfb` + `x11vnc` |
| `robot` | `openpi/0` | Schema-driven robot observation/action loop over WebSocket (beta) | robot bridge |

from hud.capabilities import Capability

The `Capability` dataclass

A capability is (name, protocol, url, params) - concrete wire data carrying the real address of something serving the protocol.

| Field | Type | Description |
| --- | --- | --- |
| `name` | `str` | Capability name (e.g. `"shell"`, `"browser"`). |
| `protocol` | `str` | Wire protocol id (e.g. `"ssh/2"`). |
| `url` | `str` | Connection URL. |
| `params` | `dict` | Protocol-specific connection params. |

Each protocol has a factory (Capability.ssh, .mcp, .cdp, .rfb, .robot) - a classmethod that builds a valid Capability for that protocol, so you don’t need to fill in the name, protocol, url, and params fields by hand. It normalizes the URL (fills in the default scheme and port), sets the right protocol id, and packs the protocol-specific params (e.g. host_pubkey for ssh, display for rfb). cap.to_manifest() / Capability.from_manifest(data) round-trip it on the wire.
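To make the factory idea concrete, here is a minimal standalone sketch of the pattern: a record with the four fields, one factory that fills in a default scheme and port, and a manifest round-trip. `SketchCapability` and its internals are hypothetical and written only for illustration; this is not HUD's `Capability` class, whose real normalization rules and manifest layout may differ.

```python
from dataclasses import asdict, dataclass, field
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class SketchCapability:
    """Illustrative stand-in for the (name, protocol, url, params) record; not HUD's class."""

    name: str
    protocol: str
    url: str
    params: dict = field(default_factory=dict)

    @classmethod
    def cdp(cls, *, name: str, url: str, default_port: int = 9222) -> "SketchCapability":
        # Assumed normalization: add "http://" and the default port when missing.
        if "://" not in url:
            url = "http://" + url
        parts = urlsplit(url)
        if parts.port is not None:
            netloc = parts.netloc
        else:
            netloc = f"{parts.hostname}:{default_port}"
        normalized = urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))
        return cls(name=name, protocol="cdp/1.3", url=normalized)

    def to_manifest(self) -> dict:
        # A plain dict that could be serialized as JSON.
        return asdict(self)

    @classmethod
    def from_manifest(cls, data: dict) -> "SketchCapability":
        return cls(**data)


cap = SketchCapability.cdp(name="browser", url="127.0.0.1")
restored = SketchCapability.from_manifest(cap.to_manifest())
```

The point of the pattern is that callers pass only what they know (a host, maybe a port) and always get back a complete, valid record that survives serialization unchanged.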

Spinning up a capability

Every capability points at a daemon. If the daemon already exists (a managed service, a remote box), just describe it with its factory and you’re done. The case worth a closer look is a daemon the environment runs itself - an MCP server, a browser, a VNC display. The flow is the same four steps every time:

env.py

@env.initialize
async def _up():
    start_daemon(host="127.0.0.1", port=PORT)            # 1. launch it (subprocess / task)
    await wait_until_listening("127.0.0.1", PORT)         # 2. block until it accepts connections
    env.add_capability(Capability.mcp(name="tools",      # 3. publish its address
                                      url=f"http://127.0.0.1:{PORT}/mcp"))

@env.shutdown
async def _down():
    stop_daemon()                                        # 4. tear it down with the env

Wait until it’s actually listening (step 2). Launching a subprocess or background task returns before the daemon has bound its port; publish the capability at that point and an agent can connect before anything is there to answer. The environment runs every @env.initialize hook to completion before it accepts a single client, so blocking here is what guarantees the capability is live the moment any agent connects. The robust way is to poll the port in a loop until it answers (as the example envs do); a brief asyncio.sleep is fine for a daemon you know starts fast.

Bind to 127.0.0.1 (steps 1 and 3). Bind every daemon to 127.0.0.1 so it’s only reachable from inside the environment. That’s exactly what you want, because the environment exposes a single control port and nothing else. The HUD client transparently forwards a 127.0.0.1 capability through that one control port to the daemon inside; a capability that’s already on a public address is used as-is. So you bind, publish, and never think about networking: one port in, every capability reachable.
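For step 2, the port poll can be sketched with the standard library alone. This is one possible `wait_until_listening`, not necessarily the helper the example envs use; the `timeout` and `interval` parameters and their defaults are assumptions.

```python
import asyncio


async def wait_until_listening(
    host: str, port: int, timeout: float = 10.0, interval: float = 0.1
) -> None:
    """Poll host:port until a TCP connect succeeds, or raise TimeoutError."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            _reader, writer = await asyncio.open_connection(host, port)
        except OSError:
            # Nothing is accepting yet (typically "connection refused"): retry until the deadline.
            if loop.time() >= deadline:
                raise TimeoutError(
                    f"nothing listening on {host}:{port} after {timeout}s"
                ) from None
            await asyncio.sleep(interval)
        else:
            # The daemon accepted a connection; close our probe and report success.
            writer.close()
            await writer.wait_closed()
            return
```

A successful connect only proves the port is bound. A daemon that needs further warm-up after binding may call for a protocol-level check instead.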

`ssh` - a sandboxed shell

The shell case is built in: Workspace is a daemon that manages a bwrap-isolated directory and serves it over ssh. env.workspace(root) starts it, publishes its ssh capability, and stops it with the env, in one line with no hook:

env.py

from hud.environment import Environment

env = Environment(name="coder")
env.workspace("workspace")   # publishes "shell" (ssh/2) when the env serves

Use a relative path ("workspace", created next to env.py). Sandbox isolation (bwrap) is Linux-only: on other platforms the workspace runs unisolated, while inside a built image it is isolated.

To run a workspace yourself, drive its lifecycle and publish ws.capability() by hand:

env.py

from hud.environment import Environment, Workspace

env = Environment(name="coder")
ws = Workspace("workspace", host="127.0.0.1", port=0)   # port 0 → ephemeral

@env.initialize
async def _up():
    await ws.start()                          # binds, generates keys; idempotent
    env.add_capability(ws.capability("shell"))

@env.shutdown
async def _down():
    await ws.stop()

`mcp` - your own tools

Serve bespoke tools on a FastMCP server. The streamable-HTTP transport serves under /mcp, so that path is part of the published URL:

env.py

import asyncio

from fastmcp import FastMCP

from hud.capabilities import Capability
from hud.environment import Environment

server = FastMCP(name="tools")

@server.tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

env = Environment(name="calc")
_task: asyncio.Task | None = None

@env.initialize
async def _up():
    global _task
    if _task is None:                          # idempotent
        _task = asyncio.create_task(
            server.run_async(transport="http", host="127.0.0.1", port=8040)
        )
        await asyncio.sleep(1.0)               # wait until the server is ready
    env.add_capability(Capability.mcp(name="tools", url="http://127.0.0.1:8040/mcp"))

@env.shutdown
async def _down():
    global _task
    if _task is not None:
        _task.cancel()
        _task = None

Capability.mcp accepts ws/wss/http/https URLs (no stdio) and an optional auth_token=.

`cdp` - a browser

Launch Chromium with a DevTools port. Playwright ships the binary (playwright install chromium); run it as a subprocess so the CDP endpoint is reachable at http://127.0.0.1:9222:

env.py

import asyncio
import tempfile

from playwright.async_api import async_playwright

from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="browser")
_proc: asyncio.subprocess.Process | None = None

@env.initialize
async def _up():
    global _proc
    if _proc is None:
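        pw = await async_playwright().start()
        chromium_path = pw.chromium.executable_path   # Playwright is only needed to locate the binary
        await pw.stop()                               # stop the driver so it isn't left running
        _proc = await asyncio.create_subprocess_exec(
            chromium_path,
            "--headless=new",
            "--remote-debugging-port=9222",
            "--remote-debugging-address=127.0.0.1",
            "--no-first-run",
            "--user-data-dir=" + tempfile.mkdtemp(prefix="cdp_"),
        )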
        await asyncio.sleep(1.0)               # wait until Chromium is ready
    env.add_capability(Capability.cdp(name="browser", url="http://127.0.0.1:9222"))

@env.shutdown
async def _down():
    global _proc
    if _proc is not None:
        _proc.terminate()
        await _proc.wait()
        _proc = None

Capability.cdp defaults to port 9222 and takes an optional target_id=. (Add --no-sandbox only when running as root in a container.)
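If the fixed one-second sleep proves too short or too long, a readiness check against Chromium's DevTools HTTP endpoint is one alternative. Chromium serves a small JSON document at `/json/version` once remote debugging is up. The sketch below polls it with the standard library; the function name `wait_for_cdp` and its defaults are hypothetical.

```python
import json
import time
import urllib.request


def wait_for_cdp(base_url: str, timeout: float = 10.0, interval: float = 0.2) -> dict:
    """Poll <base_url>/json/version until it answers; return the parsed JSON."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/json/version", timeout=2.0) as resp:
                return json.load(resp)
        except (OSError, ValueError):
            # OSError covers refused connections and HTTP errors (URLError subclasses it);
            # ValueError covers a response body that is not valid JSON yet.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no DevTools endpoint at {base_url} after {timeout}s"
                ) from None
            time.sleep(interval)
```

The function blocks, so inside an async hook it would be called as `await asyncio.to_thread(wait_for_cdp, "http://127.0.0.1:9222")` to keep the event loop free.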

`rfb` - a virtual screen

Full computer-use is a VNC server over a virtual display. On Linux, Xvfb paints the framebuffer and x11vnc serves it (apt install xvfb x11vnc):

env.py

import asyncio

from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="desktop")
_procs: tuple | None = None

@env.initialize
async def _up():
    global _procs
    if _procs is None:
        xvfb = await asyncio.create_subprocess_exec(
            "Xvfb", ":0", "-screen", "0", "1280x1024x24",
        )
        await asyncio.sleep(0.5)               # let the X server come up first
        vnc = await asyncio.create_subprocess_exec(
            "x11vnc", "-display", ":0", "-rfbport", "5900",
            "-localhost", "-forever", "-nopw",
        )
        await asyncio.sleep(1.0)               # wait until VNC is ready
        _procs = (xvfb, vnc)
    env.add_capability(Capability.rfb(name="screen", url="rfb://127.0.0.1", display=0))

@env.shutdown
async def _down():
    global _procs
    if _procs:
        for p in reversed(_procs):
            p.terminate()
            await p.wait()
        _procs = None

Capability.rfb defaults to port 5900 + display and takes an optional password=. Host multiple screens by publishing one rfb capability per display.
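The 5900 + display convention is worth keeping in one place when you host several screens, so the x11vnc port and the published capability cannot drift apart. A small sketch follows; the helper names `rfb_port` and `x11vnc_args` are hypothetical, and the flags simply mirror the x11vnc command in the example above.

```python
RFB_BASE_PORT = 5900  # conventional VNC base port


def rfb_port(display: int) -> int:
    """TCP port for an X display under the 5900 + display convention."""
    if display < 0:
        raise ValueError("display must be non-negative")
    return RFB_BASE_PORT + display


def x11vnc_args(display: int) -> list[str]:
    """x11vnc command line for one display, serving on the matching port."""
    return [
        "x11vnc", "-display", f":{display}", "-rfbport", str(rfb_port(display)),
        "-localhost", "-forever", "-nopw",
    ]
```

For example, `x11vnc_args(1)` targets display `:1` on port 5901, which is the port a capability published with `display=1` would be expected to use.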

`robot` - an observation/action loop

Capability.robot(*, name="robot", url, contract)

The robot control loop (beta), carried over the openpi/0 wire protocol. The protocol is openpi-like: it reuses openpi’s wire format (msgpack with recursive numpy serialization) and its flat observation/action naming (observation/... keys, actions), so an openpi policy server and a HUD env speak the same bytes. The one fundamental difference is role assignment. In openpi a policy server answers inference requests; here the environment is the server (it owns the world and pushes observations) and the agent is the client (it acts, replying with actions).

The contract is the environment’s full self-describing schema (robot_type, control_rate, and every observation/action feature), carried in the manifest so the agent wires itself with no shared config. The environment drives its simulator through a RobotEndpoint rather than the bridge directly (although that is possible), and the endpoint builds the capability for you once started:

endpoint = RobotEndpoint(MySimBridge())   # drive the sim only through the endpoint

@env.initialize
async def _up():
    await endpoint.start()
    env.add_capability(await endpoint.capability(contract=CONTRACT))

See Robots for the bridge, the endpoint, the harness, and the contract spec.

Harness clients

Spinning up a capability is the environment side. The harness side is the mirror: it opens a capability to get a live client it can drive. The capability clients live in hud.capabilities:

| Client | Protocol |
| --- | --- |
| `SSHClient` | `ssh/2` (raw `asyncssh` connection via `.conn`) |
| `MCPClient` | `mcp/2025-11-25` |
| `CDPClient` | `cdp/1.3` |
| `RFBClient` | `rfb/3.8` |
| `RobotClient` | `openpi/0` - joins the registry on first open (the `robot` extra: numpy/openpi-client) |

The bundled provider agents open these automatically based on which capabilities the manifest advertises (see Agents). To write your own harness, attach to the capability you need and define your tool spec.

Workspace

A Workspace is not a capability; it’s the built-in daemon that serves the ssh capability. ssh is the one capability HUD ships an implementation for: for mcp, cdp, and rfb you stand up the daemon yourself (above), but for a shell you just attach a workspace. Concretely it’s a directory plus a bwrap-isolated SSH server (bash + chroot’d SFTP). env.workspace(root, ...) wires its whole lifecycle: the environment brings it up (keys, socket, accept loop) when it serves and tears it down on env.stop(). Extra kwargs configure the sandbox (mounts, network, env vars, guest path, fixed ports, your own keys):

from hud.environment import Environment, Mount

env = Environment(name="coder")
env.workspace(
    "/workspace",
    network=True,
    mounts=[Mount("ro", src="/data", dst="/data")],
)

To run one outside an env, drive its lifecycle directly and publish ws.capability() as a concrete ssh capability:

| Member | Description |
| --- | --- |
| `Workspace(root, *, host="127.0.0.1", port=0, mounts=(), network=False, env=None, user="agent", ...)` | Construct. `port=0` binds an ephemeral port. |
| `await ws.start()` | Start the SSH accept loop (idempotent). |
| `ws.capability(name="shell")` | The resolved `ssh` `Capability` (materializes keys, binds the socket). |
| `await ws.stop()` | Stop accepting sessions and release the socket. |
| `ws.ssh_url` / `ws.ssh_host_pubkey` | Connection address and host key. |
| `ws.bwrap_available` | Whether `bwrap` isolation is active. |

See also

Environment

Agents

Tasks & Tasksets