HUD Documentation — Evaluations and RL Environments.

A capability is a connection the environment exposes; a harness attaches its own tools to it. The same environment serves a one-shot Q&A or a full computer-use rollout, depending on which capabilities a harness opens.

| Protocol | Wire id | What it exposes | Spun up with |
| --- | --- | --- | --- |
| `ssh` | `ssh/2` | Shell + files (bash, SFTP) in a sandboxed workspace | `Workspace` (built in) |
| `mcp` | `mcp/2025-11-25` | Your own tools over the Model Context Protocol | `fastmcp` |
| `cdp` | `cdp/1.3` | Browser control over the Chrome DevTools Protocol | Chromium (`playwright`) |
| `rfb` | `rfb/3.8` | Full computer-use over VNC: screen + keyboard/mouse | `Xvfb` + `x11vnc` |
| `robot` | `openpi/0` | Schema-driven robot observation/action loop over WebSocket (beta) | robot bridge |

from hud.capabilities import Capability

The `Capability` dataclass

A capability is (name, protocol, url, params) — concrete wire data carrying the real address of something serving the protocol.

| Field | Type | Description |
| --- | --- | --- |
| `name` | `str` | Capability name (e.g. `"shell"`, `"browser"`). |
| `protocol` | `str` | Wire protocol id (e.g. `"ssh/2"`). |
| `url` | `str` | Connection URL. |
| `params` | `dict` | Protocol-specific connection params. |

Each protocol has a factory (Capability.ssh, .mcp, .cdp, .rfb, .robot) that normalizes the URL and fills defaults; cap.to_manifest() / Capability.from_manifest(data) round-trip it.
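
To make the round trip concrete, here is a minimal stand-in built only on the standard library. `CapabilitySketch` is a hypothetical class that mirrors the four documented fields; the flat dict layout and the example URL are assumptions, and the real `Capability` manifest may be shaped differently.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class CapabilitySketch:
    """Hypothetical stand-in mirroring the four documented Capability fields."""

    name: str
    protocol: str
    url: str
    params: dict[str, Any] = field(default_factory=dict)

    def to_manifest(self) -> dict[str, Any]:
        # Assumed layout: a flat, JSON-friendly dict of the four fields.
        return asdict(self)

    @classmethod
    def from_manifest(cls, data: dict[str, Any]) -> "CapabilitySketch":
        return cls(
            name=data["name"],
            protocol=data["protocol"],
            url=data["url"],
            params=dict(data.get("params", {})),
        )


# Illustrative values only; the port is made up for the example.
cap = CapabilitySketch(name="shell", protocol="ssh/2", url="ssh://127.0.0.1:2222")
restored = CapabilitySketch.from_manifest(cap.to_manifest())
```

The point of the sketch is that a capability is plain data: anything that survives a dict round trip can be shipped to a harness running in another process.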

Spinning up a capability

Every capability points at a daemon. For one that already exists, pass the factory to the constructor. For a daemon the environment runs itself, the pattern is always the same: start it in @env.initialize, block until it’s listening, publish its address with env.add_capability(...), and tear it down in @env.shutdown. The env doesn’t accept a client connection until every initialize hook returns, so waiting for the port closes the startup race. A small readiness helper the snippets below reuse:

import asyncio
import socket

async def _listening(host: str, port: int, timeout: float = 15.0) -> None:
    """Block until host:port accepts a connection — call before publishing."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=0.5).close()
            return
        except OSError:
            await asyncio.sleep(0.1)
    raise RuntimeError(f"nothing listening on {host}:{port}")

Bind every daemon to 127.0.0.1: a loopback capability is forwarded through the env’s one control port (see Bindings are always reachable), so nothing else needs publishing.
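
The readiness pattern can be tried without any daemon installed. The sketch below repeats the same polling idea so that it runs on its own, then points it at a throwaway asyncio server bound to loopback. `wait_until_listening` and `demo` are illustrative names, not part of `hud`.

```python
import asyncio
import socket


async def wait_until_listening(host: str, port: int, timeout: float = 5.0) -> None:
    """Same polling idea as the readiness helper above, repeated so this runs alone."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=0.5).close()
            return
        except OSError:
            await asyncio.sleep(0.1)
    raise RuntimeError(f"nothing listening on {host}:{port}")


async def demo() -> int:
    async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        writer.close()  # the probe only needs the connection to be accepted

    # Bind to loopback on port 0 so the OS picks a free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        await wait_until_listening("127.0.0.1", port)  # returns once the port answers
    finally:
        server.close()
        await server.wait_closed()
    return port


if __name__ == "__main__":
    print(f"server was reachable on 127.0.0.1:{asyncio.run(demo())}")
```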

`ssh` — a sandboxed shell

The shell case is built in. A Workspace is a sandboxed directory the agent gets over ssh; env.workspace(root) starts it, publishes its ssh capability, and stops it with the env — one line, no hook:

env.py

from hud.environment import Environment

env = Environment(name="coder")
env.workspace("workspace")   # publishes "shell" (ssh/2) when the env serves

Use a relative path ("workspace", created next to env.py). Sandbox isolation (bwrap) is Linux-only: on other platforms the workspace runs unisolated, while a built image runs it isolated.

To run a workspace yourself, drive its lifecycle and publish ws.capability() by hand:

env.py

from hud.environment import Environment, Workspace

env = Environment(name="coder")
ws = Workspace("workspace", host="127.0.0.1", port=0)   # port 0 → ephemeral

@env.initialize
async def _up():
    await ws.start()                          # binds, generates keys; idempotent
    env.add_capability(ws.capability("shell"))

@env.shutdown
async def _down():
    await ws.stop()
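
The `port=0` argument relies on standard socket behavior: binding to port 0 asks the operating system for any free port, and the real number is only known after the bind. That is presumably why the capability is published after `ws.start()`. A minimal standard-library illustration, where `bind_ephemeral` is a hypothetical helper and not a `hud` function:

```python
import socket


def bind_ephemeral(host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Bind a listening TCP socket to an OS-chosen port and report which one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))                # port 0: let the kernel pick a free port
    sock.listen()
    return sock, sock.getsockname()[1]  # the real port is only known after bind


sock, port = bind_ephemeral()
print(f"listening on 127.0.0.1:{port}")
sock.close()
```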

`mcp` — your own tools

Serve bespoke tools on a FastMCP server. The streamable-HTTP transport serves under /mcp, so that path is part of the published URL:

env.py

import asyncio

from fastmcp import FastMCP

from hud.capabilities import Capability
from hud.environment import Environment

server = FastMCP(name="tools")

@server.tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

env = Environment(name="calc")
_task: asyncio.Task | None = None

@env.initialize
async def _up():
    global _task
    if _task is None:                          # idempotent
        _task = asyncio.create_task(
            server.run_async(transport="http", host="127.0.0.1", port=8040)
        )
        await _listening("127.0.0.1", 8040)
    env.add_capability(Capability.mcp(name="tools", url="http://127.0.0.1:8040/mcp"))

@env.shutdown
async def _down():
    global _task
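    if _task is not None:
        _task.cancel()
        # Wait for the cancellation to land so the port is released before returning.
        await asyncio.gather(_task, return_exceptions=True)
        _task = None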

Capability.mcp accepts ws/wss/http/https URLs (no stdio) and an optional auth_token=.
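
The scheme restriction amounts to a check like the one sketched below. `check_mcp_url` is a hypothetical helper written for illustration; the real factory may normalize or validate URLs differently.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"ws", "wss", "http", "https"}  # schemes named in the text above


def check_mcp_url(url: str) -> str:
    """Return url unchanged if its scheme is network-based, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported MCP transport scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"MCP URL has no host: {url!r}")
    return url


print(check_mcp_url("http://127.0.0.1:8040/mcp"))
```

A stdio server has no address a remote harness could dial, which fits the rule that every published capability carries a real URL.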

`cdp` — a browser

Launch Chromium with a DevTools port. Playwright ships the binary (playwright install chromium); run it as a subprocess so the CDP endpoint is reachable at http://127.0.0.1:9222:

env.py

import asyncio
import tempfile

from playwright.async_api import async_playwright

from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="browser")
_proc: asyncio.subprocess.Process | None = None

@env.initialize
async def _up():
    global _proc
    if _proc is None:
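        # Playwright is only used to locate its bundled Chromium binary, so
        # close its driver right after the lookup instead of leaking it.
        async with async_playwright() as pw:
            chromium_path = pw.chromium.executable_path
        _proc = await asyncio.create_subprocess_exec(
            chromium_path,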
            "--headless=new",
            "--remote-debugging-port=9222",
            "--remote-debugging-address=127.0.0.1",
            "--no-first-run",
            "--user-data-dir=" + tempfile.mkdtemp(prefix="cdp_"),
        )
        await _listening("127.0.0.1", 9222)
    env.add_capability(Capability.cdp(name="browser", url="http://127.0.0.1:9222"))

@env.shutdown
async def _down():
    global _proc
    if _proc is not None:
        _proc.terminate()
        await _proc.wait()
        _proc = None

Capability.cdp defaults to port 9222 and takes an optional target_id=. (Add --no-sandbox only when running as root in a container.)
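
Once Chromium is up, a quick sanity check is to ask its DevTools HTTP endpoint for the browser-level WebSocket URL, which Chromium reports under `webSocketDebuggerUrl` at `/json/version`. The helper names below are illustrative, and whether `CDPClient` performs the same lookup is not stated here.

```python
import json
from urllib.request import urlopen


def parse_debugger_url(payload: bytes) -> str:
    """Extract the browser-level WebSocket URL from a /json/version response."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def browser_ws_url(base_url: str = "http://127.0.0.1:9222", timeout: float = 2.0) -> str:
    """Ask a running Chromium for its DevTools WebSocket endpoint."""
    with urlopen(f"{base_url}/json/version", timeout=timeout) as response:
        return parse_debugger_url(response.read())
```

Calling `browser_ws_url()` needs a browser already listening on port 9222, so it is best run after the initialize hook has finished.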

`rfb` — a virtual screen

Full computer-use is a VNC server over a virtual display. On Linux, Xvfb paints the framebuffer and x11vnc serves it (apt install xvfb x11vnc):

env.py

import asyncio

from hud.capabilities import Capability
from hud.environment import Environment

env = Environment(name="desktop")
_procs: tuple | None = None

@env.initialize
async def _up():
    global _procs
    if _procs is None:
        xvfb = await asyncio.create_subprocess_exec(
            "Xvfb", ":0", "-screen", "0", "1280x1024x24",
        )
        await asyncio.sleep(0.5)               # let the X server come up first
        vnc = await asyncio.create_subprocess_exec(
            "x11vnc", "-display", ":0", "-rfbport", "5900",
            "-localhost", "-forever", "-nopw",
        )
        await _listening("127.0.0.1", 5900)
        _procs = (xvfb, vnc)
    env.add_capability(Capability.rfb(name="screen", url="rfb://127.0.0.1", display=0))

@env.shutdown
async def _down():
    global _procs
    if _procs:
        for p in reversed(_procs):
            p.terminate()
            await p.wait()
        _procs = None

Capability.rfb targets port 5900 + display (display 0 is port 5900) and takes an optional password=. Host multiple screens by publishing one rfb capability per display.
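
The port rule follows the usual VNC convention of one port per X display. As a small worked example of the arithmetic (`rfb_port` is a hypothetical helper, not the factory's own code):

```python
RFB_BASE_PORT = 5900  # conventional VNC base port


def rfb_port(display: int) -> int:
    """Map an X display number to its conventional VNC port (5900 + display)."""
    if display < 0:
        raise ValueError("display must be non-negative")
    return RFB_BASE_PORT + display


# One port per display, e.g. for two screens on displays :0 and :1.
ports = {display: rfb_port(display) for display in (0, 1)}
print(ports)  # {0: 5900, 1: 5901}
```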

`Capability.robot`

Capability.robot(*, name="robot", url, contract)

The openpi/0 control loop (beta). This is an openpi-like protocol: it reuses openpi’s wire format (msgpack with transparent, recursive numpy serialization) and its flat observation/action naming schema (observation/... keys, actions), so an openpi policy server and a HUD env speak the same bytes.

It differs fundamentally in role assignment. In openpi, a policy server answers inference requests. Here the environment is the server (it owns the world and pushes observations) and the agent is the client (it acts in the world, replying with actions).

contract is the environment’s full self-describing schema (robot_type, control_rate, and every observation/action feature), carried in the manifest params so the agent wires itself with no shared config. The serving bridge binds an ephemeral loopback port, so publish this from an @env.initialize hook after await bridge.start():

@env.initialize
async def _up():
    await bridge.start()
    env.add_capability(Capability.robot(name="robot", url=bridge.url, contract=CONTRACT))

See Robots for the bridge, the harness, and the contract spec.
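
The role assignment described above (the environment serves and pushes observations, the agent connects and replies with actions) can be sketched with the standard library alone. This is not the openpi/0 wire format: newline-delimited JSON over plain TCP stands in for msgpack over WebSocket, and the `done` marker, the `observation/position` key, and all function names are made up for the illustration.

```python
import asyncio
import json


async def serve_env(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Environment side: owns the state, pushes observations, applies actions."""
    position = 0
    for _ in range(3):
        observation = {"observation/position": position}
        writer.write(json.dumps(observation).encode() + b"\n")
        await writer.drain()
        reply = json.loads(await reader.readline())
        position += reply["actions"][0]          # the env applies the agent's action
    final = {"done": True, "observation/position": position}
    writer.write(json.dumps(final).encode() + b"\n")
    await writer.drain()
    writer.close()


async def run_agent(host: str, port: int) -> int:
    """Agent side: connects as a client and answers each observation with an action."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        while True:
            message = json.loads(await reader.readline())
            if message.get("done"):
                return message["observation/position"]
            writer.write(json.dumps({"actions": [1]}).encode() + b"\n")
            await writer.drain()
    finally:
        writer.close()


async def demo() -> int:
    # The environment is the listening side; the agent dials in.
    server = await asyncio.start_server(serve_env, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await run_agent("127.0.0.1", port)
    finally:
        server.close()
        await server.wait_closed()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # final position after three +1 actions: 3
```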

Workspace

Workspace is the standard shell daemon: a directory plus a bwrap-isolated SSH server (bash + chroot’d SFTP). Attach one with env.workspace(root, ...) and the environment brings it up (keys, socket, accept loop) when it serves, tearing it down on env.stop(). Extra kwargs configure the workspace — mounts, network, env vars, guest path, fixed ports, your own keys:

from hud.environment import Environment, Mount

env = Environment(name="coder")
env.workspace(
    "/workspace",
    network=True,
    mounts=[Mount("ro", src="/data", dst="/data")],
)

To run one yourself (outside an env), drive the lifecycle directly and publish ws.capability() as a concrete ssh capability:

| Member | Description |
| --- | --- |
| `Workspace(root, *, host="127.0.0.1", port=0, mounts=(), network=False, env=None, user="agent", ...)` | Construct. `port=0` binds an ephemeral port. |
| `await ws.start()` | Start the SSH accept loop (idempotent). |
| `ws.capability(name="shell")` | The resolved `ssh` `Capability` (materializes keys, binds the socket). |
| `await ws.stop()` | Stop accepting sessions and release the socket. |
| `ws.ssh_url` / `ws.ssh_host_pubkey` | Connection address and host key. |
| `ws.bwrap_available` | Whether `bwrap` isolation is active. |

Pass mounts=[Mount("ro", src=..., dst=...)] and network=True to configure the sandbox. Mount, like Workspace, is imported from hud.environment.

Bindings are always reachable

Every address in the manifest is dialable from where the client runs. A loopback daemon (a workspace, a browser in the same container) is transparently forwarded through the env’s control port, so a container only ever publishes one port — bind your daemons to 127.0.0.1 and don’t worry about the rest.

Harness clients

A harness opens a capability to get a live client. The capability clients live in hud.capabilities:

| Client | Protocol |
| --- | --- |
| `SSHClient` | `ssh/2` (raw `asyncssh` connection via `.conn`) |
| `MCPClient` | `mcp/2025-11-25` |
| `CDPClient` | `cdp/1.3` |
| `RFBClient` | `rfb/3.8` |
| `RobotClient` | `openpi/0`; joins the registry on first open and needs the `robot` extra (numpy, openpi-client) |

The bundled provider agents open these automatically based on which capabilities the manifest advertises (see Agents). To write your own harness, attach to the capability you need and define your tool spec.
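
A harness that selects clients from the manifest can be as simple as a lookup on the wire id. In the sketch below the protocol ids and client names come from the table above, while the manifest shape (a list of dicts with `name` and `protocol`) and the `plan_clients` helper are assumptions made for illustration.

```python
# Protocol ids and client names are taken from the table above; the manifest
# shape (a list of dicts with "name" and "protocol") is an assumption.
CLIENT_FOR_PROTOCOL = {
    "ssh/2": "SSHClient",
    "mcp/2025-11-25": "MCPClient",
    "cdp/1.3": "CDPClient",
    "rfb/3.8": "RFBClient",
    "openpi/0": "RobotClient",
}


def plan_clients(manifest: list[dict[str, str]]) -> dict[str, str]:
    """Map each advertised capability name to the client that speaks its protocol."""
    plan: dict[str, str] = {}
    for entry in manifest:
        client = CLIENT_FOR_PROTOCOL.get(entry["protocol"])
        if client is not None:  # skip protocols this harness cannot open
            plan[entry["name"]] = client
    return plan


manifest = [
    {"name": "shell", "protocol": "ssh/2"},
    {"name": "browser", "protocol": "cdp/1.3"},
    {"name": "custom", "protocol": "example/0"},  # hypothetical unknown protocol
]
print(plan_clients(manifest))  # {'shell': 'SSHClient', 'browser': 'CDPClient'}
```

Keying on the full wire id rather than the capability name keeps the choice stable when an environment publishes several capabilities of the same protocol under different names.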

See also

Environments

Environment reference

Agents

Tasks & Tasksets