| Protocol | Wire id | What it exposes | Spun up with |
|---|---|---|---|
| ssh | ssh/2 | Shell + files (bash, SFTP) in a sandboxed workspace | Workspace (built in) |
| mcp | mcp/2025-11-25 | Your own tools over the Model Context Protocol | fastmcp |
| cdp | cdp/1.3 | Browser control over the Chrome DevTools Protocol | Chromium (playwright) |
| rfb | rfb/3.8 | Full computer-use over VNC: screen + keyboard/mouse | Xvfb + x11vnc |
| robot | openpi/0 | Schema-driven robot observation/action loop over WebSocket (beta) | robot bridge |
The Capability dataclass
A capability is (name, protocol, url, params) - concrete wire data carrying the real address of something serving the protocol.
| Field | Type | Description |
|---|---|---|
| name | str | Capability name (e.g. "shell", "browser"). |
| protocol | str | Wire protocol id (e.g. "ssh/2"). |
| url | str | Connection URL. |
| params | dict | Protocol-specific connection params. |
Each protocol has a factory (Capability.ssh, .mcp, .cdp, .rfb, .robot) - a classmethod that builds a valid Capability for that protocol, so you don’t need to fill in the name, protocol, url, and params fields by hand. It normalizes the URL (fills in the default scheme and port), sets the right protocol id, and packs the protocol-specific params (e.g. host_pubkey for ssh, display for rfb). cap.to_manifest() / Capability.from_manifest(data) round-trip it on the wire.
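To make the factory and manifest round-trip concrete, here is a minimal stand-in written with the standard library only. It is not the HUD implementation: the class name CapabilitySketch, the default port 22, and the manifest layout (a plain dict of the four fields) are assumptions for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CapabilitySketch:
    """Hypothetical stand-in for the four-field capability record."""

    name: str
    protocol: str
    url: str
    params: dict = field(default_factory=dict)

    @classmethod
    def ssh(cls, url: str, *, name: str = "shell",
            host_pubkey: Optional[str] = None) -> "CapabilitySketch":
        # Normalize the URL: add the scheme and port when the caller omits them.
        if "://" not in url:
            url = f"ssh://{url}"
        parts = urlsplit(url)
        port = parts.port or 22  # assumed default for this sketch
        # Pack the protocol-specific params.
        params = {"host_pubkey": host_pubkey} if host_pubkey else {}
        return cls(name, "ssh/2", f"ssh://{parts.hostname}:{port}", params)

    def to_manifest(self) -> dict:
        # Plain-dict form, suitable for serializing onto the wire.
        return asdict(self)

    @classmethod
    def from_manifest(cls, data: dict) -> "CapabilitySketch":
        return cls(**data)


cap = CapabilitySketch.ssh("127.0.0.1", host_pubkey="ssh-ed25519 AAAA")
restored = CapabilitySketch.from_manifest(cap.to_manifest())
```

The point of the factory is that the caller supplies only an address; the protocol id and param packing are decided in one place, and the manifest form restores an equal record.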
Spinning up a capability
Every capability points at a daemon. If the daemon already exists (a managed service, a remote box), just describe it with its factory and you’re done. The case worth a closer look is a daemon the environment runs itself - an MCP server, a browser, a VNC display. The flow is the same four steps every time.
Block inside the hook until the daemon is ready. The environment runs the @env.initialize hook to completion before it accepts a single client, so blocking here is what guarantees the capability is live the moment any agent connects. The robust way is to poll the port in a loop until it answers (as the example envs do); a brief asyncio.sleep is fine for a daemon you know starts fast.
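The port-polling loop mentioned above can be sketched with asyncio alone. The helper name wait_for_port and its timeout and interval defaults are illustrative choices, not part of HUD.

```python
import asyncio


async def wait_for_port(host: str, port: int,
                        timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        try:
            _reader, writer = await asyncio.open_connection(host, port)
        except OSError:
            # Not listening yet (connection refused): back off and retry.
            await asyncio.sleep(interval)
            continue
        writer.close()
        try:
            await writer.wait_closed()
        except OSError:
            pass  # the daemon may already have dropped the probe connection
        return True
    return False
```

Awaiting a helper like this inside the initialize hook, right after starting the daemon, keeps the hook from returning before the daemon answers. A successful TCP connect only shows the socket is open; a daemon with a slow application-level startup may need a protocol-specific probe instead.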
Bind to 127.0.0.1 (steps 1 and 3). Bind every daemon to 127.0.0.1 so it’s only reachable from inside the environment - that’s exactly what you want, because the environment exposes a single control port and nothing else. The HUD client transparently forwards a 127.0.0.1 capability through that one control port to the daemon inside; a capability that’s already on a public address is used as-is. So you bind, publish, and never think about networking - one port in, every capability reachable.
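The forward-or-use-as-is decision depends only on whether the capability URL points at the loopback interface. The check below is a guess at how a client might make that call, not HUD's actual logic; the function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True when the URL's host is only reachable from inside the environment."""
    host = urlsplit(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: a DNS name, treated here as externally reachable.
        return False
```

Under this sketch, http://127.0.0.1:9222 would be tunneled through the control port, while wss://tools.example.com/mcp would be dialed directly.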
ssh - a sandboxed shell
The shell case is built in via Workspace - a built-in daemon that manages a bwrap-isolated directory and serves it over ssh. env.workspace(root) starts it, publishes its ssh capability, and stops it with the env - one line, no hook.
Use a relative path ("workspace", created next to env.py). Sandbox isolation (bwrap) is Linux-only: the workspace runs unisolated elsewhere and isolated in a built image. To manage the daemon yourself, construct a Workspace and publish ws.capability() by hand.
mcp - your own tools
Serve bespoke tools on a FastMCP server. The streamable-HTTP transport serves under /mcp, so that path is part of the published URL.
Capability.mcp accepts ws/wss/http/https URLs (no stdio) and an optional auth_token=.
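As a small illustration of the URL rules above, the helper below assembles a publishable address and rejects schemes outside the accepted set. The function name, the port 8765, and the error message are made up for this sketch; only the accepted schemes and the /mcp path come from the text.

```python
MCP_SCHEMES = {"ws", "wss", "http", "https"}


def mcp_url(host: str, port: int, path: str = "/mcp", scheme: str = "http") -> str:
    """Build the URL to publish for an MCP server reachable over the network."""
    if scheme not in MCP_SCHEMES:
        # stdio servers have no network address, so there is nothing to publish.
        raise ValueError(f"unsupported MCP scheme: {scheme!r}")
    return f"{scheme}://{host}:{port}{path}"


url = mcp_url("127.0.0.1", 8765)
```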
cdp - a browser
Launch Chromium with a DevTools port. Playwright ships the binary (playwright install chromium); run it as a subprocess so the CDP endpoint is reachable at http://127.0.0.1:9222.
Capability.cdp defaults to port 9222 and takes an optional target_id=. (Add --no-sandbox only when running as root in a container.)
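A sketch of the subprocess launch, assuming you already know where the Chromium binary lives (the binary argument is a placeholder; Playwright installs into its own cache directory). The helper names are hypothetical; the flags are standard Chromium switches.

```python
import subprocess
import tempfile


def chromium_argv(binary: str, port: int = 9222, as_root: bool = False) -> list:
    """Build the command line for a headless Chromium exposing CDP on `port`."""
    argv = [
        binary,
        "--headless=new",
        f"--remote-debugging-port={port}",
        # A throwaway profile keeps runs independent of each other.
        f"--user-data-dir={tempfile.mkdtemp(prefix='cdp-profile-')}",
    ]
    if as_root:
        # Only needed when running as root inside a container.
        argv.append("--no-sandbox")
    return argv


def launch_chromium(binary: str, port: int = 9222) -> subprocess.Popen:
    """Start Chromium in the background; the caller terminates it on shutdown."""
    return subprocess.Popen(chromium_argv(binary, port))
```

After launching, wait until the DevTools port answers before publishing the capability, and terminate the returned process when the environment shuts down.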
rfb - a virtual screen
Full computer-use is a VNC server over a virtual display. On Linux, Xvfb provides the in-memory framebuffer and x11vnc serves it over RFB (apt install xvfb x11vnc).
Capability.rfb resolves the port as 5900 + display and takes an optional password=. To host multiple screens, publish one rfb capability per display.
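The display-to-port arithmetic and the two daemon command lines can be sketched as follows. The helper names and the 1280x800x24 geometry are illustrative; the Xvfb and x11vnc flags are the standard ones.

```python
def rfb_port(display: int) -> int:
    """VNC convention: display :N is served on TCP port 5900 + N."""
    return 5900 + display


def xvfb_argv(display: int, width: int = 1280, height: int = 800, depth: int = 24) -> list:
    """Command line for a virtual X display with a single screen."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


def x11vnc_argv(display: int) -> list:
    """Command line for x11vnc serving that display on loopback only."""
    return [
        "x11vnc",
        "-display", f":{display}",
        "-rfbport", str(rfb_port(display)),
        "-localhost",  # bind to 127.0.0.1
        "-forever",    # keep serving after the first client disconnects
        "-shared",     # allow more than one viewer at a time
        "-nopw",       # no password; acceptable only because of -localhost
    ]
```

For display :1 this yields port 5901. Two screens would mean two Xvfb/x11vnc pairs, for example displays :1 and :2 on ports 5901 and 5902, each published as its own rfb capability.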
robot - an observation/action loop
The robot capability speaks the openpi/0 wire protocol. It’s an openpi-like protocol: it reuses openpi’s wire format (msgpack with recursive numpy serialization) and its flat observation/action naming (observation/... keys, actions), so an openpi policy server and a HUD env speak the same bytes. The one fundamental difference is role assignment - in openpi a policy server answers inference requests, but here the environment is the server (it owns the world and pushes observations) and the agent is the client (it acts, replying with actions).
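The role assignment is easier to see in a toy loop. The sketch below uses two in-memory asyncio queues in place of the WebSocket and plain dicts in place of msgpack frames, so it shows only who sends what and when, not the real wire format. The observation/state key and the one-dimensional toy world are invented for illustration.

```python
import asyncio


async def environment(to_agent: asyncio.Queue, from_agent: asyncio.Queue, steps: int) -> float:
    """Server role: owns the world, pushes observations, applies returned actions."""
    state = 0.0
    for _ in range(steps):
        await to_agent.put({"observation/state": [state]})
        reply = await from_agent.get()
        state += reply["actions"][0]
    await to_agent.put(None)  # end-of-episode marker for this sketch
    return state


async def agent(from_env: asyncio.Queue, to_env: asyncio.Queue) -> None:
    """Client role: receives each observation and replies with an action."""
    while (obs := await from_env.get()) is not None:
        # Toy policy: always step by +1.0, regardless of the observation.
        await to_env.put({"actions": [1.0]})


async def run_episode(steps: int = 3) -> float:
    obs_q, act_q = asyncio.Queue(), asyncio.Queue()
    final_state, _ = await asyncio.gather(
        environment(obs_q, act_q, steps),
        agent(obs_q, act_q),
    )
    return final_state
```

Note that the environment initiates every exchange: the agent never requests an observation, it only answers one.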
The contract is the environment’s full self-describing schema - robot_type, control_rate, and every observation/action feature - carried in the manifest so the agent wires itself with no shared config. The environment drives its simulator through a RobotEndpoint (driving the bridge directly is possible, but the endpoint is the intended route), and the endpoint builds the capability for you once started.
Harness clients
Spinning up a capability is the environment side. The harness side is the mirror: it opens a capability to get a live client it can drive. The capability clients live in hud.capabilities:
| Client | Protocol |
|---|---|
| SSHClient | ssh/2 (raw asyncssh connection via .conn) |
| MCPClient | mcp/2025-11-25 |
| CDPClient | cdp/1.3 |
| RFBClient | rfb/3.8 |
| RobotClient | openpi/0 - joins the registry on first open (the robot extra: numpy/openpi-client) |
Workspace
A Workspace is not a capability - it’s the built-in daemon that serves the ssh capability. ssh is the one capability HUD ships an implementation for; for mcp, cdp, and rfb you stand up the daemon yourself (above), but for a shell you just attach a workspace.
Concretely it’s a directory plus a bwrap-isolated SSH server (bash + chroot’d SFTP). env.workspace(root, ...) wires its whole lifecycle: the environment brings it up (keys, socket, accept loop) when it serves and tears it down on env.stop(). Extra kwargs configure the sandbox - mounts, network, env vars, guest path, fixed ports, your own keys.
You can also construct a Workspace directly and publish ws.capability() as a concrete ssh capability:
| Member | Description |
|---|---|
| Workspace(root, *, host="127.0.0.1", port=0, mounts=(), network=False, env=None, user="agent", ...) | Construct. port=0 binds an ephemeral port. |
| await ws.start() | Start the SSH accept loop (idempotent). |
| ws.capability(name="shell") | The resolved ssh Capability (materializes keys, binds the socket). |
| await ws.stop() | Stop accepting sessions and release the socket. |
| ws.ssh_url / ws.ssh_host_pubkey | Connection address and host key. |
| ws.bwrap_available | Whether bwrap isolation is active. |
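The port=0 behavior is ordinary socket semantics rather than anything Workspace-specific: the operating system picks a free port at bind time, so the real address is known only after the socket is bound. That is presumably why ws.capability() has to bind the socket before it can return a concrete URL. A standard-library sketch, with a hypothetical helper name:

```python
import socket


def bind_ephemeral(host: str = "127.0.0.1"):
    """Bind a listening TCP socket to an OS-chosen port and report that port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen()
    return sock, sock.getsockname()[1]


sock, port = bind_ephemeral()
url = f"ssh://127.0.0.1:{port}"  # the address is only known after binding
sock.close()
```

Pass a fixed port instead of 0 when something outside the environment needs a stable address.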