The robot capability is in beta. The wire protocol is versioned openpi/0; the contract schema is v0. Expect additive changes while the design settles.

A policy running at 10 Hz can’t ride discrete tool calls. The robot capability is a schema-driven observation/action loop over WebSocket. It is openpi-like: it reuses openpi’s wire format (msgpack with transparent, recursive numpy serialization) and flat observation/action naming (observation/... keys, actions), but flips the roles. The environment is the server (owns the simulator, serves frames) and the agent is the client (runs the policy, streams actions back). On connect the env sends a metadata frame, then pushes observations; failures surface as a string traceback frame rather than a silent close.
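The connect sequence (one metadata frame, then observations, with a string frame signalling failure) can be sketched as a small client-side dispatcher. This is an illustration only: it assumes the frames have already been decoded from msgpack into Python objects, and `consume_frames` plus the example frame contents are hypothetical, not SDK API. The SDK's own agent-side wire client is RobotClient.

```python
from typing import Any, Iterable


def consume_frames(frames: Iterable[Any]) -> tuple[dict, list[dict]]:
    """Hypothetical helper: split a decoded frame stream into metadata + observations.

    Assumes the first frame is the metadata frame, later dict frames are
    observations, and any str frame is a traceback sent by the env.
    """
    iterator = iter(frames)
    metadata = next(iterator)
    if isinstance(metadata, str):
        # The env failed before it could send metadata.
        raise RuntimeError(f"env failed on connect:\n{metadata}")

    observations: list[dict] = []
    for frame in iterator:
        if isinstance(frame, str):
            # A string frame carries the env-side traceback.
            raise RuntimeError(f"env failed mid-episode:\n{frame}")
        observations.append(frame)
    return metadata, observations


# Made-up frames, purely for illustration.
meta, obs = consume_frames([{"robot_type": "demo"}, {"observation/state": [0.0]}])
```

Raising on the string frame keeps an env-side failure visible on the agent side instead of looking like a normal end of stream.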
Everything below ships behind the robot extra (pip install hud-python[robot] — numpy + openpi-client).
Overview
Integrating a policy against a robot environment means answering three questions: who owns the simulator, who runs the policy, and how do their spaces line up. The capability splits each answer into a small, named abstraction: implement the ones on your side, and the framework owns everything in between (the serve loop, the wire protocol, telemetry).

Environment side (owns the simulator and serves frames):

- RobotBridge: the one class you implement around your sim (reset / step / get_observation). The framework owns the WebSocket serve loop and the single-agent connection.
- RobotEndpoint: wraps the bridge for task definitions (episode bookkeeping and results).

Agent side (runs the policy and streams actions back):

- RobotAgent: the episode-loop harness. It connects to the env, reads its schema, then runs observe → infer → act until the env terminates.
- Model: the policy seam, infer(batch) -> action. LeRobotModel wraps a stock LeRobot checkpoint.
- Adapter: the space-translation seam between what the env emits and what the policy consumes. LeRobotAdapter covers the common wiring.

There is also a realtime pair (RealtimeRobotBridge / RealtimeRobotAgent) for when the sim clock must not wait on inference: the env advances on its own wall clock while the agent streams action chunks asynchronously. These live in the experimental scaffolding (demos/experimental, outside the published SDK) so they can iterate independently.
The shape of the work follows from the split: a bridge is written once per environment, a model + adapter once per policy, and the contract tells you — before you run anything — whether a given pairing wires up. That’s the path from “new checkpoint” to “scored episodes on a benchmark” in an afternoon.
Environment side
You implement one class. The bridge owns the simulator; the framework owns the WebSocket serve loop and the single-agent connection. The env can then be served on any substrate (hud serve env.py, a container CMD, or LocalRuntime("env.py")).
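As a rough picture of what the bridge's three methods do, here is a standalone sketch around a toy two-joint "simulator". It deliberately does not import the SDK: `BridgeSketch` is a stand-in, not a RobotBridge subclass, and the real method signatures (sync vs. async, return types) may differ. The `terminated` key and the delta-action semantics are assumptions made for the example.

```python
from typing import Any


class BridgeSketch:
    """Toy stand-in showing the reset / step / get_observation split.

    Not the SDK base class; signatures here are assumptions.
    """

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon          # steps before the episode ends
        self.t = 0                      # current step index
        self.joint_pos = [0.0, 0.0]     # toy simulator state

    def reset(self) -> dict[str, Any]:
        """Start a fresh episode and return the first observation."""
        self.t = 0
        self.joint_pos = [0.0, 0.0]
        return self.get_observation()

    def step(self, action: list[float]) -> dict[str, Any]:
        """Apply one delta-joint action and advance the sim by one tick."""
        self.joint_pos = [p + a for p, a in zip(self.joint_pos, action)]
        self.t += 1
        return self.get_observation()

    def get_observation(self) -> dict[str, Any]:
        """Flat, openpi-style keys; 'terminated' is a hypothetical flag."""
        return {
            "observation/state": list(self.joint_pos),
            "terminated": self.t >= self.horizon,
        }


bridge = BridgeSketch(horizon=2)
first = bridge.reset()
after_one = bridge.step([0.5, -0.5])
```

Everything network-related is absent on purpose: per the text above, the serve loop and connection handling belong to the framework, not to the class you write.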
A simulator that must own the process main thread (Isaac Sim / Omniverse) can’t run under hud serve. Run the SDK server on a worker thread instead: asyncio.run(hud.environment.server.serve(env, host, port)) in a thread, with a custom SimRunner that pumps sim work back to the main thread.

Agent side
The harness lives in hud.agents.robot. RobotAgent owns the episode loop: connect to the robot binding, read the contract, then observe → infer → act until the env terminates. You supply two seams:

- Model: runs the policy (infer(batch) -> action). LeRobotModel(policy, preprocess, postprocess) ships the standard LeRobot inference sandwich.
- Adapter: translates env ↔ policy spaces. LeRobotAdapter(model_image_keys=...) maps the env’s cameras onto the policy’s image slots in contract order, converts HWC uint8 → CHW float, and passes state + prompt through.
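To make the two seams concrete, the sketch below mimics the adapter behaviour described above (cameras mapped onto image slots in order, HWC uint8 converted to CHW float, state and prompt passed through) and feeds the result to a trivial model. It uses nested lists instead of numpy so it runs anywhere. `SketchAdapter`, `ZeroModel`, `to_batch`, the slot name `policy_image_0`, and the division by 255 are all assumptions for illustration; they are not the LeRobotAdapter or LeRobotModel implementation.

```python
from typing import Any, Sequence


def hwc_uint8_to_chw_float(image: list) -> list:
    """image[row][col][channel] ints in 0..255 -> out[channel][row][col] floats.

    Scaling to [0, 1] is an assumption; the source only says 'CHW float'.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


class SketchAdapter:
    """Hypothetical adapter: env observation -> policy batch."""

    def __init__(self, model_image_keys: Sequence[str]) -> None:
        self.model_image_keys = list(model_image_keys)

    def to_batch(self, obs: dict[str, Any], camera_keys: Sequence[str]) -> dict[str, Any]:
        if len(camera_keys) != len(self.model_image_keys):
            raise ValueError("camera count does not match the policy's image slots")
        # Pair the env's cameras with the policy's image slots, in order.
        batch: dict[str, Any] = {
            slot: hwc_uint8_to_chw_float(obs[cam])
            for slot, cam in zip(self.model_image_keys, camera_keys)
        }
        batch["state"] = obs["observation/state"]   # passed through unchanged
        batch["prompt"] = obs["prompt"]             # passed through unchanged
        return batch


class ZeroModel:
    """Hypothetical policy: emits a zero action sized like the state."""

    def infer(self, batch: dict[str, Any]) -> list[float]:
        return [0.0] * len(batch["state"])


obs = {
    "observation/image": [[[0, 255, 0], [255, 0, 0]]],  # 1 x 2 x 3 (HWC)
    "observation/state": [0.1, 0.2],
    "prompt": "pick up the cube",
}
adapter = SketchAdapter(model_image_keys=["policy_image_0"])
batch = adapter.to_batch(obs, camera_keys=["observation/image"])
action = ZeroModel().infer(batch)
```

The explicit count check is the kind of wiring error worth failing on early, since a silently misaligned camera still produces a policy that runs.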
Run the agent with Taskset(...).run(agent, runtime=...) against any substrate serving the env.
The contract
Robot observation and action spaces differ immensely. Embodiments disagree on camera count, resolution, and naming; on state representation (joint angles vs. EEF pose, quaternions vs. axis-angle, world frame vs. base frame); on action semantics (absolute vs. delta, position vs. velocity); and on control rate. Policies are just as opinionated about what they consume and emit. Pairing a specific model with a specific env therefore always involves a wiring step, and getting it silently wrong (a transposed image, a reordered state vector) produces a policy that runs fine and scores zero.

The HUD robot spec exists to make that wiring explicit and checkable. Each environment carries a contract: a JSON document describing the embodiment, with robot_type, control_rate, and a features map where each feature declares its role (observation / action), dtype, shape, and ordering.
The agent reads the contract through RobotClient.spaces(), which splits features into action/observation spaces by role; this is what the Adapter wires against. The v0 schema is deliberately narrow: one embodiment, one observation space, one action space per contract, every feature rank ≥ 1 (scalars are [1]). The full authoring spec (closed symbol sets for state_type / state_representation / frame, conventions, and the known traps) lives outside the SDK, alongside the contract corpus and the advisory matching/visualization tooling (match, integration_review, render_match).
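For orientation, here is what a contract and a role-based split might look like. The top-level names robot_type, control_rate, and features come from the description above; the exact spelling of the per-feature keys (`role`, `dtype`, `shape`), the example values, and the `split_spaces` helper are assumptions, and per-feature ordering metadata is left out because its format is not given here. This is a sketch of the idea behind RobotClient.spaces(), not its implementation.

```python
from typing import Any

# Illustrative contract; values and per-feature key spellings are assumed.
contract: dict[str, Any] = {
    "robot_type": "example_arm",
    "control_rate": 10,
    "features": {
        "observation/image": {"role": "observation", "dtype": "uint8", "shape": [224, 224, 3]},
        "observation/state": {"role": "observation", "dtype": "float32", "shape": [8]},
        "actions": {"role": "action", "dtype": "float32", "shape": [7]},
    },
}


def split_spaces(contract: dict[str, Any]) -> tuple[dict, dict]:
    """Hypothetical helper: split features into (observation, action) by role."""
    spaces: dict[str, dict] = {"observation": {}, "action": {}}
    for name, feature in contract["features"].items():
        role = feature["role"]
        if role not in spaces:
            raise ValueError(f"{name}: unknown role {role!r}")
        if len(feature["shape"]) < 1:
            # v0 requires rank >= 1; a scalar is declared as [1].
            raise ValueError(f"{name}: rank must be >= 1")
        spaces[role][name] = feature
    return spaces["observation"], spaces["action"]


observation_space, action_space = split_spaces(contract)
```

Because Python dicts preserve insertion order, the observation space keeps the cameras in the order the contract lists them, which is the order an adapter would map them in.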
Realtime control
The default loop is lockstep — the sim waits for each action. The realtime path lives in the experimental scaffolding (demos/experimental, outside the published SDK), built on top of the SDK’s RobotBridge / RobotAgent. RealtimeRobotBridge (experimental.env) decouples the sim clock from inference: it advances at control_hz on its own wall clock, popping actions from an injected ActionProvider while the agent streams whole action chunks asynchronously. Providers implement the merge strategy — sync (blocking baseline), naive_async (drop-and-replace), weighted_async (blended overlap), and rtc (real-time chunking with an execution horizon) — via make_action_provider(mode, ...). On underrun the sim HOLDs (no_op_action) rather than freezing, because the real world doesn’t pause for inference.
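The drop-and-replace strategy and the HOLD-on-underrun behaviour can be illustrated with a few lines of standalone Python. This is a sketch of the idea only: `DropAndReplaceProvider`, `push_chunk`, and `pop` are hypothetical names, and the experimental ActionProvider interface may look quite different.

```python
import threading
from collections import deque


class DropAndReplaceProvider:
    """Hypothetical provider: a new chunk discards whatever is still queued."""

    def __init__(self, no_op_action: list[float]) -> None:
        self.no_op_action = list(no_op_action)
        self._queue: deque = deque()
        self._lock = threading.Lock()  # sim tick and agent may run on different threads

    def push_chunk(self, chunk: list[list[float]]) -> None:
        """Called when the agent delivers a fresh action chunk."""
        with self._lock:
            self._queue = deque(chunk)  # drop leftovers, replace with the new chunk

    def pop(self) -> list[float]:
        """Called once per control tick by the free-running sim."""
        with self._lock:
            if self._queue:
                return self._queue.popleft()
        # Underrun: hold with the no-op action instead of stalling the clock.
        return list(self.no_op_action)


provider = DropAndReplaceProvider(no_op_action=[0.0])
ticks = [provider.pop()]                       # nothing queued yet: HOLD
provider.push_chunk([[1.0], [2.0], [3.0]])
ticks.append(provider.pop())                   # first action of the chunk
provider.push_chunk([[9.0]])                   # stale [2.0], [3.0] are dropped
ticks.append(provider.pop())
ticks.append(provider.pop())                   # chunk exhausted: HOLD again
```

The blended and real-time-chunking modes differ only in how `push_chunk` merges the incoming chunk with what is still queued; the per-tick `pop` with a no-op fallback stays the same in this sketch.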
On the agent side, RealtimeRobotAgent (experimental.agent) is the chunk-streaming counterpart: it reads the inference mode/threshold from the contract and replies with whole chunks via RobotClient.send_chunk.
SimRunner selects which thread runs the (usually thread-affine) simulator: InlineSimRunner (event loop thread, the default) or ThreadSimRunner (dedicated worker — render-heavy sims). Subclass it for exotic topologies (e.g. a sim that owns main with the server on a worker).
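The dedicated-worker idea can be shown with the standard library alone: route every simulator call through a single-thread executor so a thread-affine sim always sees the same thread while the event loop stays free. `WorkerSimRunner` and its `call` method are hypothetical names for this sketch, not the SimRunner or ThreadSimRunner interface.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class WorkerSimRunner:
    """Hypothetical runner: all sim work executes on one dedicated thread."""

    def __init__(self) -> None:
        # max_workers=1 guarantees every call lands on the same worker thread.
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="sim")

    async def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        """Run fn(*args) on the sim thread without blocking the event loop."""
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, fn, *args)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


async def demo() -> tuple[int, int, int]:
    runner = WorkerSimRunner()
    try:
        first = await runner.call(threading.get_ident)
        second = await runner.call(threading.get_ident)
        return first, second, threading.get_ident()
    finally:
        runner.close()


sim_thread_a, sim_thread_b, loop_thread = asyncio.run(demo())
```

Both sim calls report the same thread id, and it differs from the event loop's thread, which is the property a thread-affine simulator needs.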
Telemetry
Zero-config: with HUD telemetry configured, RobotAgent streams one span per step (every camera frame the policy saw plus the executed action) and stamps keyframes where a fresh action chunk was inferred. The platform’s trace viewer plays the episode back: scrub through all frames, with markers at each chunk-prediction decision point.
API summary
| Symbol | Where | Role |
|---|---|---|
| RobotEndpoint.capability(contract=...) | hud.environment.robot | Build the openpi/0 capability after start() |
| Capability.robot(name, url, contract) | hud.capabilities | Lower-level constructor (usually via endpoint.capability) |
| RobotClient | hud.capabilities.robot | Agent-side wire client (spaces, get_observation, send_action, send_chunk) |
| RobotBridge | hud.environment.robot | Env-side serve loop; subclass with your sim |
| RealtimeRobotBridge | experimental.env (demos/experimental) | Free-running realtime env-side bridge |
| RobotEndpoint | hud.environment.robot | Episode bookkeeping + results |
| ActionProvider, make_action_provider | experimental.env (demos/experimental) | Realtime chunk-merge strategies |
| SimRunner (Inline/Thread) | hud.environment.robot | Which thread runs the sim |
| RobotAgent | hud.agents.robot | The episode-loop harness |
| RealtimeRobotAgent | experimental.agent (demos/experimental) | Chunk-streaming realtime agent harness |
| Model / LeRobotModel, Adapter / LeRobotAdapter | hud.agents.robot | Policy + space-translation seams |
See also
Robot benchmark cookbook
LIBERO in Docker, driven by pi0.5, end to end.