Package it: hud deploy
The recommended path. hud deploy builds your environment from its Dockerfile.hud (scaffolded by hud init) on HUD and registers it by the name in your Environment(...) declaration — one step, no local Docker required. Then publish your tasks as a named taskset:
hud deploy uploads the build context, builds the image on HUD, streams the build logs, and registers the environment (rebuilding in place if the name already exists).
hud sync tasks my-taskset diffs your tasks against the remote taskset and uploads only what changed.
Configuration is passed with --env KEY=VALUE / --env-file .env, --build-arg, and --secret.
From the platform UI you can then run batches, compare models on the same taskset, and browse every trace.
Pick where it runs: the runtime
In code, where a task runs is a runtime you pass at execution time; the task definition never changes. The same task.run(agent, runtime=…) call targets any substrate:
HUDRuntime() is the natural pair with hud deploy: the platform leases an instance, brings your deployed image up on it, and the SDK drives the env through the runtime tunnel. Use HostedRuntime() when the whole rollout should run remotely on the platform.
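To make the "runtime chosen at execution time" idea concrete, here is a minimal sketch that does not use the hud SDK at all. `Task`, `RuntimeFn`, `local_runtime`, `remote_runtime`, the `RUNTIME` environment variable, and the URL shapes are all hypothetical stand-ins; the point is only that the task object stays fixed while the runtime is swapped by the caller.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict


# Stand-in types, NOT the hud SDK: a "runtime" here is any callable that
# takes a task and returns the URL of a started environment.
@dataclass(frozen=True)
class Task:
    name: str
    prompt: str


RuntimeFn = Callable[[Task], str]


def local_runtime(task: Task) -> str:
    # Hypothetical URL shape for an environment on this machine.
    return f"http://127.0.0.1:8000/{task.name}"


def remote_runtime(task: Task) -> str:
    # Hypothetical URL shape for an environment on a remote sandbox.
    return f"https://sandbox.example/{task.name}"


RUNTIMES: Dict[str, RuntimeFn] = {"local": local_runtime, "remote": remote_runtime}


def run(task: Task, runtime: RuntimeFn) -> str:
    """Resolve where the task runs; the task itself is never modified."""
    return runtime(task)


task = Task(name="fix-bug", prompt="Make the failing test pass.")
# The substrate is picked at call time, falling back to the local stand-in.
chosen = RUNTIMES.get(os.environ.get("RUNTIME", "local"), local_runtime)
print(run(task, chosen))
```

Because `Task` is frozen and carries no placement information, switching substrates is a one-argument change at the call site.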
Run on your own infra
A runtime is just a function: given a task, start a container somewhere and yield its control-channel URL. That one function is the entire integration surface for any sandbox provider: Daytona, Modal, E2B, Runloop, or your own Kubernetes.
DockerRuntime and LocalRuntime are just the built-in versions of this. Anything that can start your image and hand back a URL plugs in with no change to the environment or the task — that’s what “run anywhere” means concretely.
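The "start a container, yield its URL" shape described above can be sketched with a plain context manager. This is an illustration under assumptions, not the SDK's actual interface: `FakeProvider`, `make_runtime`, the sandbox ID format, and the URL are invented for the example, and a real integration may well be asynchronous.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FakeProvider:
    """Hypothetical sandbox provider; stands in for any real one."""

    running: List[str] = field(default_factory=list)

    def start(self, image: str) -> str:
        sandbox_id = f"sbx-{len(self.running)}"
        self.running.append(sandbox_id)
        return sandbox_id

    def stop(self, sandbox_id: str) -> None:
        self.running.remove(sandbox_id)

    def url_for(self, sandbox_id: str) -> str:
        return f"https://{sandbox_id}.sandbox.example"


def make_runtime(provider: FakeProvider, image: str):
    """Build a runtime: given a task, start a sandbox and yield its URL."""

    @contextmanager
    def runtime(task_name: str) -> Iterator[str]:
        sandbox_id = provider.start(image)
        try:
            yield provider.url_for(sandbox_id)
        finally:
            # Tear down even if the rollout raised, so no sandbox is leaked.
            provider.stop(sandbox_id)

    return runtime


provider = FakeProvider()
runtime = make_runtime(provider, image="my-env:latest")
with runtime("fix-bug") as url:
    print(url)  # the control-channel URL a caller would connect to
```

The `try`/`finally` is the part worth keeping in any real provider integration: the sandbox is released whether the rollout succeeds or fails.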
A self-contained image
For a fully local artifact with no HUD account, build the image directly from the scaffolded Dockerfile.hud and drive a task with the packaged CLI. docker exec runs the commands inside the container, so nothing needs to be exposed.
hud task start returns the prompt; the agent works; hud task grade returns the reward — no source, no open port (hud task list shows what an image exposes).
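The local flow above can be scripted. The sketch below only assembles and prints the command lines rather than running them. The image tag `my-env` and container name `my-env-run` are hypothetical, the `docker run -d` step assumes the image's default command keeps the container alive, and any arguments that `hud task start` or `hud task grade` may require (such as a task name) are not shown here and are left out.

```python
import shlex
from typing import List


def build_argv(tag: str, dockerfile: str = "Dockerfile.hud", context: str = ".") -> List[str]:
    """docker build from the scaffolded Dockerfile.hud."""
    return ["docker", "build", "-f", dockerfile, "-t", tag, context]


def exec_argv(container: str, *hud_args: str) -> List[str]:
    """Run a hud CLI subcommand inside an already-running container."""
    return ["docker", "exec", container, "hud", *hud_args]


steps = [
    build_argv("my-env"),                                         # hypothetical tag
    ["docker", "run", "-d", "--name", "my-env-run", "my-env"],    # hypothetical name
    exec_argv("my-env-run", "task", "list"),   # what the image exposes
    exec_argv("my-env-run", "task", "start"),  # returns the prompt
    exec_argv("my-env-run", "task", "grade"),  # returns the reward
]

for argv in steps:
    print(shlex.join(argv))
```

To execute the steps instead of printing them, each list can be passed to `subprocess.run(argv, check=True)`.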
Reproducible by construction. Each rollout gets its own fresh container, so results reproduce across runs and machines and one rollout never leaks state into the next. Keep per-task setup in @env.initialize so every run starts from the same state.
GPU environments (e.g. robot sims) take extra docker run flags through the placement: DockerRuntime(image, run_args=["--gpus", "all"]). For sims with multi-minute boots, prefer one long-lived container reused via Runtime(url) over a fresh DockerRuntime per rollout.
Next steps
Run on any model
The agent side: any model or harness drives the same task.
Designing tasks for signal
Compose a taskset that actually trains.
Train on your tasks
Turn the rewards you collected into GRPO advantages.
Harbor interop
Load existing benchmarks straight into the runtime.