HUD Documentation - Evaluations and RL Environments.

A runtime decides where an environment runs for a rollout. You pass it to task.run / taskset.run at execution time, and the same task and the same env.py run anywhere - only the runtime changes.

from hud.eval import LocalRuntime

await TASKS.run(agent, runtime=LocalRuntime("env.py"))   # serve env.py locally, run here

Under the hood a runtime is just a function: given a task, bring the environment up somewhere and hand back its control-channel URL. The agent loop always runs on your machine and drives that URL; the runtime only decides what’s on the other end.

Built-in runtimes

Runtime	Where the env runs	When to reach for it
`LocalRuntime("env.py")`	A child process from your source	Fastest iteration; local development
`DockerRuntime("my-env")`	A fresh local container per rollout	Reproducibility and parity with production
`ModalRuntime("my-env")`	A fresh Modal sandbox per rollout	Cloud scale, no infra to manage
`DaytonaRuntime("my-env")`	A fresh Daytona sandbox per rollout	Cloud scale on Daytona
`Runtime("tcp://host:8765")`	A substrate you already started	Attaching to a long-lived container or sandbox you own
`HUDRuntime()`	The HUD platform, leased by name	After you deploy (see below); the default when `runtime=` is omitted for platform/file rows

Most runtimes are on the top-level package (from hud import LocalRuntime, DockerRuntime, HUDRuntime, Runtime); ModalRuntime and DaytonaRuntime import from hud.eval.

Omit runtime= and it’s inferred. A taskset minted in-process from one .py source serves that source locally; rows loaded from a file or the platform fall back to HUDRuntime. Pass a runtime explicitly the moment you want something other than that.
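
The inference rule above can be modelled as a small decision function. This is only a sketch of the rule as this page describes it, not the SDK's code: `TasksetOrigin`, `infer_runtime_name`, and the string labels it returns are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TasksetOrigin:
    """Hypothetical description of where a taskset's rows came from."""
    kind: str                           # "in_process", "file", or "platform"
    source_path: Optional[str] = None   # the single .py source, if minted in-process


def infer_runtime_name(origin: TasksetOrigin, explicit: Optional[str] = None) -> str:
    """Return a label for the runtime the documented rule would select."""
    if explicit is not None:
        # An explicit runtime= always wins over inference.
        return explicit
    if origin.kind == "in_process" and origin.source_path is not None:
        # Minted in-process from one .py source: serve that source locally.
        return f'LocalRuntime("{origin.source_path}")'
    if origin.kind in ("file", "platform"):
        # Rows loaded from a file or the platform fall back to the HUD platform.
        return "HUDRuntime()"
    # Anything else is not covered by the text, so the sketch refuses to guess.
    raise ValueError("case not described here; pass runtime= explicitly")
```

Cases the page does not describe raise instead of guessing, which mirrors the advice to pass a runtime explicitly whenever the default is not what you want.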

The decision: run it yourself or deploy

Every runtime above falls on one side of a single decision, and it is the one that matters most: do you keep the environment on infra you bring up, or deploy it to the HUD platform?

Run it yourself. LocalRuntime for development, DockerRuntime for a reproducible container, ModalRuntime / DaytonaRuntime for cloud sandboxes, or Runtime(url) to attach to something you already started. You own the substrate’s lifecycle.
Deploy to the platform. hud deploy builds your environment image and hosts it; then HUDRuntime() leases it per rollout while your agent loop stays local. This is the path to running batches and comparing models from the platform UI, with every trace browsable.

Package & deploy

Deploying builds your environment into an image on HUD and registers it by the name in your Environment(...) declaration - one step, no local Docker:

hud deploy                 # build the image on HUD from Dockerfile.hud, register it by name
hud sync tasks my-taskset  # publish the taskset (uploads only what changed)

hud init scaffolds the Dockerfile.hud. One build packs every task from the definition, so the same image runs unchanged on HUD, on a cloud sandbox, in CI, or on your laptop. After deploying, HUDRuntime() is the natural pair. Pass build config with --env, --build-arg, and --secret. See the CLI reference for the full flag set.

Run on your own infra

A custom runtime is an async context manager: given a task, it starts a container somewhere, yields a Runtime pointing at the container’s control-channel URL, and tears the container down on exit. That one function is the whole integration surface for any provider - Modal, E2B, Runloop, your own Kubernetes:

run.py

from contextlib import asynccontextmanager
from hud import Runtime

@asynccontextmanager
async def my_runtime(task):
    sandbox = await start_my_sandbox(image="my-env")   # your infra brings it up
    try:
        yield Runtime(f"tcp://{sandbox.host}:{sandbox.port}")
    finally:
        await sandbox.terminate()                       # ...and tears it down

await TASKS.run(agent, runtime=my_runtime)

DockerRuntime and the rest are just built-in versions of this. Anything that starts your image and hands back a URL plugs in with no change to the environment or the task - that’s what “run anywhere” means concretely. Constructed directly, Runtime(url) yields itself with a no-op lifecycle, since whoever provisioned the substrate owns teardown. Placement can also vary per task: a runtime is called once per rollout with the task row being placed, so one callable can route heavier rows to heavier substrates.
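
Per-task placement can be sketched as a wrapper that picks between two runtimes for each row. The sketch assumes every runtime follows the contract shown above (called with a task, it returns an async context manager). `route_by_task`, `fake_runtime`, the dict-shaped tasks, and the `needs_gpu` key are hypothetical stand-ins so the example runs without any sandbox provider.

```python
import asyncio
from contextlib import asynccontextmanager


def route_by_task(light, heavy, is_heavy):
    """Build one runtime callable that picks a substrate per task row.

    Assumes `light` and `heavy` are called with a task and return an
    async context manager, like the custom runtime shown above.
    """
    @asynccontextmanager
    async def runtime(task):
        chosen = heavy if is_heavy(task) else light
        async with chosen(task) as handle:
            # Startup and teardown stay with whichever runtime was chosen.
            yield handle
    return runtime


def fake_runtime(label):
    """Stand-in runtime that yields a string instead of starting anything."""
    @asynccontextmanager
    async def runtime(task):
        yield f"{label}:{task['id']}"
    return runtime


async def demo():
    router = route_by_task(
        light=fake_runtime("docker"),
        heavy=fake_runtime("modal"),
        is_heavy=lambda task: task.get("needs_gpu", False),
    )
    placed = []
    for task in ({"id": "a"}, {"id": "b", "needs_gpu": True}):
        async with router(task) as handle:
            placed.append(handle)
    return placed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo the first task lands on the light stand-in and the GPU task on the heavy one. With real runtimes the predicate would inspect the task row instead; which fields are available there depends on your taskset.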

A self-contained image, no HUD account

For a fully local artifact, build straight from the scaffolded Dockerfile.hud and drive a task with the packaged CLI. docker exec runs the commands inside the container, so nothing needs to be exposed:

docker build -f Dockerfile.hud -t my-env .

docker run -d --name run1 my-env
docker exec run1 hud task start fix_bug                # -> the prompt
docker exec run1 hud task grade fix_bug --answer "..." # -> the reward
docker rm -f run1
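
If you want to script that cycle, the same commands can be assembled in Python. This is a minimal sketch: `container_task_commands` and `render` are hypothetical helper names, and the sketch only builds the command lines shown above rather than running them or parsing their output, since the output format is not specified here.

```python
import shlex


def container_task_commands(image, container, task, answer):
    """Return the docker command lines for one start/grade/cleanup cycle."""
    return [
        ["docker", "run", "-d", "--name", container, image],
        ["docker", "exec", container, "hud", "task", "start", task],
        ["docker", "exec", container, "hud", "task", "grade", task, "--answer", answer],
        ["docker", "rm", "-f", container],
    ]


def render(commands):
    """Quote each argument list into a copy-pasteable shell line."""
    return [shlex.join(cmd) for cmd in commands]
```

Each list can be handed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`. In a real loop the answer comes from your agent, so the grade command is built after the start command has returned the prompt.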

Reproducible by construction. Each rollout gets its own fresh container, so results reproduce across runs and machines and one rollout never leaks state into the next. Keep per-task setup in @env.initialize so every run starts from the same state.

Runtime arguments

The constructor for each built-in runtime:

`LocalRuntime`

LocalRuntime(path, *, env=None, ready_timeout=120.0)

path - .py file (or directory) that declares the env. The child’s working directory is the source’s directory, so sibling imports and relative data paths resolve.
env - pin a specific env name when the source declares more than one. Defaults to the placed task’s env.
ready_timeout - seconds to wait for the child to start serving.

`DockerRuntime`

DockerRuntime(image=None, *, port=8765, run_args=(), runtime_config=None)

image - image name to run; shorthand for runtime_config.image.
port - port the image’s CMD serves inside the container (the scaffolded Dockerfile.hud serves 8765).
run_args - extra docker run flags, e.g. ["--gpus", "all"] or ["-e", "KEY=VAL"].
runtime_config - a RuntimeConfig (image, resources) for finer control.
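
Because run_args is a flat list of docker run flags, it can help to build it from structured values. A minimal sketch, assuming you only need `--gpus` and `-e`; `docker_run_args` is a hypothetical helper, not part of the SDK.

```python
def docker_run_args(env=None, gpus=None):
    """Build a flat list of extra `docker run` flags."""
    args = []
    if gpus is not None:
        # e.g. gpus="all" becomes ["--gpus", "all"]
        args += ["--gpus", str(gpus)]
    for key, value in (env or {}).items():
        # Each variable becomes its own "-e KEY=VAL" pair.
        args += ["-e", f"{key}={value}"]
    return args
```

The result would be passed as `DockerRuntime("my-env", run_args=docker_run_args(env={"KEY": "VAL"}, gpus="all"))`.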

`ModalRuntime`

ModalRuntime(image_name=None, *, image=None, command=None, app_name="hud-envs", port=8765, runtime_config=None, env_vars=None)

image_name - published Modal image name (the preferred durable handle), e.g. ModalRuntime("hud-libero-env").
image - an Image to build lazily on first use, as an escape hatch.
command - override the serving command (defaults to the scaffolded hud serve entrypoint).
app_name / port / env_vars - Modal app name, in-sandbox serving port, and extra environment variables.

Requires the modal extra and a configured token.

`DaytonaRuntime`

DaytonaRuntime(snapshot_name=None, *, image=None, command=None, workdir="/app", port=8765, ssh_host="ssh.app.daytona.io", ssh_expires_minutes=1440, runtime_config=None)

snapshot_name - Daytona snapshot to boot from (the durable handle).
image - Dockerfile/registry ref to build the snapshot once if it’s missing. Resources (cpu/memory/gpu) live on the snapshot.
workdir / port - guest working directory and in-sandbox serving port.
ssh_host / ssh_expires_minutes - SSH tunnel settings (Daytona exposes services over an SSH local-forward).

`HUDRuntime`

HUDRuntime(*, run_timeout=3600.0, runtime_url=None)

run_timeout - bound on one rollout end to end, including instance startup.
runtime_url - override the runtime endpoint the tunnel connects to.

The SDK leases your deployed env by name and tunnels to its control channel; the agent loop runs locally.

`Runtime`

Runtime(url, params=..., config=...)

url - control-channel address of an already-running substrate (e.g. tcp://host:8765).
params - connection-time data a transport may need (auth token, sandbox id).

RuntimeConfig

RuntimeConfig carries the construction hints a container-based runtime needs: which image, how much hardware, and what timeouts. Set it on the runtime (runtime_config=) or per row on Task.runtime_config; the runtime merges the two and applies what it supports.

from hud.eval import RuntimeConfig, RuntimeResources, RuntimeGPU, RuntimeLimits

RuntimeConfig(
    image="my-env",
    resources=RuntimeResources(cpu=4, memory_mb=8192, gpu=RuntimeGPU(type="A100", count=1)),
    limits=RuntimeLimits(startup_timeout_s=300, run_timeout_s=1800),
)

Field	Description
`image`	Image to run.
`resources`	`RuntimeResources(cpu, memory_mb, gpu=RuntimeGPU(type, count))`.
`limits`	`RuntimeLimits(startup_timeout_s, run_timeout_s)`.

Support differs per runtime: DockerRuntime, ModalRuntime, and DaytonaRuntime accept it (Docker ignores limits; Daytona ignores run_timeout_s and resource overrides when booting from a snapshot). LocalRuntime and HUDRuntime reject a per-task runtime_config.
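
The support notes above can be written down as a lookup for a pre-flight check. This only transcribes the paragraph into data. `IGNORED`, `REJECTS_TASK_CONFIG`, and `describe_task_config_support` are hypothetical names, and the SDK's own error behavior is not shown.

```python
# What each accepting runtime ignores, transcribed from the paragraph above.
IGNORED = {
    "DockerRuntime": "limits",
    "ModalRuntime": "",
    "DaytonaRuntime": "run_timeout_s and resource overrides when booting from a snapshot",
}

# Runtimes that reject a per-task runtime_config.
REJECTS_TASK_CONFIG = {"LocalRuntime", "HUDRuntime"}


def describe_task_config_support(runtime_name):
    """Summarize how a built-in runtime treats a RuntimeConfig."""
    if runtime_name in REJECTS_TASK_CONFIG:
        return f"{runtime_name} rejects a per-task runtime_config"
    if runtime_name in IGNORED:
        ignored = IGNORED[runtime_name]
        suffix = f"; ignores {ignored}" if ignored else ""
        return f"{runtime_name} accepts runtime_config{suffix}"
    raise KeyError(f"no support notes for {runtime_name}")
```

A check like this could run before a batch starts, so a row carrying Task.runtime_config is flagged before it reaches a runtime that would reject it.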

See also

Agents

The agent that connects to the environment and acts.

Train

Turn the rewards a run collects into a training signal.

CLI reference

hud deploy, hud eval, hud task, and the full flag set.

Tasks & Tasksets

Per-task placement via Task.runtime_config.
