HUD Documentation - Evaluations and RL Environments.

A runtime chooses where each rollout’s environment runs. You pass it to task.run / taskset.run at execution time, and the same task and the same env.py run anywhere - only the runtime changes.

from hud import LocalRuntime

await taskset.run(agent, runtime=LocalRuntime("env.py"))   # serve env.py locally, run here

Built-in runtimes

Runtime	Where the env runs	When to reach for it
`LocalRuntime("env.py")`	A child process from your source	Fastest iteration; local development
`DockerRuntime("my-env")`	A fresh local container per rollout	Reproducibility and parity with production
`ModalRuntime("my-env")`	A fresh Modal sandbox per rollout	Cloud scale, no infra to manage
`DaytonaRuntime("my-env")`	A fresh Daytona sandbox per rollout	Cloud scale on Daytona
`Runtime("tcp://host:8765")`	A substrate you already started	Attaching to a long-lived container or sandbox you own
`HUDRuntime()`	A HUD-hosted env, leased by name and tunneled	Local agent loop against a deployed env
`HostedRuntime()`	The whole rollout on a HUD-leased box	Agent and env run together off your machine

Most runtimes are on the top-level package (from hud import LocalRuntime, DockerRuntime, HUDRuntime, HostedRuntime, Runtime); ModalRuntime and DaytonaRuntime import from hud.eval.

Omit runtime= and it’s inferred from each task’s _source, the file its template was defined in. When every task shares one _source, that source is served locally as LocalRuntime(source); otherwise (mixed sources, or rows loaded from a file or the platform with no source) it falls back to HUDRuntime(). Pass a runtime explicitly the moment you want something else.
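The fallback rule above can be sketched in plain Python. This is a model of the rule as described, not the SDK's code: `TaskRow` and `infer_runtime` are hypothetical names, `source` stands in for the task's _source, and the function returns a string description rather than a real runtime object. The empty-taskset case is not covered by the text; the sketch assumes it falls back to HUDRuntime().

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TaskRow:
    """Hypothetical stand-in for a task; only its source file matters here."""
    name: str
    source: Optional[str] = None  # mirrors _source; None for rows loaded without one


def infer_runtime(tasks: Sequence[TaskRow]) -> str:
    """Describe which runtime the fallback rule would pick for these tasks."""
    sources = {t.source for t in tasks}
    # Exactly one shared, known source: serve that file locally.
    if len(sources) == 1 and None not in sources:
        return f"LocalRuntime({next(iter(sources))!r})"
    # Mixed sources, or any row with no source: fall back to the hosted env.
    return "HUDRuntime()"
```

Under this reading, a single row with no source is enough to switch the whole run to HUDRuntime(), which is a good reason to pass runtime= explicitly when a taskset mixes origins.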

To deploy an environment to the platform and run against it, see running an eval and deploying to the platform.

RuntimeConfig

RuntimeConfig carries the construction hints a container-based runtime needs: which image, how much hardware, and what timeouts. Set it on the runtime (runtime_config=) or per row on Task.runtime_config; the runtime merges the two and applies what it supports.

from hud.eval import RuntimeConfig, RuntimeResources, RuntimeGPU, RuntimeLimits

RuntimeConfig(
    image="my-env",
    resources=RuntimeResources(cpu=4, memory_mb=8192, gpu=RuntimeGPU(type="A100", count=1)),
    limits=RuntimeLimits(startup_timeout_s=300, run_timeout_s=1800),
)

Field	Description
`image`	Image to run.
`resources`	`RuntimeResources(cpu, memory_mb, gpu=RuntimeGPU(type, count))`.
`limits`	`RuntimeLimits(startup_timeout_s, run_timeout_s)`.

Support differs per runtime: DockerRuntime, ModalRuntime, and DaytonaRuntime accept it (Docker ignores limits; Daytona ignores run_timeout_s and resource overrides when booting from a snapshot). LocalRuntime and HUDRuntime reject a per-task runtime_config.
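As a quick reference, the support rules in the paragraph above can be written down as a lookup. The table below is transcribed by hand from that paragraph and is not read from the SDK; `applied_fields` is a hypothetical helper. It assumes one reading of the Daytona clause: run_timeout_s is always ignored, and resource overrides are ignored only when booting from a snapshot.

```python
# RuntimeConfig fields, flattened for illustration.
ALL_FIELDS = {"image", "resources", "startup_timeout_s", "run_timeout_s"}

# Fields each container-based runtime ignores, per the paragraph above.
IGNORED = {
    "DockerRuntime": {"startup_timeout_s", "run_timeout_s"},  # Docker ignores limits
    "ModalRuntime": set(),
    "DaytonaRuntime": {"run_timeout_s"},
}

# Runtimes that reject a per-task runtime_config outright.
REJECTS_PER_TASK_CONFIG = {"LocalRuntime", "HUDRuntime"}


def applied_fields(runtime: str, from_snapshot: bool = False) -> set:
    """Return the per-task config fields the named runtime would apply."""
    if runtime in REJECTS_PER_TASK_CONFIG:
        raise ValueError(f"{runtime} rejects a per-task runtime_config")
    if runtime not in IGNORED:
        raise LookupError(f"{runtime} is not covered by the support notes")
    ignored = set(IGNORED[runtime])
    # Assumption: on Daytona, resources live on the snapshot when one is used.
    if runtime == "DaytonaRuntime" and from_snapshot:
        ignored.add("resources")
    return ALL_FIELDS - ignored
```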

Runtime directory

The constructor for each built-in runtime:

`LocalRuntime`

LocalRuntime(path, *, env=None, ready_timeout=120.0)

path - .py file (or directory) that declares the env. The child’s working directory is the source’s directory, so sibling imports and relative data paths resolve.
env - pin a specific env name when the source declares more than one. Defaults to the placed task’s env.
ready_timeout - seconds to wait for the child to start serving.

`DockerRuntime`

DockerRuntime(image=None, *, port=8765, run_args=(), runtime_config=None)

image - image name to run; shorthand for runtime_config.image.
port - port the image’s CMD serves inside the container (the scaffolded Dockerfile.hud serves 8765).
run_args - extra docker run flags, e.g. ["--gpus", "all"] or ["-e", "KEY=VAL"].
runtime_config - a RuntimeConfig (image, resources) for finer control.

`ModalRuntime`

ModalRuntime(image_name=None, *, image=None, command=None, app_name="hud-envs", workdir=None, port=8765, runtime_config=None, env_vars=None)

image_name - published Modal image name (the preferred durable handle), e.g. ModalRuntime("hud-libero-env").
image - an Image to build lazily on first use, as an escape hatch.
command - override the serving command (defaults to the scaffolded hud serve entrypoint).
workdir - working directory inside the sandbox. Left unset, Modal keeps the image’s WORKDIR.
app_name / port / env_vars - Modal app name, in-sandbox serving port, and extra environment variables.

Requires the modal extra and a configured token.

`DaytonaRuntime`

DaytonaRuntime(snapshot_name=None, *, image=None, command=None, workdir="/app", port=8765, ssh_host="ssh.app.daytona.io", ssh_expires_minutes=1440, runtime_config=None)

snapshot_name - Daytona snapshot to boot from (the durable handle).
image - Dockerfile/registry ref to build the snapshot once if it’s missing. Resources (cpu/memory/gpu) live on the snapshot.
workdir / port - guest working directory and in-sandbox serving port.
ssh_host / ssh_expires_minutes - SSH tunnel settings (Daytona exposes services over an SSH local-forward).

`HUDRuntime`

HUDRuntime(*, run_timeout=3600.0, runtime_url=None)

run_timeout - bound on one rollout end to end, including instance startup.
runtime_url - override the runtime endpoint the tunnel connects to.

The SDK leases your deployed env by name and tunnels to its control channel; the agent loop runs locally.

`HostedRuntime`

HostedRuntime(*, poll_interval=5.0, run_timeout=3600.0)

poll_interval - seconds between trace-status polls while the rollout runs remotely.
run_timeout - bound on one rollout end to end, including instance provisioning and queueing.

Where HUDRuntime runs the agent loop locally against a tunneled env, HostedRuntime runs the whole rollout off-box: the platform leases an instance, brings the env’s container up on it, and runs the agent right next to it. This process only submits the rollout and polls its trace to completion. It requires a gateway agent that can serialize its identity (Claude/OpenAI/Gemini).
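The submit-and-poll behaviour described above follows a common pattern, sketched here with the standard library only. This is not HostedRuntime's implementation: `poll_until_done`, the `get_status` callable, and the status strings "completed" and "failed" are all hypothetical. Only the roles of poll_interval and run_timeout are taken from the parameter descriptions.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    run_timeout: float = 3600.0,
) -> str:
    """Poll a status source until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + run_timeout
    while True:
        status = get_status()
        # Hypothetical terminal states; a real trace may use other names.
        if status in ("completed", "failed"):
            return status
        # Stop if the next poll would land past the end-to-end bound.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"rollout still {status!r} after {run_timeout}s")
        time.sleep(poll_interval)
```

Because run_timeout bounds the rollout end to end, the deadline is fixed once at submission rather than reset on each poll, so provisioning and queueing time count against it.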

`Runtime`

Runtime(url, params=..., config=...)

url - control-channel address of an already-running substrate (e.g. tcp://host:8765).
params - connection-time data a transport may need (auth token, sandbox id).

Run on your own infra

A runtime is just a function: given a task, start a container somewhere and yield its control-channel URL. That one function is the whole integration surface for any provider - Modal, E2B, Runloop, your own Kubernetes:

run.py

from contextlib import asynccontextmanager
from hud import Runtime

# start_my_sandbox is a placeholder for your provider's own provisioning call.
@asynccontextmanager
async def my_runtime(task):
    sandbox = await start_my_sandbox(image="my-env")   # your infra brings it up
    try:
        yield Runtime(f"tcp://{sandbox.host}:{sandbox.port}")
    finally:
        await sandbox.terminate()                       # ...and tears it down

await taskset.run(agent, runtime=my_runtime)

DockerRuntime and the rest are just built-in versions of this. Anything that starts your image and hands back a URL plugs in with no change to the environment or the task - that’s what “run anywhere” means concretely. Constructed directly, Runtime(url) yields itself with a no-op lifecycle, since whoever provisioned the substrate owns teardown. Placement can also vary per task: a runtime is called once per rollout with the task row being placed, so one callable can route heavier rows to heavier substrates.
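Per-task placement can be sketched as a single callable that inspects the row and picks a substrate. The example below is self-contained and uses stand-ins so it runs without the SDK: `Row`, its `needs_gpu` field, `Endpoint`, the `SUBSTRATES` addresses, and `place` are all hypothetical. In real use the callable would yield hud's Runtime(url) and would read whatever attributes your task rows actually carry.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical task row; needs_gpu is an invented field for illustration."""
    name: str
    needs_gpu: bool = False


@dataclass
class Endpoint:
    """Stand-in for Runtime(url); holds only the control-channel URL."""
    url: str


# Hypothetical long-lived substrates, keyed by size.
SUBSTRATES = {"small": "tcp://cpu-box:8765", "large": "tcp://gpu-box:8765"}


@asynccontextmanager
async def routed_runtime(task: Row):
    # Called once per rollout, so each row can land on a different substrate.
    size = "large" if task.needs_gpu else "small"
    yield Endpoint(SUBSTRATES[size])
    # No teardown: these substrates are owned and stopped elsewhere.


async def place(task: Row) -> str:
    """Enter the runtime for one row and report where it was placed."""
    async with routed_runtime(task) as runtime:
        return runtime.url
```

For example, `asyncio.run(place(Row("heavy", needs_gpu=True)))` resolves to the GPU substrate's address, while a row without the flag resolves to the CPU one.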
