Task is a concrete, runnable data point: an environment plus a task id,
arguments, slug, and metadata. Calling an @env.template() function returns a
Task. A Taskset is a named, ordered collection of tasks.
Authoring Tasks
@env.template() registers an async-generator task on an Environment. The returned
callable is the authoring handle; call it with arguments to create a public
Task.
Task
Task is a Pydantic model — one portable, validated row of data:
| Field | Type | Description |
|---|---|---|
| env | str | The name of the environment it belongs to. |
| id | str | The task id registered on the environment. |
| args | dict | Bound arguments. |
| slug | str \| None | Stable id for sync/filtering/registry. |
| columns | dict \| None | Metadata for filtering and leaderboards. |
| validation | list[dict] \| None | Sync/platform metadata. |
| agent_config | dict \| None | Per-task agent overrides (e.g. {"max_steps": 50}). Applied during hosted execution. |
Placement: where a task runs
Placement is decided at execution time with the runtime= parameter, which is a provider.
A provider is called with the task row being placed and brings up one fresh
substrate for it:
| Provider | Description |
|---|---|
| LocalRuntime(path) | Serve the row’s env from a local .py source in a child process (the same serving path a container CMD runs). env= pins one explicitly. |
| DockerRuntime(image) | docker run a fresh container per rollout from an image whose CMD serves the control channel (the scaffolded Dockerfile.hud). port= (default 8765) is the in-container port; run_args= passes extra docker run flags. The control port is the only one published. |
| Runtime(url) | Attach to an already-served control channel (provisioned elsewhere; no lifecycle). |
| HUDRuntime() | Lease the environment on HUD infra but keep the agent loop local; the SDK opens a tunnel and drives the remote control channel through a local Runtime (the default when runtime= is omitted). |
| HostedRuntime() | Submit the whole rollout to the HUD platform so the agent runs remotely next to the env. |
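As a rough illustration of the provider contract described above (called with the row being placed, bringing up one fresh substrate for it), the sketch below uses only the standard library. FakeSubstrate, fake_provider, and place are hypothetical names invented for this example and are not part of the SDK; the real providers in the table manage child processes, containers, or leases instead.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # gives every substrate a distinct id so freshness is visible


@dataclass
class FakeSubstrate:
    """Hypothetical stand-in for whatever a real provider brings up."""
    ident: int
    env: str
    closed: bool = False


@asynccontextmanager
async def fake_provider(row: dict):
    """Hypothetical provider: called with one task row, yields one fresh substrate."""
    substrate = FakeSubstrate(ident=next(_ids), env=row["env"])
    try:
        yield substrate
    finally:
        substrate.closed = True  # torn down once the rollout is done


async def place(rows):
    """Place each row in turn and record which substrate served it."""
    seen = []
    for row in rows:
        async with fake_provider(row) as substrate:
            seen.append((substrate.ident, substrate.env))
    return seen


rows = [
    {"env": "browser", "id": "a", "args": {}},
    {"env": "shell", "id": "b", "args": {}},
]
placements = asyncio.run(place(rows))
```

Because the provider receives the row, it can pick the environment per task, which is what lets a single provider serve rows from different environments.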
Running a Task
task.run(agent, runtime=...) executes the task end to end — provision, agent,
grade — and returns a Job holding the graded Runs.
It is the single-task form of Taskset.run() with identical scheduling
semantics (group=, max_concurrent=) and failure isolation (a crashed
rollout comes back as a failed Run inside the job rather than raising).
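Failure isolation, as described above, means a crash in one rollout is recorded instead of propagated. A minimal standard-library sketch of that idea follows; RunRecord, isolated_rollout, and the toy rollouts are hypothetical and do not reflect the SDK's actual Run type or rollout engine.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical result row: either a reward or an error message."""
    slug: str
    reward: Optional[float] = None
    error: Optional[str] = None

    @property
    def failed(self) -> bool:
        return self.error is not None


async def isolated_rollout(slug, rollout) -> RunRecord:
    """Run one rollout; convert any exception into a failed record."""
    try:
        reward = await rollout()
        return RunRecord(slug=slug, reward=reward)
    except Exception as exc:
        return RunRecord(slug=slug, error=f"{type(exc).__name__}: {exc}")


async def ok():
    return 1.0


async def crash():
    raise RuntimeError("env died")


async def main():
    # Both rollouts finish; the crash never escapes gather().
    return await asyncio.gather(
        isolated_rollout("good", ok),
        isolated_rollout("bad", crash),
    )


records = asyncio.run(main())
```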
There are no standalone traces; every run reports under a job.
The lower-level path drives connect and the Run lifecycle by hand. Exiting the
Run grades it; this path skips the trace reporting and failure isolation that
task.run() provides.
Task Methods
| Method | Description |
|---|---|
| task.run(agent, runtime=..., group=..., max_concurrent=...) | Schedule through the rollout engine (single-task Taskset.run); returns a Job. |
| task.default_slug() | Stable slug from the task id and, when present, an args hash. |
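The exact format that task.default_slug() produces is not specified here. The sketch below shows one way a stable slug could be derived from a task id plus an args hash; the function name sketch_slug, the choice of SHA-256, the 8-character truncation, and the "-" separator are all assumptions made for illustration.

```python
import hashlib
import json


def sketch_slug(task_id: str, args: dict) -> str:
    """Hypothetical slug: the task id, plus a short args hash when args exist."""
    if not args:
        return task_id
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"{task_id}-{digest}"


a = sketch_slug("count_letter", {"word": "strawberry", "letter": "r"})
b = sketch_slug("count_letter", {"letter": "r", "word": "strawberry"})
```

Hashing a canonical serialization is what makes the slug stable: the same id and arguments always map to the same string, regardless of argument order.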
task.model_dump() is the portable entry ({"env": name, "id": ..., "args": ...}) and
Task.model_validate(data) rebuilds it; this is standard Pydantic.
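To illustrate the portable entry shape without depending on the SDK, the sketch below uses a standard-library dataclass as a stand-in for Task. RowSketch, dump, and validate are hypothetical; the real model relies on Pydantic's model_dump() and model_validate(), and its actual dump output may carry more fields than shown.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RowSketch:
    """Hypothetical stand-in mirroring a few of the documented Task fields."""
    env: str
    id: str
    args: dict = field(default_factory=dict)
    slug: Optional[str] = None
    columns: Optional[dict] = None

    def dump(self) -> dict:
        # Drop unset optional fields so the entry stays minimal.
        return {k: v for k, v in asdict(self).items() if v is not None}

    @classmethod
    def validate(cls, data: dict) -> "RowSketch":
        # Minimal check only; Pydantic validates far more thoroughly.
        for key in ("env", "id"):
            if not isinstance(data.get(key), str):
                raise ValueError(f"{key!r} must be a string")
        return cls(**data)


row = RowSketch(env="browser", id="open_page", args={"url": "https://example.com"})
entry = json.loads(json.dumps(row.dump()))  # survives a JSON round trip
rebuilt = RowSketch.validate(entry)
```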
Constructing Rows Directly
When you don’t have the task function in hand (data pipelines, generated tasksets), construct the model directly; fields and metadata are explicit.
Taskset
A named, ordered collection of tasks.
Sources
| Constructor | Description |
|---|---|
| Taskset(name, tasks) | Wrap an iterable of Tasks. |
| Taskset.from_file(path) | Load .py, directory, .json, or .jsonl sources. |
| Taskset.from_module(path) | Load public Task or Taskset objects from Python source. |
| Taskset.from_api(name) | Load a platform taskset by name or id. |
| taskset.to_file(path) | Write .json or .jsonl (hud sync tasks --export adds CSV). |
Collection Operations
| Operation | Description |
|---|---|
| len(taskset) / iter(taskset) | Count / iterate tasks. |
| taskset["slug"] | Lookup by slug. |
| taskset.filter(slugs) | Keep matching slugs. |
| taskset.exclude(slugs) | Drop matching slugs. |
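The operations in the table can be pictured with a small stand-in collection. MiniTaskset and Row below are hypothetical and only mimic the listed behaviour; that filter() and exclude() return a new collection and preserve the original order is an assumption of this sketch, not something the table states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical task row carrying only a slug."""
    slug: str


class MiniTaskset:
    """Hypothetical named, ordered collection with slug-based operations."""

    def __init__(self, name, tasks):
        self.name = name
        self._tasks = list(tasks)

    def __len__(self):
        return len(self._tasks)

    def __iter__(self):
        return iter(self._tasks)

    def __getitem__(self, slug):
        for task in self._tasks:
            if task.slug == slug:
                return task
        raise KeyError(slug)

    def filter(self, slugs):
        keep = set(slugs)
        return MiniTaskset(self.name, [t for t in self._tasks if t.slug in keep])

    def exclude(self, slugs):
        drop = set(slugs)
        return MiniTaskset(self.name, [t for t in self._tasks if t.slug not in drop])


ts = MiniTaskset("demo", [Row("a"), Row("b"), Row("c")])
kept = [t.slug for t in ts.filter(["c", "a"])]
dropped = [t.slug for t in ts.exclude(["b"])]
```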
Running
Taskset.run() repeats each task group= times, acquires a fresh substrate per
rollout from the runtime= provider (called with that rollout’s task row, so one
provider serves a mixed-env taskset), lets agent(run) fill the trace, grades
on exit, and returns a Job.
| Method | Description |
|---|---|
| await taskset.run(agent, runtime=None, group=1, max_concurrent=None, job=None) | Run the taskset and return Job (pass an open job to accumulate into it). |
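The expansion and concurrency behaviour described above can be sketched with asyncio alone. run_sketch and toy_rollout are hypothetical names, and treating max_concurrent=None as "no limit" is an assumption; the real scheduler also provisions substrates and grades runs, which this sketch leaves out.

```python
import asyncio


async def run_sketch(tasks, rollout, group=1, max_concurrent=None):
    """Hypothetical scheduler: expand tasks, cap concurrency, keep order."""
    # Expansion order: each task repeated `group` times, in taskset order.
    expanded = [task for task in tasks for _ in range(group)]
    limit = asyncio.Semaphore(max_concurrent or len(expanded) or 1)
    active = 0
    peak = 0

    async def one(task):
        nonlocal active, peak
        async with limit:  # at most `max_concurrent` rollouts inside at once
            active += 1
            peak = max(peak, active)
            try:
                return await rollout(task)
            finally:
                active -= 1

    # gather() returns results in submission order, i.e. expansion order.
    results = await asyncio.gather(*(one(t) for t in expanded))
    return results, peak


async def toy_rollout(task):
    await asyncio.sleep(0.01)  # pretend to do some work
    return task


results, peak = asyncio.run(
    run_sketch(["a", "b"], toy_rollout, group=3, max_concurrent=2)
)
```

The semaphore bounds how many rollouts are in flight, while the result list still comes back in expansion order.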
Job
The platform receipt for one execution — there are no standalone traces, so
every run (including a single task.run) reports under a job.
| Member | Type | Description |
|---|---|---|
| id | str | HUD job id. |
| name | str | Display name. |
| runs | list[Run] | Runs in expansion order. |
| group | int | Runs per task. |
| reward | float | Mean reward across runs. |
| await Job.start(name, group=1) | Job | Open a job spanning multiple scheduler calls (a training session); pass it as job= to accumulate. |
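The idea of accumulating several scheduler calls into one job, with reward as the mean across all runs, can be sketched as below. JobSketch and its extend() method are hypothetical; how the real Job treats failed runs or an empty run list is not stated here, so returning 0.0 for an empty job is an assumption.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class JobSketch:
    """Hypothetical job that collects rewards across several scheduler calls."""
    name: str
    group: int = 1
    rewards: list = field(default_factory=list)

    def extend(self, new_rewards):
        # Stands in for passing the same open job to successive run calls.
        self.rewards.extend(new_rewards)

    @property
    def reward(self) -> float:
        # Mean reward across every run accumulated so far.
        return fmean(self.rewards) if self.rewards else 0.0


job = JobSketch(name="training-session", group=2)
job.extend([1.0, 0.0])  # first scheduler call
job.extend([1.0, 1.0])  # second call accumulates into the same job
```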
Sync
hud.eval.sync.diff() compares local tasks to remote tasks and returns a
SyncPlan.
| Type / method | Description |
|---|---|
| SyncPlan.to_create | Local tasks not present remotely. |
| SyncPlan.to_update | Local tasks whose signature differs. |
| SyncPlan.unchanged | Matching tasks. |
| SyncPlan.remote_only | Remote tasks not present locally. |
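The four buckets in the table can be illustrated with a small diff over plain dictionaries. PlanSketch and diff_sketch are hypothetical; keying tasks by slug and treating the signature as an opaque comparable value are assumptions of this sketch and not a description of how hud.eval.sync.diff() is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class PlanSketch:
    """Hypothetical plan with the same four buckets as the table above."""
    to_create: list = field(default_factory=list)
    to_update: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)
    remote_only: list = field(default_factory=list)


def diff_sketch(local: dict, remote: dict) -> PlanSketch:
    """Compare two slug -> signature mappings and bucket each slug."""
    plan = PlanSketch()
    for slug, signature in local.items():
        if slug not in remote:
            plan.to_create.append(slug)    # local only
        elif remote[slug] != signature:
            plan.to_update.append(slug)    # present in both, signature differs
        else:
            plan.unchanged.append(slug)    # present in both, identical
    plan.remote_only = [slug for slug in remote if slug not in local]
    return plan


plan = diff_sketch(
    local={"a": "sig1", "b": "sig2-new", "c": "sig3"},
    remote={"b": "sig2-old", "c": "sig3", "d": "sig4"},
)
```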
Use hud sync tasks to upload a taskset to the platform.