Everything that authors tasks — HUD’s own env.py, platform rows, Harbor task dirs — is a frontend that loads into the same primitives (Environment, Task, Taskset). Integrations are loaders, not converters: no codegen roundtrip to run foreign tasks. The Harbor integration lives in the SDK repo at integrations/harbor.py — a recipe built only on the public SDK surface; copy it into your project or run it from a checkout.

Prerequisites

  • A Harbor task directory — each task has task.toml + instruction.md, and usually an environment/ (with a Dockerfile) and tests/.

Load Harbor tasks

load(path) parses a Harbor task dir (or a dataset of them) into a Taskset directly — one row per task dir (id = the dir name), sharing one declarative Environment per distinct environment/ build context:
```python
from integrations.harbor import detect, load

assert detect("./terminal-bench")
taskset = load("./terminal-bench")

for task in taskset:
    print(task.env, task.id)
```
Like every task row, the result carries no placement. Run it by supplying one — today that means a substrate already serving the control channel (runtime=Runtime(url)); a docker provider that builds and runs each task’s environment/ image is the planned follow-up:
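```python
from hud import Runtime

# Run inside an async context (an async def or a notebook); `agent` is your agent instance.
job = await taskset.run(agent, runtime=Runtime("tcp://127.0.0.1:8765"))
```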
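`Runtime(url)` expects a substrate that is already listening, so a preflight check can fail fast with a clearer message. The following is a hypothetical standard-library helper. It assumes the URL uses the `tcp://host:port` form shown above. A successful TCP connect only shows that the port is open. It does not show that the listener speaks the HUD control channel.

```python
import socket
from urllib.parse import urlsplit


def parse_tcp_url(url: str) -> tuple[str, int]:
    """Split a tcp://host:port URL into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "tcp" or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected tcp://host:port, got {url!r}")
    return parts.hostname, parts.port


def is_serving(url: str, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the URL's host:port succeeds.

    Only proves something is listening; it does not speak the control-channel protocol.
    """
    host, port = parse_tcp_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, call `is_serving("tcp://127.0.0.1:8765")` before `taskset.run(...)` and stop early if it returns `False`.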

Export HUD tasks to Harbor

export(source, out_dir) goes the other way: it turns a HUD task source (a .py file/dir exposing Tasks, or a .json/.jsonl taskset next to its env.py) into self-contained Harbor task folders:
```python
from integrations.harbor import export

# export is async: await it inside an async context.
created = await export("tasks.py", "harbor_tasks")
```

The output has this layout:

```text
harbor_tasks/
└── <slug>/
    ├── task.toml             # Harbor-native config (+ hud_task/hud_args metadata)
    ├── instruction.md        # the materialized prompt + answer-file convention
    ├── environment/          # the env build context + baked HUD entrypoint
    │   ├── Dockerfile
    │   └── hud_entrypoint.sh
    └── tests/test.sh         # grades over the in-container control channel
```
How the lifecycle maps:

| HUD | Harbor |
|---|---|
| serving (python -m hud.environment.server) + task start | the baked image ENTRYPOINT serves the control channel and parks the run |
| the agent works, writes answer.txt | the agent works in the container |
| task evaluate (grade) | tests/test.sh grades the parked run, writes reward.txt |
Only environments whose capabilities are ssh/mcp are exportable (Harbor is shell-centric; rfb/cdp don’t map). The exported task grades over the HUD control channel, so it needs Harbor’s default same-container verifier — don’t set [verifier.environment] in task.toml.

Review, then rely

The mapping is mechanical, so review the result — confirm the prompt reads naturally, the grader scores what the prompt asks for, and there’s no leftover answer leakage (see Designing tasks for signal).
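A crude check can automate part of the leakage review: does the expected answer appear verbatim in `instruction.md`? The sketch is a hypothetical heuristic. It ignores case and whitespace differences and requires the answer to stand alone rather than sit inside a longer word or number. It catches copy-paste leaks but not paraphrases or hints, and you supply the expected answer yourself.

```python
import re


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences don't hide a match.
    return re.sub(r"\s+", " ", text).strip().lower()


def leaks_answer(instruction: str, answer: str) -> bool:
    """True if the answer appears as a standalone phrase in the prompt."""
    needle = _normalize(answer)
    if not needle:
        return False
    # Word-boundary lookarounds so "42" does not match inside "420".
    pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
    return re.search(pattern, _normalize(instruction)) is not None
```

Run it on each exported `instruction.md` with the answer your grader expects. A `True` result means the prompt gives the answer away.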

See also

Package & deploy

Tasks & placement

Designing tasks for signal

CLI reference