Quick Start
What Gets Converted
A Harbor task directory is converted into one or more HUD environments, plus a `taskset.json` that references all tasks across all environments.
How It Works
Environment Grouping
Tasks with identical Dockerfiles are grouped into a single HUD environment. If every task has a unique Dockerfile (common in Terminal-Bench), each gets its own environment.
Dockerfile Adaptation
The converter takes the Harbor Dockerfile verbatim and appends a HUD layer:
- Installs `uv` standalone (works on any base image: Debian, Ubuntu, Alpine, etc.)
- Installs `hud-python` and `openai` as dependencies
- Copies task data into `/harbor/tasks/`
- Sets the MCP server as the entrypoint
CMD and ENTRYPOINT from the original Dockerfile are commented out and replaced.
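The adaptation step can be sketched in Python as follows. This is a minimal illustration, not the converter's actual code: the function name `adapt_dockerfile`, the `HUD_LAYER` contents, and the server command are hypothetical placeholders, since the real appended instructions are not shown here. It only demonstrates the two behaviors described above: the original lines are kept verbatim except that `CMD` and `ENTRYPOINT` instructions are commented out, and extra lines are appended at the end.

```python
import re

# Matches a line that starts a CMD or ENTRYPOINT instruction (case-insensitive).
_START = re.compile(r"^\s*(CMD|ENTRYPOINT)\b", re.IGNORECASE)


def adapt_dockerfile(dockerfile: str, hud_layer: list[str]) -> str:
    """Comment out CMD/ENTRYPOINT instructions and append extra lines."""
    out = []
    in_instruction = False  # True while inside a backslash-continued instruction
    for line in dockerfile.splitlines():
        if in_instruction or _START.match(line):
            out.append("# " + line)
            # A trailing backslash means the instruction continues on the next line.
            in_instruction = line.rstrip().endswith("\\")
        else:
            out.append(line)
    out.append("")
    out.append("# --- appended layer (placeholder contents) ---")
    out.extend(hud_layer)
    return "\n".join(out) + "\n"


# Hypothetical layer: stands in for the real install/copy/entrypoint instructions.
HUD_LAYER = [
    "COPY tasks/ /harbor/tasks/",
    'CMD ["hypothetical-mcp-server"]',
]

original = 'FROM ubuntu:22.04\nRUN apt-get update\nCMD ["bash"]\n'
adapted = adapt_dockerfile(original, HUD_LAYER)
print(adapted)
```

Commenting lines out rather than deleting them keeps the original instructions visible in the generated Dockerfile, which makes it easier to see what the task image used to run.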
Reward Parsing
Harbor test scripts write results to `/logs/verifier/`. The converter supports both formats:
- `reward.txt`: a single float (`1.0` for pass, `0.0` for fail)
- `reward.json`: `{"reward": 1.0}` or just a float
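As a rough sketch of how the two reward formats could be read, the Python below checks for `reward.txt` first and then `reward.json`. The function name `read_reward` and the order of precedence between the two files are assumptions made for illustration; the converter's own parsing logic may differ.

```python
import json
import tempfile
from pathlib import Path


def read_reward(verifier_dir: Path) -> float:
    """Return the reward found in reward.txt or reward.json under verifier_dir."""
    txt = verifier_dir / "reward.txt"
    if txt.exists():
        # reward.txt holds a single float, e.g. "1.0"
        return float(txt.read_text().strip())

    js = verifier_dir / "reward.json"
    if js.exists():
        data = json.loads(js.read_text())
        # reward.json may be {"reward": 1.0} or a bare number such as 1.0
        if isinstance(data, dict):
            return float(data["reward"])
        return float(data)

    raise FileNotFoundError(f"no reward file found in {verifier_dir}")


# Usage: a temporary directory stands in for /logs/verifier/.
with tempfile.TemporaryDirectory() as tmp:
    verifier = Path(tmp)
    (verifier / "reward.json").write_text('{"reward": 1.0}')
    reward = read_reward(verifier)
    print(reward)
```

Raising an error when neither file exists, instead of silently returning `0.0`, keeps a missing verifier output distinguishable from a genuine failure.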
Running Programmatically
You can also run converted tasks from Python using the SDK.
Supported Harbor Patterns
| Pattern | Status |
|---|---|
| Simple Dockerfiles (`FROM` + `RUN`) | Supported |
| `COPY` from local build context | Supported |
| Multi-stage builds | Supported |
| `ENV`, `ARG`, build scripts | Supported |
| `CMD` / `ENTRYPOINT` replacement | Supported |
| Tasks without Dockerfile | Supported (fallback image) |
| `task.toml` metadata passthrough | Supported |
| `docker-compose.yaml` (multi-service) | Not yet supported |
Limitations
- Docker Compose: Tasks using `docker-compose.yaml` for multi-service setups are not currently supported (HUD environments are single-container).
- Pre-built images: The converter rebuilds from the source Dockerfile rather than using the `docker_image` field in `task.toml`. This ensures full reproducibility but takes longer on first deploy.
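Because Docker Compose tasks are not converted, it can help to screen a task directory before running the converter. The helper below is a hypothetical pre-flight check, not part of the converter; it assumes the compose file is named exactly `docker-compose.yaml` and may sit anywhere under the task directory.

```python
import tempfile
from pathlib import Path


def uses_compose(task_dir: Path) -> bool:
    """Return True if a docker-compose.yaml exists anywhere under task_dir."""
    return next(task_dir.rglob("docker-compose.yaml"), None) is not None


# Usage: build a throwaway task directory that contains a compose file.
with tempfile.TemporaryDirectory() as tmp:
    task = Path(tmp)
    (task / "environment").mkdir()
    (task / "environment" / "docker-compose.yaml").write_text("services: {}\n")
    found = uses_compose(task)
    print(found)
```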