Compose multiple capabilities
An environment can expose several capabilities at once; the harness opens whichever it needs. A task that spans a shell and a browser declares both:
env.py
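The composition idea can be modeled in plain Python. This is a hypothetical sketch, not hud's API: ToyEnv, ToyTask, capability, and open_for are illustrative names, and a string stands in for a real shell or browser handle. The environment registers more capabilities than any one task uses, and the harness opens only the ones the task declares.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins, NOT hud classes: they only model "an environment
# exposes several capabilities; the harness opens the ones a task needs".

@dataclass
class ToyEnv:
    name: str
    # capability name -> factory that "opens" it and returns a handle
    capabilities: dict[str, Callable[[], str]] = field(default_factory=dict)

    def capability(self, name: str):
        """Decorator that registers a factory under a capability name."""
        def register(factory: Callable[[], str]) -> Callable[[], str]:
            self.capabilities[name] = factory
            return factory
        return register


@dataclass
class ToyTask:
    slug: str
    requires: set[str]


def open_for(env: ToyEnv, task: ToyTask) -> dict[str, str]:
    """Open only the capabilities the task declares; fail loudly on gaps."""
    missing = task.requires - set(env.capabilities)
    if missing:
        raise KeyError(f"{env.name} lacks capabilities: {sorted(missing)}")
    return {name: env.capabilities[name]() for name in sorted(task.requires)}


env = ToyEnv("workbench")

@env.capability("shell")
def open_shell() -> str:
    return "shell-handle"  # a real harness would start a shell session here

@env.capability("browser")
def open_browser() -> str:
    return "browser-handle"  # ...or launch a browser

@env.capability("database")
def open_database() -> str:
    return "database-handle"  # exposed, but not every task needs it

# A task spanning a shell and a browser declares both.
task = ToyTask("download-and-grep", requires={"shell", "browser"})
handles = open_for(env, task)
print(handles)  # {'browser': 'browser-handle', 'shell': 'shell-handle'}
```

Opening lazily per task keeps unused capabilities (here the database) closed, and a task that asks for something the environment lacks fails before any work starts.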
Stateful environments and backing daemons
Use @env.initialize and @env.shutdown to manage anything the tasks need running, such as a database, a seeded service, or a fixture. The hooks run once around serving, initialize before it starts and shutdown after it ends:
env.py
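The lifecycle can be illustrated with a small stand-in. ToyEnv below is hypothetical and only mimics the initialize/shutdown decorator pattern; it is not hud's environment class. An in-memory SQLite database plays the role of the backing daemon, seeded with a two-row fixture.

```python
import sqlite3
from typing import Callable

# Hypothetical stand-in, NOT hud's environment class: it mimics the
# initialize/shutdown decorator pattern so the lifecycle can run standalone.
class ToyEnv:
    def __init__(self) -> None:
        self._init_hooks: list[Callable[[], None]] = []
        self._shutdown_hooks: list[Callable[[], None]] = []

    def initialize(self, fn: Callable[[], None]) -> Callable[[], None]:
        self._init_hooks.append(fn)
        return fn

    def shutdown(self, fn: Callable[[], None]) -> Callable[[], None]:
        self._shutdown_hooks.append(fn)
        return fn

    def serve(self, tasks: list[Callable[[], object]]) -> list[object]:
        """Run init hooks once, serve every task, then run shutdown hooks once."""
        for hook in self._init_hooks:
            hook()
        try:
            return [task() for task in tasks]
        finally:
            # Tear down in reverse order, even if a task raised.
            for hook in reversed(self._shutdown_hooks):
                hook()


env = ToyEnv()
state: dict[str, sqlite3.Connection] = {}
events: list[str] = []

@env.initialize
def start_db() -> None:
    # Stand-in for a backing daemon: an in-memory SQLite DB seeded with a fixture.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("ada",), ("grace",)])
    state["db"] = conn
    events.append("initialize")

@env.shutdown
def stop_db() -> None:
    state.pop("db").close()
    events.append("shutdown")

def count_users() -> int:
    # A "task" that depends on the seeded database being up.
    return state["db"].execute("SELECT COUNT(*) FROM users").fetchone()[0]

results = env.serve([count_users, count_users, count_users])
print(results)  # [2, 2, 2]
print(events)   # ['initialize', 'shutdown']
```

Three tasks share one database, and each hook runs exactly once. The try/finally is a design choice worth copying: teardown still happens if a task raises.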
Parameterize for a difficulty spread
One task definition should span a range of difficulties. Parameterize the generator and create one concrete task per point in that range:
tasks.py
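As a hypothetical illustration (the task shape and make_sum_task are invented for this sketch, not hud's task API), a single generator can take the difficulty knob as a parameter, here the number of terms to add, and a list comprehension turns a set of points into concrete tasks:

```python
import random
from dataclasses import dataclass

# Hypothetical task shape for illustration; hud's task objects will differ.
@dataclass(frozen=True)
class ToyTask:
    slug: str
    prompt: str
    answer: int
    difficulty: int


def make_sum_task(n_terms: int, seed: int = 0) -> ToyTask:
    """One task definition; difficulty grows with the number of terms to add."""
    rng = random.Random(f"{seed}-{n_terms}")  # deterministic per point
    terms = [rng.randint(10, 99) for _ in range(n_terms)]
    return ToyTask(
        slug=f"sum-{n_terms}-terms",
        prompt="Add these numbers: " + " + ".join(map(str, terms)),
        answer=sum(terms),
        difficulty=n_terms,
    )


# One concrete task per point on the difficulty spread.
DIFFICULTY_POINTS = [2, 4, 8, 16, 32]
tasks = [make_sum_task(n) for n in DIFFICULTY_POINTS]

for t in tasks:
    print(t.slug, t.difficulty)
```

Seeding the generator per point keeps each concrete task reproducible, so reruns measure the model rather than a reshuffled task.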
Structure a large taskset across files
Keep tasks in modules and collect them into a Taskset at the top level:
tasks.py
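The collection step can be sketched without hud. ToyTaskset is hypothetical (not hud's Taskset), and the convention that each module exposes a TASKS list is an assumption. To keep the sketch runnable in one file, the two "modules" are built in memory with types.ModuleType; in a real project they would be separate files that you import.

```python
from dataclasses import dataclass
from types import ModuleType

@dataclass(frozen=True)
class ToyTask:
    slug: str
    prompt: str

# In a real project these would be files such as tasks/shell_tasks.py and
# tasks/browser_tasks.py, each defining a module-level TASKS list (assumed
# convention). Built in memory here so the sketch runs standalone.
shell_tasks = ModuleType("shell_tasks")
shell_tasks.TASKS = [ToyTask("shell-count-files", "Count files in /data")]
browser_tasks = ModuleType("browser_tasks")
browser_tasks.TASKS = [
    ToyTask("browser-find-price", "Find the price on the product page"),
    ToyTask("browser-fill-form", "Submit the signup form"),
]


class ToyTaskset:
    """Hypothetical collector (not hud's Taskset): gathers TASKS from modules."""

    def __init__(self, name: str, modules: list[ModuleType]) -> None:
        self.name = name
        self.tasks: dict[str, ToyTask] = {}
        for module in modules:
            for task in getattr(module, "TASKS", []):
                # Reject duplicates so every slug identifies exactly one task.
                if task.slug in self.tasks:
                    raise ValueError(f"duplicate slug {task.slug!r} in {module.__name__}")
                self.tasks[task.slug] = task

    def __len__(self) -> int:
        return len(self.tasks)


taskset = ToyTaskset("my-taskset", [shell_tasks, browser_tasks])
print(len(taskset), sorted(taskset.tasks))
# 3 ['browser-fill-form', 'browser-find-price', 'shell-count-files']
```

Checking for duplicate slugs at collection time catches collisions between modules before anything is run or published.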
Running hud eval tasks.py claude --full evaluates the whole set, and hud sync tasks my-taskset publishes it. Give each task a stable slug so it stays identifiable on the platform:
tasks.py
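One way to keep slugs stable, offered as a sketch rather than a hud requirement, is to derive them from what the task is (its family plus its sorted parameters) instead of from build order or randomness. slugify and stable_slug are hypothetical helpers written for this example.

```python
import re

def slugify(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics into '-', trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def stable_slug(family: str, **params: object) -> str:
    """Build a slug from what the task *is* (family + sorted parameters)."""
    parts = [slugify(family)]
    parts += [slugify(f"{key}-{value}") for key, value in sorted(params.items())]
    return "-".join(parts)


# Same definition -> same slug, regardless of keyword order or build order.
a = stable_slug("Sum Numbers", n_terms=8, seed=3)
b = stable_slug("Sum Numbers", seed=3, n_terms=8)
print(a)  # sum-numbers-n-terms-8-seed-3

# Unstable choices, for contrast (not used): f"task-{i}" from enumerate()
# shifts when tasks are added or reordered, and uuid.uuid4() changes every run.
```

Because the slug depends only on the definition, results from separate runs of the same task can be matched to each other.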
Group rollouts for variance
To measure variance (or feed training), run each task several times; group repeats share a GRPO group:
run.py
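What a group buys you can be shown with a few lines of arithmetic. GRPO scores each rollout relative to the other rollouts of the same task: advantage = (reward − group mean) / (group std + eps). The rewards below are made up for illustration, and the choice of population standard deviation and eps = 1e-6 are assumptions; implementations differ on both.

```python
import statistics

# Made-up rewards from running each task 4 times (1.0 = solved).
group_rewards = {
    "sum-2-terms": [1.0, 1.0, 1.0, 1.0],
    "sum-16-terms": [1.0, 0.0, 1.0, 0.0],
    "sum-32-terms": [0.0, 0.0, 1.0, 0.0],
}


def summarize(rewards: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of one task's repeated rollouts."""
    return statistics.fmean(rewards), statistics.pstdev(rewards)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each rollout's reward standardized within its group."""
    mean, std = summarize(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


for slug, rewards in group_rewards.items():
    mean, std = summarize(rewards)
    advs = [round(a, 3) for a in group_advantages(rewards)]
    print(f"{slug}: mean={mean:.2f} std={std:.2f} advantages={advs}")
# sum-2-terms: mean=1.00 std=0.00 advantages=[0.0, 0.0, 0.0, 0.0]
# sum-16-terms: mean=0.50 std=0.50 advantages=[1.0, -1.0, 1.0, -1.0]
# sum-32-terms: mean=0.25 std=0.43 advantages=[-0.577, -0.577, 1.732, -0.577]
```

Note the first group: when every repeat gets the same reward, all advantages are zero and the group carries no training signal. The per-task mean and std are also the variance measurement itself.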