Full interactive API docs with request/response schemas are available at api.hud.so/docs.
Run a Scenario
POST /agent/run triggers a scenario and optionally waits for the result.
Request Fields
| Field | Type | Default | Description |
|---|---|---|---|
env_name | string | required | Environment name |
scenario_name | string | required | Scenario to run (as defined in the environment) |
scenario_args | object | {} | Scenario argument values |
model | string | required | Model name (e.g. gpt-4o, claude-sonnet-4-20250514) |
max_steps | integer | 100 | Maximum agent steps (1-500) |
sync | boolean | true | If true, the server polls for completion and returns the result. If false, returns immediately with a trace_id |
timeout | integer | 300 | Timeout in seconds for sync mode (30-600s). Ignored if sync=false |
evalset_id | string | null | Optional evalset ID to create a task in |
callback | object | null | Webhook callback config (see Callbacks) |
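Here is a minimal sketch of calling this endpoint from Python using only the standard library. `build_run_request` assembles a request body and checks the documented ranges for `max_steps` and `timeout`. `run_scenario` then posts that body. Both helper names are hypothetical. The base URL `https://api.hud.so` and the `Authorization: Bearer` header are also assumptions, so confirm them against api.hud.so/docs before relying on this.

```python
import json
import urllib.request

# Assumption: API host and bearer-token auth; check api.hud.so/docs.
BASE_URL = "https://api.hud.so"


def build_run_request(env_name, scenario_name, model, scenario_args=None,
                      max_steps=100, sync=True, timeout=300,
                      evalset_id=None, callback=None):
    """Build a POST /agent/run body, enforcing the documented ranges."""
    if not 1 <= max_steps <= 500:
        raise ValueError("max_steps must be between 1 and 500")
    if sync and not 30 <= timeout <= 600:
        raise ValueError("timeout must be between 30 and 600 seconds")
    body = {
        "env_name": env_name,
        "scenario_name": scenario_name,
        "scenario_args": dict(scenario_args or {}),
        "model": model,
        "max_steps": max_steps,
        "sync": sync,
    }
    if sync:
        # timeout only matters in sync mode; the server ignores it otherwise
        body["timeout"] = timeout
    if evalset_id is not None:
        body["evalset_id"] = evalset_id
    if callback is not None:
        body["callback"] = callback
    return body


def run_scenario(body, api_key):
    """POST the body to /agent/run and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/agent/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `build_run_request("my-env", "checkout", "gpt-4o", {"item": "book"}, sync=False)` uses hypothetical environment and scenario names and produces a body that returns immediately with a trace_id. Optional fields are left out instead of being sent as null, so the server-side defaults apply.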
Response
With sync=false, only trace_id, trace_url, and status ("running") are returned. Use the trace read-back endpoints below to poll for the result.
Read Back Trace Results
After a run completes (or while polling with sync=false), use these endpoints to retrieve trace data.
Get Trace Details
GET /telemetry/traces/{trace_id} returns reward, status, scenario config, and optionally the full trajectory and logs.
| Parameter | Type | Default | Description |
|---|---|---|---|
include_trajectory | boolean | false | Include full trajectory (all agent actions and observations) |
include_logs | boolean | false | Include environment logs (container stdout/stderr) |
include_rollout_logs | boolean | false | Include orchestrator/worker logs |
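The following sketch builds the trace-details URL with the optional query flags. It assumes the flags are accepted as `true` query-string values. It also reuses the assumed base URL and bearer header from above. `trace_details_url` and `get_json` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://api.hud.so"  # assumption; check api.hud.so/docs


def trace_details_url(trace_id, include_trajectory=False, include_logs=False,
                      include_rollout_logs=False):
    """Build GET /telemetry/traces/{trace_id}, sending only enabled flags."""
    flags = {
        "include_trajectory": include_trajectory,
        "include_logs": include_logs,
        "include_rollout_logs": include_rollout_logs,
    }
    # All flags default to false, so only the enabled ones are added.
    query = urlencode({name: "true" for name, on in flags.items() if on})
    url = f"{BASE_URL}/telemetry/traces/{quote(trace_id, safe='')}"
    return f"{url}?{query}" if query else url


def get_json(url, api_key):
    """GET a URL with an (assumed) bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Trajectories and logs can be large, so it makes sense to leave the flags off while polling and enable them only for the final read.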
Get Full Trace Telemetry
GET /telemetry/trace/{trace_id} (singular trace) returns the full trace with signed screenshot URLs and complete trajectory steps.
For a lighter read, use GET /telemetry/traces/{trace_id}.
Batch Status Check
POST /telemetry/traces/status checks status for multiple traces in a single request. Useful for lightweight polling.
Task Management
Upload Tasks to a Taskset
POST /tasks/upload creates or updates tasks in a taskset (evalset). Tasks are resolved against their environment’s scenarios. If the taskset doesn’t exist, it’s created automatically.
Request Fields
| Field | Type | Description |
|---|---|---|
name | string | Taskset name. Created if it doesn’t exist |
tasks | array | List of tasks to upload |
tasks[].slug | string (nullable) | Stable task identifier (stored as external_id). Used for upsert: if a task with this slug already exists, it is updated. Must not be a 4-digit number (reserved for auto-assigned IDs) |
tasks[].scenario | string | Scenario name to run |
tasks[].args | object | Scenario argument values |
tasks[].env | object | Environment config (required) |
tasks[].env.name | string | Environment name (required for scenario resolution) |
tasks[].env.include | string[] | Tool whitelist (optional) |
tasks[].env.exclude | string[] | Tool blacklist (optional) |
tasks[].agent_config | object (nullable) | Agent behavior overrides |
tasks[].agent_config.system_prompt | string (nullable) | Custom system prompt |
tasks[].validation | array (nullable) | Tool calls representing successful completion (stored in metadata) |
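This sketch assembles a POST /tasks/upload body that mirrors the field table above. It includes a client-side check for the reserved 4-digit slug form. The helper names and example values are hypothetical. Only the field names are taken from the table.

```python
import re


def make_task(scenario, env_name, args=None, slug=None, include=None,
              exclude=None, system_prompt=None, validation=None):
    """Build one entry of the `tasks` array for POST /tasks/upload."""
    # 4-digit numbers are reserved for auto-assigned IDs
    if slug is not None and re.fullmatch(r"[0-9]{4}", slug):
        raise ValueError("slug must not be a 4-digit number")
    env = {"name": env_name}  # env.name is required for scenario resolution
    if include is not None:
        env["include"] = list(include)  # tool whitelist
    if exclude is not None:
        env["exclude"] = list(exclude)  # tool blacklist
    task = {"scenario": scenario, "args": dict(args or {}), "env": env}
    if slug is not None:
        task["slug"] = slug  # stable ID; re-uploading it updates the task
    if system_prompt is not None:
        task["agent_config"] = {"system_prompt": system_prompt}
    if validation is not None:
        task["validation"] = list(validation)
    return task


def build_upload_request(taskset_name, tasks):
    """Body for POST /tasks/upload; the taskset is created if missing."""
    return {"name": taskset_name, "tasks": list(tasks)}
```

Giving every task a slug keeps re-uploads idempotent, because the server updates matching tasks instead of creating new ones.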
Response
Tasks whose slug already exists in the same taskset are updated instead of duplicated.
Add Tasks by Evalset ID
POST /tasks/evalsets/{evalset_id}/tasks adds tasks to an existing taskset by its UUID. This endpoint uses the internal task format with explicit scenario_id references.
Request Fields
| Field | Type | Description |
|---|---|---|
tasks | array | List of tasks (min 1) |
tasks[].prompt | string (nullable) | Task prompt |
tasks[].tier | integer | Task difficulty tier |
tasks[].scenario_id | string (nullable) | Scenario UUID to run |
tasks[].scenario_args | array (nullable) | Argument values as [{"name": "...", "value": "..."}] |
tasks[].external_id | string | Task identifier (auto-assigned if omitted) |
tasks[].tags | string[] | Tags for organization |
tasks[].env_config | object (nullable) | Tool include/exclude lists |
tasks[].agent_config | object (nullable) | Agent behavior overrides |
tasks[].metadata | object (nullable) | Arbitrary metadata |
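This endpoint takes scenario arguments as a list of name/value pairs instead of an object, so a small conversion helper is useful. The sketch below uses hypothetical helper names. The table shows only string values, so it is not confirmed here whether other JSON types are accepted.

```python
def to_evalset_task(scenario_id, args, prompt=None, external_id=None,
                    tags=None, tier=None):
    """Build one entry of `tasks` for POST /tasks/evalsets/{evalset_id}/tasks."""
    task = {
        "scenario_id": scenario_id,
        # {"k": v} becomes [{"name": "k", "value": v}], the list form used here
        "scenario_args": [{"name": k, "value": v} for k, v in args.items()],
    }
    if prompt is not None:
        task["prompt"] = prompt
    if tier is not None:
        task["tier"] = tier
    if external_id is not None:
        task["external_id"] = external_id  # server assigns one if omitted
    if tags:
        task["tags"] = list(tags)
    return task


def build_evalset_request(tasks):
    """Request body; the endpoint requires at least one task."""
    tasks = list(tasks)
    if not tasks:
        raise ValueError("at least one task is required")
    return {"tasks": tasks}


def evalset_tasks_path(evalset_id):
    """Path for adding tasks to an existing taskset by its UUID."""
    return f"/tasks/evalsets/{evalset_id}/tasks"
```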
Response
Job Endpoints
If you create runs with skip_job=false, you can also query job-level data.
Get Job Telemetry
GET /telemetry/job/{job_id} returns job metrics, all task runs with rewards, timing, and metadata.
List Trace IDs for a Job
GET /telemetry/job/{job_id}/trace-ids returns all trace IDs associated with a job.
Callbacks
Instead of polling, you can configure a webhook callback to receive results when a trace completes. Pass the configuration in the callback field of POST /agent/run.
Callback Fields
| Field | Type | Default | Description |
|---|---|---|---|
url | string | required | URL to POST results to |
include_reward | boolean | true | Include reward in payload |
include_response | boolean | true | Include agent’s final response |
include_trajectory | boolean | false | Include full trajectory (can be large) |
retry_count | integer | 3 | Number of retries on failure (0-10) |
headers | object | null | Custom headers to include |
metadata | object | null | Arbitrary metadata echoed back in the callback |
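This sketch builds a callback config that fills in the documented defaults and checks the retry_count range. The resulting object goes in the callback field of POST /agent/run. The helper name and the example URL are hypothetical. This page does not describe the payload the webhook receives, so no receiver is shown.

```python
def build_callback(url, include_reward=True, include_response=True,
                   include_trajectory=False, retry_count=3, headers=None,
                   metadata=None):
    """Build a `callback` object using the defaults from the table above."""
    if not 0 <= retry_count <= 10:
        raise ValueError("retry_count must be between 0 and 10")
    cb = {
        "url": url,
        "include_reward": include_reward,
        "include_response": include_response,
        "include_trajectory": include_trajectory,  # can make payloads large
        "retry_count": retry_count,
    }
    if headers is not None:
        cb["headers"] = dict(headers)  # e.g. a shared secret for your endpoint
    if metadata is not None:
        cb["metadata"] = dict(metadata)  # echoed back in the callback
    return cb
```

Putting an identifier of your own in metadata (for example, a batch name) lets the receiving service correlate callbacks without extra lookups.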
Async Polling Pattern
A common pattern for programmatic use: launch with sync=false, poll the lightweight status endpoint (POST /telemetry/traces/status) until the trace completes, then fetch the full result (GET /telemetry/traces/{trace_id}).
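Here is a sketch of the polling loop. This page does not spell out the request and response shapes, so the status check and the final fetch are passed in as functions rather than guessed. For example, these could be wrappers around POST /telemetry/traces/status and GET /telemetry/traces/{trace_id}. The loop assumes "running" is the only non-terminal status, based on the sync=false response described earlier. Other status values are not listed here. `wait_for_trace` is a hypothetical name.

```python
import time


def wait_for_trace(trace_id, get_status, get_details, interval=5.0,
                   timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(trace_id) until it is no longer "running", then fetch details.

    get_status and get_details are caller-supplied; sleep and clock are
    injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(trace_id)
        if status != "running":  # assumed to be the only in-progress value
            return get_details(trace_id)
        if clock() >= deadline:
            raise TimeoutError(f"trace {trace_id} still running after {timeout}s")
        sleep(interval)
```

When many runs are in flight, one batch status request per interval is cheaper than one request per trace. The single-trace loop above keeps the control flow easy to follow.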