Run
The live handle for one task: the lifecycle plus the agent’s `Trace`. You get
runs in `job.runs` from `task.run(agent)` / `taskset.run(agent)`, or construct
one over a connected client for manual driving (see
Running a Task).
| Member | Type | Description |
|---|---|---|
| `run.prompt` | `str \| list \| None` | The task’s opening prompt as `tasks.start` returned it (text, or chat-style message list). |
| `run.prompt_messages` | `list[PromptMessage]` | The prompt as normalized user/assistant turns, the form agents consume. |
| `run.prompt_text` | `str` | The prompt flattened to plain text, for string-only backends. |
| `run.trace` | `Trace` | The trajectory the agent fills. The answer is `run.trace.content`. |
| `run.grade` | `Grade` | Structured grade result. |
| `run.reward` | `float` | The graded reward (`grade.reward`, set on exit). |
| `run.evaluation` | `dict` | The raw grade payload (`grade.raw`). |
| `run.runtime` | `str \| None` | Control-channel URL the run executed against (the placement record). |
| `run.trace_id` | `str \| None` | Keys the trajectory; satisfies `Rewarded`. |
| `run.job_id` / `run.group_id` | `str \| None` | Batch + GRPO group, set by the runner. |
Grade
Structured result from grading one run, parsed from the wire grade frame
({"score": ..., "done": ..., "isError": ..., ...}).
| Field | Type | Description |
|---|---|---|
| `reward` | `float` | The frame’s `score`. |
| `done` | `bool` | Whether the task is complete. |
| `content` | `str \| None` | Human-readable grade content. |
| `info` | `dict` | Extra metadata. |
| `is_error` | `bool` | Whether grading failed. |
| `raw` | `dict` | The full original frame. |
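As a rough illustration of the field mapping in the table above, the sketch below parses a wire frame into a stand-in dataclass. `GradeSketch` and `from_frame` are hypothetical names, not the library's API. The `content` and `info` frame keys, and the defaults used for missing keys, are assumptions: the source only spells out `score`, `done`, and `isError`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class GradeSketch:
    """Hypothetical stand-in that mirrors the documented Grade fields."""

    reward: float
    done: bool
    content: str | None = None
    info: dict[str, Any] = field(default_factory=dict)
    is_error: bool = False
    raw: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_frame(cls, frame: dict[str, Any]) -> GradeSketch:
        # "score", "done", and "isError" come from the documented frame shape.
        # The "content" and "info" keys and all defaults are assumptions.
        return cls(
            reward=float(frame.get("score", 0.0)),
            done=bool(frame.get("done", False)),
            content=frame.get("content"),
            info=dict(frame.get("info") or {}),
            is_error=bool(frame.get("isError", False)),
            raw=dict(frame),  # keep a copy of the full original frame
        )


frame = {"score": 0.75, "done": True, "isError": False, "content": "3 of 4 checks passed"}
grade = GradeSketch.from_frame(frame)
```

Keeping `raw` next to the parsed fields means a caller can still reach frame keys that the structured view does not surface.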
Trace
The agent’s trajectory for one rollout — an ordered collection of Steps plus
the run summary, and the unit of training data. Every recorded step also
streams to the platform as one schema-tagged span.
| Field | Type | Description |
|---|---|---|
| `steps` | `list[Step]` | The ordered trajectory. |
| `status` | `"completed" \| "error" \| "cancelled" \| None` | How the run ended (`trace.is_error` reads it). |
| `content` | `str \| None` | The final answer (graded). |
| `trace_id` | `str \| None` | Keys server-side trajectories. |
`hud.types.Step` is the shared skeleton (source, timing, error, plus the
harness payloads: prompt messages and `task_call` lifecycle RPCs). The
tool-agent family subclasses it in `hud.agents.types`, flat on the skeleton:

- `AgentStep`: the model’s turn, carrying `content`, `reasoning`, `tool_calls`, `done`, plus `model`, `usage`, and token-level `sample` when the backend is trainable.
- `ToolStep`: one tool round-trip, the `MCPToolCall` paired with its `MCPToolResult`.
- `SubagentStep`: a nested rollout’s `Trace`, embedded whole.
Read across the steps with `trace.final(get)`
(newest non-None answer wins; `trace.error` is a view on it) and
`trace.collect(get)` (every answer, in step order). Family vocabulary stays at
the call site.
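The semantics of the two queries can be sketched with stand-in classes. `StepSketch` and `TraceSketch` are hypothetical and carry only two fields. Treating `get` as a callable from a step to a value or `None`, and having `collect` skip `None` results, are assumptions drawn from the description above rather than from the library source.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StepSketch:
    """Hypothetical step carrying two AgentStep-like fields."""

    content: str | None = None
    error: str | None = None


@dataclass
class TraceSketch:
    """Hypothetical trace holding an ordered list of steps."""

    steps: list[StepSketch] = field(default_factory=list)

    def final(self, get: Callable[[StepSketch], Any]) -> Any:
        # Walk newest-first; the first non-None answer wins.
        for step in reversed(self.steps):
            value = get(step)
            if value is not None:
                return value
        return None

    def collect(self, get: Callable[[StepSketch], Any]) -> list[Any]:
        # Every non-None answer, in step order (skipping None is an assumption).
        return [value for step in self.steps if (value := get(step)) is not None]

    @property
    def error(self) -> str | None:
        # A view on final(): the newest recorded error, if any.
        return self.final(lambda step: step.error)


trace = TraceSketch(
    steps=[
        StepSketch(content="draft"),
        StepSketch(error="timeout"),
        StepSketch(content="42"),
    ]
)

# The field names live in the getter, at the call site.
last_reply = trace.final(lambda step: step.content)
all_replies = trace.collect(lambda step: step.content)
```

Because the getter names the field, the trace itself needs no knowledge of any particular step family.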
Answer & result types
Answer[T]
When a task declares returns=T, the answer arrives wrapped
(from hud.environment import Answer): content is the answer parsed into
T (or the original string when parsing failed — grade it accordingly),
raw is always the string as submitted.
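A minimal sketch of the wrapping behavior described above, using a stand-in rather than the real `Answer` class. `AnswerSketch`, `wrap_answer`, and `grade_int` are hypothetical names, and the exception types caught here are an assumption about what counts as a parse failure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class AnswerSketch(Generic[T]):
    """Hypothetical stand-in for the documented Answer[T] shape."""

    content: T | str  # parsed value, or the original string if parsing failed
    raw: str  # always the string as submitted


def wrap_answer(raw: str, parse: Callable[[str], T]) -> AnswerSketch[T]:
    try:
        return AnswerSketch(content=parse(raw), raw=raw)
    except (ValueError, TypeError):
        # Parsing failed: fall back to the submitted string.
        return AnswerSketch(content=raw, raw=raw)


def grade_int(answer: AnswerSketch[int], expected: int) -> float:
    # For a non-string T, a string in `content` signals a failed parse.
    if isinstance(answer.content, str):
        return 0.0
    return 1.0 if answer.content == expected else 0.0


ok = wrap_answer("42", int)
bad = wrap_answer("forty-two", int)
```

The `isinstance` check only distinguishes a failed parse when `T` is not itself `str`; a grader for string answers would need another signal.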
Citation
A normalized citation across providers (hud.agents.types.Citation): type, text, source, title, start_index, end_index. A reply annotation, not a grading input — provider agents attach them to AgentStep.citations, and chat surfaces read the final reply’s via the trace.final(...) query above. A task that wants to grade sources should declare them in its returns= schema so the agent submits them as part of the answer.
Grading shapes
SubScore and EvaluationResult live with the graders — see Graders.
Training types
- `Rewarded`: the protocol training needs, meaning anything with `trace_id: str | None` and `reward: float` (a `Run` qualifies).
- `group_relative(rewards, *, normalize_std=True)`: GRPO advantages over one group.
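For orientation, group-relative advantages are commonly computed by centering each reward on the group mean and, optionally, dividing by the group standard deviation. The sketch below follows that common recipe. It is not the library's implementation: `group_relative_sketch`, `RewardedSketch`, `RunSketch`, the use of the population standard deviation, and the `eps` guard are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Protocol, Sequence


class RewardedSketch(Protocol):
    """Hypothetical mirror of the documented Rewarded protocol."""

    trace_id: str | None
    reward: float


@dataclass
class RunSketch:
    """Hypothetical object that satisfies the protocol structurally."""

    trace_id: str | None
    reward: float


def group_relative_sketch(
    rewards: Sequence[float], *, normalize_std: bool = True, eps: float = 1e-8
) -> list[float]:
    if not rewards:
        return []
    mean = fmean(rewards)
    centered = [r - mean for r in rewards]
    if not normalize_std:
        return centered
    # Population std plus a small eps (both assumptions) avoids division
    # by zero when every reward in the group is identical.
    std = pstdev(rewards)
    return [c / (std + eps) for c in centered]


def advantages_by_trace(group: Sequence[RewardedSketch]) -> dict[str | None, float]:
    # Pair each advantage with the trace id that keys its trajectory.
    advantages = group_relative_sketch([item.reward for item in group])
    return {item.trace_id: adv for item, adv in zip(group, advantages)}


normalized = group_relative_sketch([1.0, 0.0, 0.0, 1.0])
centered_only = group_relative_sketch([1.0, 0.0, 0.0, 1.0], normalize_std=False)
by_trace = advantages_by_trace([RunSketch("t1", 1.0), RunSketch("t2", 0.0)])
```

A group whose rewards are all equal yields all-zero advantages under this recipe, so it contributes no learning signal.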
Typed task I/O
Declare `input=` / `returns=` on `@env.template` to surface JSON schemas in the manifest and parse the agent’s answer into a typed `Answer[T]`. Any Pydantic model or standard type works.