Run
The live handle for one task - the lifecycle plus the agent’s Trace. Runs arrive in job.runs from
task.run(agent) / taskset.run(agent), or you can construct one over a connected client for manual driving.
| Member | Type | Description |
|---|---|---|
| `run.prompt` | `str \| list \| None` | The task’s opening prompt as `tasks.start` returned it (text, or chat-style message list). |
| `run.prompt_messages` | `list[PromptMessage]` | The prompt as normalized user/assistant turns - what agents consume. |
| `run.prompt_text` | `str` | The prompt flattened to plain text, for string-only backends. |
| `run.trace` | `Trace` | The trajectory the agent fills. The answer is `run.trace.content`. |
| `run.grade` | `Grade` | Structured grade result. |
| `run.reward` | `float` | The graded reward (`grade.reward`, set on exit). |
| `run.evaluation` | `dict` | The raw grade payload (`grade.raw`). |
| `run.runtime` | `str \| None` | Control-channel URL the run executed against (the placement record). |
| `run.trace_id` | `str \| None` | Keys the trajectory for training. |
| `run.job_id` / `run.group_id` | `str \| None` | Batch + GRPO group, set by the runner. |
Grade
Structured result from grading one run, parsed from the wire grade frame
({"score": ..., "done": ..., "isError": ..., ...}).
| Field | Type | Description |
|---|---|---|
| `reward` | `float` | The frame’s `score`. |
| `done` | `bool` | Whether the task is complete. |
| `content` | `str \| None` | Human-readable grade content. |
| `info` | `dict` | Extra metadata. |
| `is_error` | `bool` | Whether grading failed. |
| `raw` | `dict` | The full original frame. |
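
As a rough illustration of the mapping in the table above, the sketch below parses a wire grade frame into a stand-in dataclass. `GradeSketch` and `from_frame` are hypothetical names, not the SDK's own classes; the `content` and `info` frame keys and the default values for missing keys are assumptions, since the frame shape above elides its remaining fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GradeSketch:
    """Hypothetical stand-in for Grade; field names follow the table above."""

    reward: float
    done: bool
    content: Optional[str]
    info: dict[str, Any]
    is_error: bool
    raw: dict[str, Any]

    @classmethod
    def from_frame(cls, frame: dict[str, Any]) -> "GradeSketch":
        # "score" and "isError" are the wire names shown in the frame above.
        # The "content"/"info" keys and all defaults here are assumptions.
        return cls(
            reward=float(frame.get("score", 0.0)),
            done=bool(frame.get("done", False)),
            content=frame.get("content"),
            info=dict(frame.get("info") or {}),
            is_error=bool(frame.get("isError", False)),
            raw=frame,  # keep the full original frame untouched
        )


frame = {"score": 0.75, "done": True, "isError": False}
grade = GradeSketch.from_frame(frame)
print(grade.reward, grade.done, grade.is_error)
```

Keeping `raw` alongside the parsed fields means a consumer can still reach frame keys that the structured view does not model.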
Trace
The agent’s trajectory for one rollout - an ordered collection of Steps plus the run summary, and the
unit of training data. Every recorded step also streams to the platform as one schema-tagged span.
| Field | Type | Description |
|---|---|---|
| `steps` | `list[Step]` | The ordered trajectory. |
| `status` | `"completed" \| "error" \| "cancelled" \| None` | How the run ended (`trace.is_error` reads it). |
| `content` | `str \| None` | The final answer (graded). |
| `trace_id` | `str \| None` | Keys server-side trajectories. |
hud.types.Step is the shared skeleton (source, timing, error, plus the harness payloads: prompt
messages and task_call lifecycle RPCs). The tool-agent family subclasses it in hud.agents.types,
flat on the skeleton:
- `AgentStep` - the model’s turn: `content`, `reasoning`, `tool_calls`, `done`, plus `model`, `usage`, and token-level `sample` when the backend is trainable.
- `ToolStep` - one tool round-trip: the `MCPToolCall` paired with its `MCPToolResult`.
- `SubagentStep` - a nested rollout’s `Trace`, embedded whole.
Two queries read the steps: `trace.final(get)` (newest non-None answer
wins; `trace.error` is a view on it) and `trace.collect(get)` (every answer, in step order). Family
vocabulary stays at the call site.
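
The semantics of the two queries can be sketched with stand-in classes. `StepSketch` and `TraceSketch` below are hypothetical and do not reproduce the real `Step` or `Trace`; the sketch also assumes that `get` is a callable applied to each step and that `collect` skips steps for which it returns `None`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class StepSketch:
    """Hypothetical step carrying two optional fields."""

    content: Optional[str] = None
    error: Optional[str] = None


@dataclass
class TraceSketch:
    """Hypothetical trace: an ordered list of steps plus two queries."""

    steps: list[StepSketch] = field(default_factory=list)

    def final(self, get: Callable[[StepSketch], Optional[T]]) -> Optional[T]:
        # Walk newest to oldest; the first non-None answer wins.
        for step in reversed(self.steps):
            value = get(step)
            if value is not None:
                return value
        return None

    def collect(self, get: Callable[[StepSketch], Optional[T]]) -> list[T]:
        # Every non-None answer, in step order.
        return [v for step in self.steps if (v := get(step)) is not None]

    @property
    def error(self) -> Optional[str]:
        # A view on final(): the newest recorded error, if any.
        return self.final(lambda s: s.error)


trace = TraceSketch(
    steps=[
        StepSketch(content="draft"),
        StepSketch(error="tool timed out"),
        StepSketch(content="final answer"),
    ]
)
last_content = trace.final(lambda s: s.content)
all_content = trace.collect(lambda s: s.content)
```

Because the accessor is passed in by the caller, the trace itself needs no knowledge of which fields a given step family defines.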
Answer & result types
Answer[T]
When a task declares returns=T, the answer arrives wrapped
(from hud.environment import Answer): content is the answer parsed into T (or the original string
when parsing failed - grade it accordingly), raw is always the string as submitted.
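
To make the fallback concrete, here is a minimal sketch of the wrapping behavior and of a grader that accounts for it. `AnswerSketch`, `wrap`, and `grade_count` are hypothetical names for illustration, not the real `Answer` class or its parsing logic, and the choice of exceptions treated as a parse failure is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")


@dataclass
class AnswerSketch(Generic[T]):
    """Hypothetical stand-in for Answer[T]."""

    content: Union[T, str]  # parsed value, or the original string on failure
    raw: str  # always the string as submitted


def wrap(raw: str, parse: Callable[[str], T]) -> AnswerSketch[T]:
    # Try to parse; fall back to the original string when parsing fails.
    try:
        return AnswerSketch(content=parse(raw), raw=raw)
    except (ValueError, TypeError):
        return AnswerSketch(content=raw, raw=raw)


def grade_count(answer: AnswerSketch[int], expected: int) -> float:
    # A grader has to handle the unparsed case explicitly.
    if not isinstance(answer.content, int):
        return 0.0
    return 1.0 if answer.content == expected else 0.0


ok = wrap("42", int)
bad = wrap("forty-two", int)
print(grade_count(ok, 42), grade_count(bad, 42))
```

Checking the type of `content` before comparing keeps a malformed submission from being scored as if it had parsed.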
Citation
A normalized citation across providers (hud.agents.types.Citation): type, text, source,
title, start_index, end_index. A reply annotation, not a grading input - provider agents attach
them to AgentStep.citations, and chat surfaces read the final reply’s via the trace.final(...)
query above. A task that wants to grade sources should declare them in its returns= schema so the
agent submits them as part of the answer.
Grading shapes
SubScore and EvaluationResult live with the graders - see
Graders.
Typed task I/O
Declare input= / returns= on @env.template to surface JSON schemas in the manifest and parse the
agent’s answer into a typed Answer[T]. Any Pydantic model or standard type works.