hud_api_key) and an entity arg (trace_id for per-trace, task_id for per-task). The platform fills these at runtime — you configure the analysis prompt and attach the workflow to a taskset column.
Standard QA Workflows
Four pre-built workflows are available out of the box. These appear under the Standard QA Workflows section on the Agents page and can be attached to any taskset with one click.

| Workflow | What it detects | Output |
|---|---|---|
| False Negative | Agent succeeded but grader scored it wrong | is_false_negative, reasoning, confidence |
| False Positive | Agent got credit without genuinely solving | is_false_positive, reasoning, confidence |
| Failure Analysis | Root cause classification (10 categories) | failure_category, root_cause, failed_criteria |
| Reward Hacking | Agent gamed the evaluation mechanism | is_reward_hacking, hacking_strategy, severity |
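As an illustration of the output columns, a Reward Hacking result attached to a single trace might look like the dict below. The field names come from the table above; the values are hypothetical.

```python
# Illustrative Reward Hacking analysis result for one trace.
# Field names match the table above; the values are made up.
example_result = {
    "is_reward_hacking": True,
    "hacking_strategy": "edited the grader's expected-output file instead of solving the task",
    "severity": "high",
}
```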
How to Use
From the Agents Page
- Go to the Agents page
- Under Standard QA Workflows, click a recommended workflow to view it
- Click Add as Column to attach it to any taskset
- Every completed trace is automatically analyzed
- To create your own, click New Agent → QA Workflow, select a workflow scenario, configure the analysis prompt, and choose a model. It appears under Your QA Workflows.
From a Taskset
- Open any taskset → Add Column → QA Workflow
- Pick a recommended workflow or one you’ve created
- Results appear as columns in your trace grid
Building Your Own
A QA workflow is just a scenario with trace_id + hud_api_key arguments. Use prepare_qa_context from trace-explorer for the common setup.
An optional ground_truth parameter lets you build eval datasets for the workflow itself.
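The shape of a custom workflow body can be sketched without the SDK. In the sketch below, the trace context is stubbed as a plain dict standing in for what prepare_qa_context would return, and analyze_trace is a hypothetical name for the scenario body; only the output fields match the False Negative columns above.

```python
# Minimal sketch of a False Negative workflow body. The exact
# prepare_qa_context return value is an assumption, so the context
# is stubbed as a plain dict to keep the example self-contained.

def analyze_trace(context: dict) -> dict:
    """Return a verdict matching the False Negative output columns."""
    grader_score = context["grader_score"]        # score the grader assigned
    success_signals = context["success_signals"]  # independent checks that passed
    # A trace is a candidate false negative when the grader failed it
    # even though every independent success signal passed.
    is_false_negative = grader_score == 0 and all(success_signals)
    return {
        "is_false_negative": is_false_negative,
        "reasoning": ("grader scored 0 but all success signals passed"
                      if is_false_negative else "verdict consistent"),
        "confidence": 0.9 if is_false_negative else 0.6,
    }
```

In a real workflow the context dict would come from prepare_qa_context, and the returned fields surface as columns in the trace grid.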
See Also
- Automations — Run scenarios repeatably with pre-filled arguments
- Chat Agents — Multi-turn conversational agents
- Source Code — Fork and customize