hud.eval() is the primary way to run evaluations. It creates an EvalContext with telemetry, handles parallel execution, and integrates with the HUD platform.
hud.eval()
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `source` | `Task \| list[Task] \| str \| None` | Task objects from `env()`, task slugs, or `None` | `None` |
| `variants` | `dict[str, Any] \| None` | A/B test configuration (lists expand to combinations) | `None` |
| `group` | `int` | Runs per variant for statistical significance | `1` |
| `group_ids` | `list[str] \| None` | Custom group IDs for parallel runs | `None` |
| `job_id` | `str \| None` | Job ID to link traces to | `None` |
| `api_key` | `str \| None` | API key for backend calls | `None` |
| `max_concurrent` | `int \| None` | Maximum concurrent evaluations | `None` |
| `trace` | `bool` | Send telemetry to backend | `True` |
| `quiet` | `bool` | Suppress console output | `False` |
Source Types
The `source` parameter accepts a single Task, a list of Tasks, a task slug string, or None:
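A minimal sketch of the accepted forms. The slug, the `Task` import path, and the `async with hud.eval(...)` usage are assumptions based on the EvalContext sections below, not confirmed API details:

```python
import hud
from hud.datasets import Task  # assumed import path for the Task type


async def examples(task: Task, more_tasks: list[Task]) -> None:
    # A single Task object (typically produced by env())
    async with hud.eval(task) as ctx:
        ...

    # A list of Task objects
    async with hud.eval([task, *more_tasks]) as ctx:
        ...

    # A task slug string ("my-org/checkout-flow" is a placeholder)
    async with hud.eval("my-org/checkout-flow") as ctx:
        ...

    # No source: defaults to None for an ad-hoc eval
    async with hud.eval() as ctx:
        ...
```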
Variants
Test multiple configurations in parallel:
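A hedged sketch, assuming the `with`-body runs once per variant combination and `ctx.variants` holds the current assignment (model names and slug are placeholders):

```python
import hud


async def main() -> None:
    # Two models × two temperatures expand to 4 variant combinations.
    async with hud.eval(
        "my-org/checkout-flow",  # placeholder task slug
        variants={
            "model": ["gpt-4o", "claude-sonnet-4"],  # placeholder model names
            "temperature": [0.0, 0.7],
        },
    ) as ctx:
        model = ctx.variants["model"]            # assignment for this run
        temperature = ctx.variants["temperature"]
        ...                                      # run your agent with these settings
        ctx.reward = 1.0                         # reward is settable
```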
Groups
Run each variant multiple times for statistical significance. The total number of runs is `len(evals) × len(variant_combinations) × group`.
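For example, 2 tasks × 2 variant combinations × `group=3` yields 12 runs. A sketch under the same assumptions as above:

```python
import hud
from hud.datasets import Task  # assumed import path


async def run_grouped(tasks: list[Task]) -> None:
    # e.g. with 2 tasks: total runs = 2 tasks × 2 variants × 3 per group = 12
    async with hud.eval(
        tasks,
        variants={"model": ["gpt-4o", "o3"]},  # placeholder model names
        group=3,
    ) as ctx:
        ...
    print(len(ctx.results))  # one EvalContext per run
```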
Concurrency Control
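`max_concurrent` caps how many evaluations execute at once, which helps when variants and groups expand to a large number of runs. A sketch under the same assumptions as above:

```python
import hud
from hud.datasets import Task  # assumed import path


async def run_capped(tasks: list[Task]) -> None:
    # Even if variants × group expands to many runs, at most 5 execute at a time.
    async with hud.eval(
        tasks,
        variants={"model": ["gpt-4o", "o3"]},  # placeholder model names
        group=10,
        max_concurrent=5,
    ) as ctx:
        ...
```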
EvalContext
EvalContext extends Environment with evaluation tracking.
Properties
| Property | Type | Description |
|---|---|---|
| `trace_id` | `str` | Unique trace identifier |
| `eval_name` | `str` | Evaluation name |
| `prompt` | `str \| None` | Task prompt (from scenario or task) |
| `variants` | `dict[str, Any]` | Current variant assignment |
| `reward` | `float \| None` | Evaluation reward (settable) |
| `answer` | `str \| None` | Submitted answer |
| `error` | `BaseException \| None` | Error if failed |
| `results` | `list[EvalContext]` | Results from parallel runs |
| `headers` | `dict[str, str]` | Trace headers for HTTP requests |
| `job_id` | `str \| None` | Parent job ID |
| `group_id` | `str \| None` | Group ID for parallel runs |
| `index` | `int` | Index in parallel execution |
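A sketch of reading run metadata and setting the reward inside an eval; the slug and the agent call are placeholders, and the async-with usage is assumed as above:

```python
import hud


async def main() -> None:
    async with hud.eval("my-org/checkout-flow") as ctx:   # placeholder slug
        print(ctx.trace_id, ctx.eval_name, ctx.variants)  # per-run metadata
        answer = await my_agent.run(ctx.prompt)           # placeholder agent call
        ctx.reward = 1.0 if "pricing" in answer.lower() else 0.0  # reward is settable
```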
Methods
All `Environment` methods are available, plus:
Headers for Telemetry
Inside an eval context, trace headers are automatically injected into HTTP requests:
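For a client that is not instrumented automatically, `ctx.headers` can be attached by hand. A sketch using httpx; the URL, slug, and payload are placeholders, not part of the documented API:

```python
import hud
import httpx


async def main() -> None:
    async with hud.eval("my-org/checkout-flow") as ctx:   # placeholder slug
        async with httpx.AsyncClient() as client:
            # ctx.headers carries the trace identifiers for this run
            await client.post(
                "https://api.example.com/agent/step",      # placeholder URL
                headers=ctx.headers,
                json={"prompt": ctx.prompt},
            )
```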
Working with Environments
The recommended pattern is to use `async with env(...)` directly:
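A hedged sketch; the `env()` import location, its signature, the environment name, and its relationship to the eval context are assumptions rather than documented behavior:

```python
import hud
from hud import env  # assumed import location for env()


async def main() -> None:
    async with hud.eval("my-org/checkout-flow") as ctx:   # placeholder slug
        # Open the environment directly; "browser" is a placeholder name.
        async with env("browser") as environment:
            ...  # drive the environment here
        ctx.reward = 1.0
```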
Results
After parallel runs complete, access results on the context:
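A sketch of iterating the per-run contexts afterwards; the field names come from the Properties table above, the rest is assumed as in the earlier examples:

```python
import hud
from hud.datasets import Task  # assumed import path


async def summarize(tasks: list[Task]) -> None:
    async with hud.eval(
        tasks,
        variants={"model": ["gpt-4o", "o3"]},  # placeholder model names
        group=3,
    ) as ctx:
        ...

    # One EvalContext per parallel run:
    for run in ctx.results:
        print(run.index, run.group_id, run.variants, run.reward, run.error)
```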
See Also
- Environments - Environment class reference
- A/B Evals - Variants and groups guide
- Deploy - Running evals at scale
- `hud eval` CLI - Command-line interface