hud.native includes reusable grader helpers for scenarios that want structured scoring without hand-building EvaluationResult objects each time.
Quick Example
Grade.from_subscores(...) returns a normal EvaluationResult, so the result can be yielded directly from a scenario.
Grade
Grade.from_subscores(subscores) combines SubScore values into a single EvaluationResult.
Behavior:
- Positive weights are normalized to sum to 1.0
- Negative weights are preserved as penalties
- Duplicate subscore names are de-duplicated
- Per-subscore metadata is copied into EvaluationResult.info
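The weighting rules above can be illustrated with a small self-contained sketch. It does not import hud.native: the SubScore and EvaluationResult classes here are simplified stand-ins, and their field names (value, weight, metadata, reward, info) are assumptions made for illustration. Name de-duplication is left out because the text above does not say how duplicates are resolved.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SubScore:
    """Stand-in for the library's SubScore (field names are assumed)."""
    name: str
    value: float
    weight: float = 1.0
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class EvaluationResult:
    """Stand-in for the library's EvaluationResult (field names are assumed)."""
    reward: float
    info: dict[str, Any] = field(default_factory=dict)


def from_subscores(subscores: list[SubScore]) -> EvaluationResult:
    """Combine subscores using the rules listed above (illustrative only)."""
    # Only positive weights take part in normalization.
    positive_total = sum(s.weight for s in subscores if s.weight > 0)
    reward = 0.0
    info: dict[str, Any] = {}
    for s in subscores:
        if s.weight > 0:
            effective = s.weight / positive_total  # positive weights sum to 1.0
        else:
            effective = s.weight  # negative weights are kept as penalties
        reward += s.value * effective
        if s.metadata:
            info[s.name] = dict(s.metadata)  # copy per-subscore metadata
    return EvaluationResult(reward=reward, info=info)


result = from_subscores([
    SubScore("tests_pass", 1.0, weight=2.0, metadata={"suite": "unit"}),
    SubScore("lint_clean", 0.5, weight=2.0),
    SubScore("touched_forbidden_file", 1.0, weight=-0.5),
])
print(result.reward, result.info)
```

In this sketch the two positive weights become 0.5 each, and the penalty is applied at its original weight of -0.5. Whether the real helper clamps the final value is not stated above, so the sketch does not clamp.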
Grader
Grader is the base class for reusable scoring helpers. Subclasses implement compute_score(...), and grade(...) packages the result as a SubScore.
grade(...) also records JSON-safe copies of the grader parameters in subscore metadata under _parameters.
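The pattern described above can be sketched without the library. Everything in this block is a stand-in: the SubScore fields, the use of classmethods, the keyword-only parameter passing, and the ContainsGrader subclass are all assumptions chosen for illustration rather than the actual hud.native signatures.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SubScore:
    """Stand-in for the library's SubScore (field names are assumed)."""
    name: str
    value: float
    weight: float = 1.0
    metadata: dict[str, Any] = field(default_factory=dict)


def json_safe(value: Any) -> Any:
    """Return a JSON-safe copy of value, falling back to repr() if needed."""
    try:
        return json.loads(json.dumps(value))
    except (TypeError, ValueError):
        return repr(value)


class Grader:
    """Illustrative base class: subclasses implement compute_score."""
    name = "grader"

    @classmethod
    def compute_score(cls, **params: Any) -> float:
        raise NotImplementedError

    @classmethod
    def grade(cls, weight: float = 1.0, **params: Any) -> SubScore:
        # Score first, then record the parameters that produced the score.
        score = cls.compute_score(**params)
        metadata = {"_parameters": {k: json_safe(v) for k, v in params.items()}}
        return SubScore(name=cls.name, value=score, weight=weight, metadata=metadata)


class ContainsGrader(Grader):
    """Hypothetical subclass: scores 1.0 when needle occurs in text."""
    name = "contains"

    @classmethod
    def compute_score(cls, text: str, needle: str) -> float:
        return 1.0 if needle in text else 0.0


sub = ContainsGrader.grade(weight=2.0, text="hello world", needle="world")
print(sub)
```

Keeping the parameters under a single _parameters key means the metadata for each subscore records how that score was produced, without colliding with other metadata keys a grader may add.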
BashGrader
BashGrader runs a command with /bin/bash -lc and scores it by exit code.
- exit code 0 -> score 1.0
- non-zero exit code -> score 0.0
- timeout -> score 0.0 with timeout metadata
- metadata includes stdout, stderr, and exit_code
Combinators
Grader.any(...) and Grader.all(...) combine multiple subscores into a single summary subscore.
- any(...) uses the maximum input score
- all(...) uses the minimum input score
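The max/min behavior of the combinators can be sketched as two small functions. These are stand-ins rather than the Grader.any and Grader.all signatures: the names any_of and all_of, the name and weight arguments, and the SubScore fields are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class SubScore:
    """Stand-in for the library's SubScore (field names are assumed)."""
    name: str
    value: float
    weight: float = 1.0
    metadata: dict[str, Any] = field(default_factory=dict)


def _combine(
    name: str,
    subscores: list[SubScore],
    weight: float,
    reducer: Callable[[Iterable[float]], float],
) -> SubScore:
    """Reduce the input values to one summary subscore."""
    if not subscores:
        raise ValueError("at least one subscore is required")
    return SubScore(name=name, value=reducer(s.value for s in subscores), weight=weight)


def any_of(name: str, subscores: list[SubScore], weight: float = 1.0) -> SubScore:
    """Summary passes as well as the best input: maximum input score."""
    return _combine(name, subscores, weight, max)


def all_of(name: str, subscores: list[SubScore], weight: float = 1.0) -> SubScore:
    """Summary passes only as well as the worst input: minimum input score."""
    return _combine(name, subscores, weight, min)


checks = [SubScore("build", 1.0), SubScore("tests", 0.0), SubScore("docs", 0.5)]
print(any_of("any_check", checks).value, all_of("all_checks", checks).value)
```

With scores in the range 0.0 to 1.0, the maximum behaves like a logical OR and the minimum like a logical AND, while still giving a sensible result for partial scores.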