The `hud eval` command runs an agent on a tasks file or a HuggingFace dataset.
Usage
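A minimal synopsis of the invocation shape, inferred from the arguments and options documented on this page (the positional `SOURCE` argument and the `--agent` flag appear on this page; the overall bracketed shape is an assumption, not a verbatim copy of the tool's help output):

```shell
hud eval [SOURCE] [--agent claude|openai|vllm] [OPTIONS]
```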
Arguments
- `SOURCE`: HuggingFace dataset (e.g., `hud-evals/SheetBench-50`) or task JSON/JSONL file. If omitted, looks for a tasks file in the current directory.
- Agent backend to use: `claude`, `openai`, or `vllm`. If omitted, an interactive selector appears (including HUD hosted models).

Options
- Run the entire dataset (omit for single-task debug mode)
- Model name for the chosen agent (required for some agents)
- Comma-separated list of allowed tools
- Maximum concurrent tasks (1-200 recommended). Adjust based on your API rate limits and system resources.
- Maximum steps per task (default: 10 for single task, 50 for full dataset)
- Enable verbose agent output
- Enable debug-level logs for maximum visibility
- Base URL for vLLM server (when using `--agent vllm` or HUD hosted models)
- Number of times to run each task (mini-batch style)
Examples
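Two plausible invocations as a sketch. The dataset name `hud-evals/SheetBench-50`, the agent names, and the `--agent` flag are taken from this page; the use of a local `tasks.jsonl` file path is a hypothetical illustration:

```shell
# Single-task debug run against a HuggingFace dataset (sketch)
hud eval hud-evals/SheetBench-50 --agent claude

# Run a local JSONL tasks file with the vLLM backend (sketch)
hud eval tasks.jsonl --agent vllm
```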
Notes
- If you select a HUD hosted model, `hud eval` will route through vLLM with the appropriate base model.
- When `SOURCE` is omitted, an interactive file picker helps locate a tasks file.
See Also
Pricing & Billing
See hosted vLLM and training GPU rates in the Training Quickstart → Pricing. Manage usage and billing at https://hud.ai/project/billing.