May 6, 2026
Models, Tasksets, Templates & Sharing
Platform
- Models directory refresh — /models is a single unified list with Private and Trainable filters and a live usage column on every row.
- Taskset analytics tab — dedicated analytics view on tasksets with charts and richer summaries.
- Multi-environment taskset selection — pick multiple environments at once when configuring a taskset run.
- Run from suggested tasksets — kick off an evaluation from a model’s suggested-taskset row with the model already locked in.
- Templates and workflow orchestration — templates settings page and a right-click workflow entry point for repeatable runs.
- Resource sharing — invite users or whole teams to traces, jobs, evalsets, models, registry items, and collections with a unified accept flow.
- Trace grader info — evaluation cards on traces show the grader that produced each result.
March 16, 2026
A2A Chat, Citations, GPT-5 & CLI Sync
- A2A chat orchestrator — agent-to-agent communication for multi-agent workflows with input handling and follow-up turns.
- hud sync tasks — sync task definitions from Python files or directories to the platform.
- hud sync env — sync local environment configs with collision detection (replaces hud link).
- hud eval accepts Python files — run evaluations directly from .py files and directories containing Task objects.
- Chat class — manage multi-turn agent conversations from a single SDK abstraction.
- GPT-5 support — ResponseAgent defaults to gpt-5, with ToolSearch tool support.
- Citations — citation support for Claude, Gemini, and OpenAI responses in chat and agent traces.
Platform
- Click & scroll coordinate overlays — computer use traces render click coordinates and scroll actions directly on screenshots.
- Trace-level QA workflows — run QA workflows across all tasks from the trace table, with screenshot input and per-task status.
- Evalset environment filtering — filter results by environment version, with an earliest-version-only toggle.
- EvaluationResult info viewer — inspect the full info field of evaluation results directly in the UI.
- Individual user spend — usage page shows per-user spend alongside team totals.
- Inline job renaming — rename jobs directly from the jobs page.
- Modal integration — run environments on Modal compute infrastructure.
- Resources section — new /resources page with published articles.
February 16, 2026
Opus 4.6 Computer Use, Streaming & Deploy Improvements
- Opus 4.6 computer tool — native support for Claude Opus 4.6 computer use with zoom and screenshot gating.
- Fine-grained tool streaming — opt-in streaming for individual tool results during agent execution.
- hud deploy build args & secrets — pass build arguments and secrets to environment container builds.
- allowed_tools in @env.scenario — scope tool access per evaluation scenario via the decorator.
- Checkpoint configs — configure checkpoint behavior for long-running evaluations.
Platform
- Billing refactor — auto top-up, redesigned billing page, and per-key pricing for HUD-managed API keys.
- Trace viewer enhancements — strip review mode, inline run switching, and file attachment display.
- Trace comments — add and edit comments on individual traces, with a dedicated column in taskset view.
- Training jobs dashboard — dedicated section for RL training jobs with detail pages.
- Native binarization toggle — pass/fail binarization for taskset evaluations, built into the platform.
- Column ordering — reorder columns in the taskset table view.
- Model & environment sorting — sort taskset results by model, environment, and environment version.
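As a rough illustration of what pass/fail binarization means for taskset scores, the sketch below collapses graded rewards into 0 or 1 before computing a pass rate. The function name `binarize`, the sample rewards, and the threshold of 1.0 are assumptions made for this example. They are not the platform's actual rule or API.

```python
def binarize(scores, threshold=1.0):
    """Map each graded score to 1 (pass) or 0 (fail).

    The threshold is a hypothetical choice for this sketch; the platform's
    own cutoff is not stated in the changelog.
    """
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical per-task rewards from one evaluation run.
rewards = [1.0, 0.4, 0.0, 0.95]

binary = binarize(rewards)
pass_rate = sum(binary) / len(binary)

print(binary)     # [1, 0, 0, 0]
print(pass_rate)  # 0.25
```

With a strict threshold, partial credit such as 0.95 counts as a fail, so a binarized pass rate can sit well below the mean raw reward.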
January 12, 2026
CLI Refinements & Leaderboard Redesign
- Build args for hud deploy — pass custom build arguments to environment container builds.
- Wildcard tools — environments can expose * to allow all tools without explicit registration.
- CLI mode distinction — hud build and hud analyze distinguish between HTTP and stdio modes.
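The wildcard behavior described above can be pictured with a small, generic sketch: a lone `*` entry admits every available tool, while any other list admits only the names it contains. The function `resolve_allowed_tools` and the tool names are hypothetical and written for this example. This is not the SDK's implementation.

```python
def resolve_allowed_tools(available, allowed):
    """Return the tools an agent may call, preserving the order of `available`.

    Assumption for this sketch: "*" means "allow everything"; any other
    entry must match a tool name exactly.
    """
    if "*" in allowed:
        return list(available)
    allowed_set = set(allowed)
    return [name for name in available if name in allowed_set]


# Hypothetical tool names exposed by an environment.
available = ["click", "type_text", "screenshot", "bash"]

print(resolve_allowed_tools(available, ["*"]))
# ['click', 'type_text', 'screenshot', 'bash']
print(resolve_allowed_tools(available, ["click", "screenshot"]))
# ['click', 'screenshot']
```

Treating `*` as a single sentinel rather than a general glob pattern keeps the rule easy to audit, though a real implementation may choose differently.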
Platform
- Leaderboard redesign — redesigned leaderboards with publishing flow, public visibility, and embedding support.
- Slack bot — Slack integration for job notifications and external integration providers.
- Trace compact view — compact trace view with column reorder, inline comments, and truncated task names.
- BYOK API keys — bring-your-own-key support with a use_hud_key option for user-managed API keys.
- Per-key pricing — individual pricing tiers for HUD-managed API keys.
- Jobs page improvements — compact job list view and refreshed stats.
December 17, 2025
v0.5.0: MCP-First Architecture
- Environments decoupled — environment definitions moved to separate repos, enabling independent versioning and community contributions.
- Unified scenario/tool/prompt/resource handling — single abstraction layer for MCP servers and client-side tools, with caching and hot-reload.
- Telemetry — trace IDs, subagent spans, and structured logging for agent runs.
- Scenario decorator — @env.scenario for defining evaluation scenarios with typed configuration.
- RL training — initial support for reinforcement learning training via the CLI.
Platform
- Inference API usage tracking — track inference API usage on the usage page.
- HUD-managed API keys — platform-side API key management with set api_key support.
October 1, 2025
Bedrock, Gemini & Expanded Model Support
- AWS Bedrock — hud-python[bedrock] extra for running Claude agents via AWS Bedrock.
- Gemini CUA — Gemini computer use agent support with checkpoint management.
- Qwen computer tool — QwenComputerTool for Qwen-series models.
- MCP server support — use HUD environments as MCP servers, integrating with any MCP-compatible client.
- Telemetry tracing — structured telemetry for agent runs with trace export.
Platform
- Text trace viewer — view text-only agent traces with a dedicated viewer.
- Leaderboard embeds — embed leaderboards in external pages.
- Versioned models — unified evalsets and leaderboards with versioned model support.
- Usage tracking & billing — usage analytics and subscription management.
August 23, 2025
CLI & Claude Agent
- hud CLI — full CLI for the development lifecycle: init, dev, build, deploy, eval, analyze, debug.
- Claude agent with prompt caching — built-in Claude agent with reduced latency and cost.
- Pre-filtered tools — agents receive only the tools relevant to their current scenario.
- User-provided system prompts — custom system prompts for tasksets and individual tasks.
Platform
- Trace viewer — full trace exploration UI with step-by-step replay of agent actions and screenshots.
- Leaderboards & scorecards — evalset leaderboards with scorecard breakdowns.
- Jobs & runs display — view agent runs with step-by-step screenshots and action metadata.
- Public trace sharing — publish and share individual traces publicly.
April 18, 2025
Environment Controllers & Docker Support
- Client-side environment management — local Docker-based environment execution with copy-to/from support.
- Claude adapter — built-in adapter for Anthropic Claude computer use and Operator.
- Gymnasium wrapper — gym.make() compatibility for RL-style agent training loops.
- Evaluator framework — pluggable evaluators with structured logging and result export.
Platform
- Platform launch — dashboard at hud.ai with authentication and evalset browsing.
- API keys management — create and manage API keys from the dashboard.
- Profile & team pages — user profiles with team membership and settings.
March 3, 2025
Initial Release
- Open-source SDK — pip install hud-python for AI agent evaluation and RL environments.
- Core primitives — environments, tasks, evaluators, and runs as first-class objects.
- Computer use actions — keyboard, mouse, scroll, keyup/keydown, and hold-key actions for desktop environments.