May 6, 2026

Models, Tasksets, Templates & Sharing

Platform

  • Models directory refresh: /models is a single unified list with Private and Trainable filters and a live usage column on every row.
  • Taskset analytics tab — dedicated analytics view on tasksets with charts and richer summaries.
  • Multi-environment taskset selection — pick multiple environments at once when configuring a taskset run.
  • Run from suggested tasksets — kick off an evaluation from a model’s suggested-taskset row with the model already locked in.
  • Templates and workflow orchestration — templates settings page and a right-click workflow entry point for repeatable runs.
  • Resource sharing — invite users or whole teams to traces, jobs, evalsets, models, registry items, and collections with a unified accept flow.
  • Trace grader info — evaluation cards on traces show the grader that produced each result.

March 16, 2026

A2A Chat, Citations, GPT-5 & CLI Sync

  • A2A chat orchestrator — agent-to-agent communication for multi-agent workflows with input handling and follow-up turns.
  • hud sync tasks — sync task definitions from Python files or directories to the platform.
  • hud sync env — sync local environment configs with collision detection (replaces hud link).
  • hud eval accepts Python files — run evaluations directly from .py files and directories containing Task objects.
  • Chat class — manage multi-turn agent conversations from a single SDK abstraction.
  • GPT-5 support: ResponseAgent defaults to gpt-5, with ToolSearch tool support.
  • Citations — citation support for Claude, Gemini, and OpenAI responses in chat and agent traces.

Platform

  • Click & scroll coordinate overlays — computer use traces render click coordinates and scroll actions directly on screenshots.
  • Trace-level QA workflows — run QA workflows across all tasks from the trace table, with screenshot input and per-task status.
  • Evalset environment filtering — filter results by environment version, with an earliest-version-only toggle.
  • EvaluationResult info viewer — inspect the full info field of evaluation results directly in the UI.
  • Individual user spend — usage page shows per-user spend alongside team totals.
  • Inline job renaming — rename jobs directly from the jobs page.
  • Modal integration — run environments on Modal compute infrastructure.
  • Resources section — new /resources page with published articles.

February 16, 2026

Opus 4.6 Computer Use, Streaming & Deploy Improvements

  • Opus 4.6 computer tool — native support for Claude Opus 4.6 computer use with zoom and screenshot gating.
  • Fine-grained tool streaming — opt-in streaming for individual tool results during agent execution.
  • hud deploy build args & secrets — pass build arguments and secrets to environment container builds.
  • allowed_tools in @env.scenario — scope tool access per evaluation scenario via the decorator.
  • Checkpoint configs — configure checkpoint behavior for long-running evaluations.

Platform

  • Billing refactor — auto top-up, redesigned billing page, and per-key pricing for HUD-managed API keys.
  • Trace viewer enhancements — strip review mode, inline run switching, and file attachment display.
  • Trace comments — add and edit comments on individual traces, with a dedicated column in taskset view.
  • Training jobs dashboard — dedicated section for RL training jobs with detail pages.
  • Native binarization toggle — pass/fail binarization for taskset evaluations, built into the platform.
  • Column ordering — reorder columns in the taskset table view.
  • Model & environment sorting — sort taskset results by model, environment, and environment version.
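
The pass/fail binarization mentioned above can be illustrated with a minimal sketch. It assumes each task yields a numeric score and that a score at or above a threshold counts as a pass. The function names and the default threshold are hypothetical and are not taken from the HUD platform or SDK.

```python
from typing import Iterable, List


def binarize(scores: Iterable[float], threshold: float = 1.0) -> List[int]:
    """Map each score to 1 (pass) if it meets the threshold, else 0 (fail).

    The threshold default is an assumption made for this illustration.
    """
    return [1 if score >= threshold else 0 for score in scores]


def pass_rate(scores: Iterable[float], threshold: float = 1.0) -> float:
    """Return the fraction of scores that pass; 0.0 for an empty input."""
    flags = binarize(scores, threshold)
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    raw_scores = [0.2, 0.9, 1.0, 0.5]
    print(binarize(raw_scores, threshold=0.5))
    print(pass_rate(raw_scores, threshold=0.5))
```

Binarizing before aggregating means partial credit no longer moves the summary metric, which may or may not be what a given taskset wants.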

January 12, 2026

CLI Refinements & Leaderboard Redesign

  • Build args for hud deploy — pass custom build arguments to environment container builds.
  • Wildcard tools — environments can expose * to allow all tools without explicit registration.
  • CLI mode distinction: hud build and hud analyze distinguish between HTTP and stdio modes.
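
As a rough illustration of the wildcard behavior described above, the following sketch checks a tool name against a list of exposed tools in which `*` means "allow everything". The helper name `is_tool_allowed` and the list-of-strings representation are assumptions made for this example, not the SDK's actual API.

```python
from typing import Sequence


def is_tool_allowed(tool_name: str, exposed: Sequence[str]) -> bool:
    """Return True if the tool is exposed explicitly or via the "*" wildcard.

    Hypothetical helper: a lone "*" entry allows any tool name,
    otherwise the name must appear in the exposed list.
    """
    return "*" in exposed or tool_name in exposed


if __name__ == "__main__":
    print(is_tool_allowed("screenshot", ["*"]))
    print(is_tool_allowed("bash", ["click", "type"]))
```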

Platform

  • Leaderboard redesign — redesigned leaderboards with publishing flow, public visibility, and embedding support.
  • Slack bot — Slack integration for job notifications and external integration providers.
  • Trace compact view — compact trace view with column reorder, inline comments, and truncated task names.
  • BYOK API keys — bring-your-own-key support with a use_hud_key option for user-managed API keys.
  • Per-key pricing — individual pricing tiers for HUD-managed API keys.
  • Jobs page improvements — compact job list view and refreshed stats.

December 17, 2025

v0.5.0: MCP-First Architecture

  • Environments decoupled — environment definitions moved to separate repos, enabling independent versioning and community contributions.
  • Unified scenario/tool/prompt/resource handling — single abstraction layer for MCP servers and client-side tools, with caching and hot-reload.
  • Telemetry — trace IDs, subagent spans, and structured logging for agent runs.
  • Scenario decorator: @env.scenario for defining evaluation scenarios with typed configuration.
  • RL training — initial support for reinforcement learning training via the CLI.

Platform

  • Inference API usage tracking — track inference API usage on the usage page.
  • HUD-managed API keys — platform-side API key management with set api_key support.

October 1, 2025

Bedrock, Gemini & Expanded Model Support

  • AWS Bedrock: hud-python[bedrock] extra for running Claude agents via AWS Bedrock.
  • Gemini CUA — Gemini computer use agent support with checkpoint management.
  • Qwen computer tool — QwenComputerTool for Qwen-series models.
  • MCP server support — use HUD environments as MCP servers, integrating with any MCP-compatible client.
  • Telemetry tracing — structured telemetry for agent runs with trace export.

Platform

  • Text trace viewer — view text-only agent traces with a dedicated viewer.
  • Leaderboard embeds — embed leaderboards in external pages.
  • Versioned models — unified evalsets and leaderboards with versioned model support.
  • Usage tracking & billing — usage analytics and subscription management.

August 23, 2025

CLI & Claude Agent

  • hud CLI — full CLI for the development lifecycle: init, dev, build, deploy, eval, analyze, debug.
  • Claude agent with prompt caching — built-in Claude agent with reduced latency and cost.
  • Pre-filtered tools — agents receive only the tools relevant to their current scenario.
  • User-provided system prompts — custom system prompts for tasksets and individual tasks.

Platform

  • Trace viewer — full trace exploration UI with step-by-step replay of agent actions and screenshots.
  • Leaderboards & scorecards — evalset leaderboards with scorecard breakdowns.
  • Jobs & runs display — view agent runs with step-by-step screenshots and action metadata.
  • Public trace sharing — publish and share individual traces publicly.

April 18, 2025

Environment Controllers & Docker Support

  • Client-side environment management — local Docker-based environment execution with copy-to/from support.
  • Claude adapter — built-in adapter for Anthropic Claude computer use and Operator.
  • Gymnasium wrapper: gym.make() compatibility for RL-style agent training loops.
  • Evaluator framework — pluggable evaluators with structured logging and result export.

Platform

  • Platform launch — dashboard at hud.ai with authentication and evalset browsing.
  • API keys management — create and manage API keys from the dashboard.
  • Profile & team pages — user profiles with team membership and settings.

March 3, 2025

Initial Release

  • Open-source SDK: pip install hud-python for AI agent evaluation and RL environments.
  • Core primitives — environments, tasks, evaluators, and runs as first-class objects.
  • Computer use actions — keyboard, mouse, scroll, keyup/keydown, and hold-key actions for desktop environments.