What stays compatible
Environments are mostly backwards compatible. The v6 SDK still runs environments written against the v5 surface: @env.scenario, @env.tool / env.add_tool, env("scenario"), and env.run(...) all keep working - each emits a DeprecationWarning and adapts to v6 under the hood. New (v6) agents can evaluate your existing environments unchanged.
So you can upgrade the SDK first and keep your environments as-is, then convert at your own pace. Converting is worth it: the v6 spec removes most of the tool-wiring boilerplate.
At a glance
| v5 | v6 | Notes |
|---|---|---|
| Environment("name") | Environment(name="name", capabilities=[...]) | positional name still works; declare capabilities up front |
| @env.scenario("count") | @env.template() | same yield prompt then yield reward generator |
| @env.tool / env.add_tool(ComputerTool()) | a capability (ssh / mcp / cdp / rfb / robot) | the agent’s harness brings the tools now |
| env("count", word=...) | count(word=...) | keep the @env.template return value; calling it builds a Task |
| task.run("claude") / hud.eval(task) | await task.run(agent) | or just hud eval tasks.py claude |
| env.run(transport=...) | await env.serve() / hud serve / hud deploy | v6 serves a control channel, not MCP |
| .slug, .columns on a task | .slug, .columns on the Task | unchanged |
hud init, hud deploy, hud eval, and hud sync tasks all carry over. hud dev is now hud serve (the old name remains as a deprecated alias).
Walk through a conversion
Here’s a small v5 coding environment - a couple of tools and one scenario:
env.py (v5)
Replace tools with a capability
This is the biggest change. In v5 you registered tools and the environment forwarded them, translating per provider. In v6 you declare a capability - a connection - and the agent’s harness attaches its own tools to it. Shell and file tools become a Workspace: the environment starts the sandboxed workspace and publishes its ssh capability when it serves:
env.py (v6)
Other tool kinds map the same way: a browser becomes cdp, full computer-use becomes rfb, a robot becomes robot, and any custom MCP tools become an mcp capability via Capability.mcp(name=..., url=...). You no longer hand-wire ComputerTool() / BashTool() or call env.as_claude_tools() - the harness does that.
Rename @env.scenario to @env.template
The generator body is identical - yield a prompt, receive the answer, yield a reward. Just swap the decorator and keep a reference to the decorator's return value, since calling it is what builds a Task:
env.py (v6)
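The two-yield shape is ordinary Python generator mechanics, so it can be illustrated without the SDK. The sketch below uses no HUD imports; `count_letters` and `drive` are hypothetical names, and `drive` only stands in for whatever the SDK does when it runs a template.

```python
from typing import Generator, Tuple, Union


def count_letters(word: str, letter: str) -> Generator[Union[str, float], str, None]:
    # Hypothetical template body (not HUD code).
    # First yield: the prompt. The value sent back in is the agent's answer.
    answer = yield f"How many times does '{letter}' appear in '{word}'?"
    # Second yield: the reward, computed from that answer.
    expected = word.count(letter)
    yield 1.0 if answer.strip() == str(expected) else 0.0


def drive(gen: Generator[Union[str, float], str, None], answer: str) -> Tuple[str, float]:
    """Hypothetical driver: run the prompt -> answer -> reward exchange once."""
    prompt = next(gen)         # advance to the first yield and read the prompt
    reward = gen.send(answer)  # resume with the answer and read the reward
    gen.close()                # nothing more to run after the reward
    return prompt, reward


prompt, reward = drive(count_letters("strawberry", "r"), "3")
print(prompt)
print(reward)
```

Because the body only talks to its caller through `yield`, the same function can sit under either decorator, which is why the rename does not touch it.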
@env.template() also accepts id=, description=, and optional input= / returns= types (surfaced as JSON schemas in the manifest). The decorated function is a template that mints Task rows when called. The v5 scenario options (chat, returns, exclude_tools, …) still parse through the compatibility layer if you keep @env.scenario.
Build tasks by calling the task function
env("fix-tests", target="tests/") becomes a direct call on the task function. It returns a Task - the runnable unit - and .slug / .columns work exactly as before:
tasks.py (v6)
Run it
Locally, hud eval is unchanged:
Programmatically, the hud.eval(task) context manager and task.run(model) are replaced by handing an agent to the task - it returns a Job holding the graded runs:
create_agent routes any model (claude-..., gpt-..., gemini-..., grok-...) through the HUD gateway and wires the tools for whichever capabilities the environment exposes.
Converting with an agent
The conversion is mechanical, so the fastest path is to let your coding agent do it. Add the HUD docs to your agent - they’re available as an MCP server at docs.hud.ai/mcp, or use the Copy / Claude / ChatGPT buttons at the top of any docs page - then point it at this guide and the Environment reference and ask it to adapt your env.py. A prompt like:
Convert this v5 HUD environment to v6 using the migration guide at docs.hud.ai. Rename scenarios to tasks, replace registered tools with the capability they imply (shell/files → ssh, browser → cdp, computer-use → rfb, custom tools → mcp), switch env("name", ...) to calling the task, and fix the hud.tools imports below.
Because every old import still resolves (the SDK ships shims) and registered tools are auto-promoted to capabilities at serve time, your environment keeps running throughout - convert incrementally and let the DeprecationWarnings tell you what’s left.
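Python hides some DeprecationWarnings by default, so it can help to collect them explicitly while converting. The sketch below uses only the standard `warnings` module; `legacy_call` is a hypothetical stand-in for any v5-surface call, not an SDK function.

```python
import warnings


def legacy_call():
    # Hypothetical stand-in for a v5-surface call: it warns the way a
    # compatibility shim would, then carries on working.
    warnings.warn("legacy_call is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


def collect_deprecations(fn):
    """Run fn and return (result, messages of the DeprecationWarnings it raised)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" disables the default once-per-location suppression.
        warnings.simplefilter("always", DeprecationWarning)
        result = fn()
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages


result, remaining = collect_deprecations(legacy_call)
for message in remaining:
    print("still deprecated:", message)
```

For a stricter check, running Python with `-W error::DeprecationWarning` turns the same warnings into exceptions, so a test suite fails until the remaining v5 usage is gone.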
Imports to update
In v6, hud.tools was removed entirely - tools are capabilities now - but every old import still resolves with a DeprecationWarning:
| v5 import | What it resolves to now | What to do |
|---|---|---|
| AgentTool, BaseTool | removed - resolve to a no-op stand-in | drop the class; expose a sub-agent as a plain function on a FastMCP server and attach it as an mcp capability - see Subagents as tools |
| Result types: EvaluationResult, ScenarioResult, SubScore, AgentAnswer, Citation | redirected to their v6 homes: hud.graders (ScenarioResult is now EvaluationResult), hud.environment (AgentAnswer is now Answer, without citations), hud.agents.types | change the import to the module the warning names |
| ContentResult | supported in v6 at hud.agents.types | from hud.agents.types import ContentResult - .to_content_blocks() builds a tool’s list[ContentBlock] from text + an optional image |
| ToolError | removed (no v6 counterpart) | return an error result (ContentResult(error=...).to_content_blocks()) or raise an ordinary exception - the loop surfaces it to the agent and continues |
| hud.server.MCPServer | removed - now plain fastmcp.FastMCP (a deprecation shim keeps the old import working and warns) | from fastmcp import FastMCP (same @server.tool / run_async); manage its lifecycle with @env.initialize / @env.shutdown |
| Shell/edit tools: BashTool, EditTool, ShellTool, ApplyPatchTool, … | removed - resolve to a marker that synthesizes an ssh capability at serve | call env.workspace(root) instead |
| Computer tools: HudComputerTool, AnthropicComputerTool, OpenAIComputerTool, GeminiComputerTool, QwenComputerTool, … | removed - resolve to a marker that synthesizes an rfb capability at serve | declare an rfb (computer-use) or cdp (browser) capability instead |
| Anything else under hud.tools: PlaywrightTool, JupyterTool, MemoryTool, filesystem tools, executors, SubmitTool, BaseHub | no-op stand-in (silently does nothing) | remove it - declare a capability (cdp for browser) or serve your own tool over mcp |
| Graders: hud.native (BashGrader, LLMJudgeGrader, exact_match, …) | aliased to hud.graders | change the import to from hud.graders import ... |
| Chat: hud.services.Chat | aliased to hud.eval.chat (re-exported as hud.Chat) | change the import to from hud import Chat |
| hud.services.ChatService | removed - the A2A executor left the SDK | copy the reference server in cookbooks/a2a-chat/server.py (a thin A2A adapter over Chat) |
| hud.shared.* (exceptions, requests, …) | merged into hud.utils (no alias - no environment imported it) | change the import to from hud.utils... import ... |
In short: graders move to hud.graders, tools become capabilities, and everything else under hud.tools is going away. When the deprecation log is quiet, the conversion is done.
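To see what is left without running the environment, one option is to scan the source for the legacy module paths listed in the table. The sketch below is a hypothetical helper built on the standard `ast` module, not part of the SDK; the prefixes come from the table, and the advice strings are a condensed paraphrase of its last column.

```python
import ast

# Legacy module prefixes from the table above. The advice text is a
# paraphrase for illustration, not output produced by the SDK.
LEGACY_PREFIXES = {
    "hud.tools": "declare a capability (ssh / cdp / rfb / mcp) instead",
    "hud.native": "import from hud.graders",
    "hud.services": "Chat: from hud import Chat; ChatService was removed",
    "hud.shared": "import from hud.utils",
}


def find_legacy_imports(source: str):
    """Return sorted (line, module, advice) for each legacy import in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]  # from X import ...
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]  # import X, Y
        else:
            continue
        for module in modules:
            for prefix, advice in LEGACY_PREFIXES.items():
                # Match the package itself or any submodule of it.
                if module == prefix or module.startswith(prefix + "."):
                    hits.append((node.lineno, module, advice))
    return sorted(hits)


sample = '''\
from hud.tools import BashTool
from hud.native import exact_match
import hud.shared.exceptions
from hud.graders import EvaluationResult
'''

for line, module, advice in find_legacy_imports(sample):
    print(f"line {line}: {module} -> {advice}")
```

A static scan like this only catches import statements; the DeprecationWarnings remain the authority on what the compatibility layer is still adapting at runtime.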
Next steps
Environment reference
Define capabilities, lifecycle hooks, and tasks.
Tasks & Tasksets
Define tasks, collect tasksets, and grade runs.
Package & deploy
Publish with hud deploy and run at scale.