v6 is a leaner spec. The environment is no longer an MCP server that hands tools to the agent - it’s a small control channel that exposes capabilities (connections the agent drives itself) and tasks (prompt then reward). The agent’s harness owns the tools, so the environment side gets noticeably smaller.

What stays compatible

Environments are mostly backwards compatible. The v6 SDK still runs environments written against the v5 surface: @env.scenario, @env.tool / env.add_tool, env("scenario"), and env.run(...) all keep working - each emits a DeprecationWarning and adapts to v6 under the hood. New (v6) agents can evaluate your existing environments unchanged.
The break is on the agent side. v6 serves a new control channel instead of MCP over stdio/HTTP, so once an environment is served by the v6 SDK (whether authored in the v5 or v6 style), only a v6 client can drive it; v5 agents cannot run it. Upgrade the side that runs agents to v6.
So you can upgrade the SDK first and keep your environments as-is, then convert at your own pace. Converting is worth it: the v6 spec removes most of the tool-wiring boilerplate.

At a glance

| v5 | v6 | Notes |
| --- | --- | --- |
| Environment("name") | Environment(name="name", capabilities=[...]) | positional name still works; declare capabilities up front |
| @env.scenario("count") | @env.template() | same yield prompt then yield reward generator |
| @env.tool / env.add_tool(ComputerTool()) | a capability (ssh / mcp / cdp / rfb / robot) | the agent’s harness brings the tools now |
| env("count", word=...) | count(word=...) | keep the @env.template return value; calling it builds a Task |
| task.run("claude") / hud.eval(task) | await task.run(agent) | or just hud eval tasks.py claude |
| env.run(transport=...) | await env.serve() / hud serve / hud deploy | v6 serves a control channel, not MCP |
| .slug, .columns on a task | .slug, .columns on the Task | unchanged |
The CLI you already use is stable: hud init, hud deploy, hud eval, and hud sync tasks all carry over. hud dev is now hud serve (the old name remains as a deprecated alias).

Walk through a conversion

Here’s a small v5 coding environment - a couple of tools and one scenario:
env.py (v5)
from hud import Environment
from hud.tools import BashTool, EditTool
from hud.native import BashGrader

env = Environment("coder")
env.add_tool(BashTool())
env.add_tool(EditTool())

@env.scenario("fix-tests")
async def fix_tests(target: str = "tests/"):
    answer = yield f"Make the tests in {target} pass."
    yield await BashGrader.grade(command=f"pytest {target} -q")

1. Replace tools with a capability

This is the biggest change. In v5 you registered tools and the environment forwarded them, translating per provider. In v6 you declare a capability - a connection - and the agent’s harness attaches its own tools to it. Shell and file tools become a Workspace: the environment starts the sandboxed workspace and publishes its ssh capability when it serves:
env.py (v6)
from hud.environment import Environment

env = Environment(name="coder")
env.workspace("/workspace")
Other tool kinds map the same way: a browser becomes cdp, full computer-use becomes rfb, a robot becomes robot, and any custom MCP tools become an mcp capability via Capability.mcp(name=..., url=...). You no longer hand-wire ComputerTool() / BashTool() or call env.as_claude_tools() - the harness does that.

2. Rename @env.scenario to @env.template

The generator body is identical - yield a prompt, receive the answer, yield a reward. Just swap the decorator and keep a reference to the decorated function, because calling it is how you build a Task:
env.py (v6)
from hud.graders import BashGrader

@env.template()
async def fix_tests(target: str = "tests/"):
    answer = yield f"Make the tests in {target} pass."
    yield await BashGrader.grade(command=f"pytest {target} -q")
@env.template() also accepts id=, description=, and optional input= / returns= types (surfaced as JSON schemas in the manifest). The decorated function is a template that mints Task rows when called. The v5 scenario options (chat, returns, exclude_tools, …) still parse through the compatibility layer if you keep @env.scenario.
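The yield-prompt, receive-answer, yield-reward shape is ordinary Python async-generator behavior. The sketch below drives such a generator by hand to show the order of events. It is not the SDK's runner: drive() is a hypothetical helper, the string comparison stands in for a real grader such as BashGrader, and the decorator is left off so the snippet runs without the SDK.

```python
import asyncio


async def fix_tests(target: str = "tests/"):
    # First yield hands out the prompt; the value sent back becomes `answer`.
    answer = yield f"Make the tests in {target} pass."
    # Second yield hands out the reward (stand-in grading, not BashGrader).
    yield 1.0 if answer == "done" else 0.0


async def drive(gen, agent_answer):
    """Hypothetical driver: fetch the prompt, send an answer, read the reward."""
    prompt = await gen.__anext__()          # runs up to the first yield
    reward = await gen.asend(agent_answer)  # resumes, runs up to the second yield
    await gen.aclose()                      # finish the generator cleanly
    return prompt, reward


prompt, reward = asyncio.run(drive(fix_tests("tests/unit"), "done"))
print(prompt, reward)
```

Because the answer arrives through the first yield, everything after that line can inspect both the answer and the state of the environment before producing the reward.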

3. Build tasks by calling the task function

env("fix-tests", target="tests/") becomes a direct call on the decorated template function: fix_tests(target="tests/"). The call returns a Task - the runnable unit - and .slug / .columns work exactly as before:
tasks.py (v6)
from env import fix_tests

easy = fix_tests(target="tests/unit")
easy.slug = "fix-unit-tests"
easy.columns = {"suite": "unit"}
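Since a task is just the return value of a call, a set of variants can be built in a loop. The sketch below shows that pattern with stand-ins so it runs without the SDK: the Task dataclass and this fix_tests are hypothetical placeholders for the SDK's Task and the decorated function in env.py, and the suite names and paths are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Stand-in for the SDK's Task; only the fields used here."""
    args: dict
    slug: str = ""
    columns: dict = field(default_factory=dict)


def fix_tests(target: str = "tests/") -> Task:
    """Stand-in for the @env.template-decorated function from env.py."""
    return Task(args={"target": target})


# Hypothetical suites; one task per entry.
suites = {"unit": "tests/unit", "integration": "tests/integration"}

tasks = []
for suite, path in suites.items():
    task = fix_tests(target=path)
    task.slug = f"fix-{suite}-tests"
    task.columns = {"suite": suite}
    tasks.append(task)

print([t.slug for t in tasks])
```

In a real tasks.py you would drop the two stand-ins and use from env import fix_tests instead; the loop itself stays the same.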

4. Run it

Locally, hud eval is unchanged:
hud eval tasks.py claude
Programmatically, the hud.eval(task) context manager and task.run(model) are replaced by handing an agent to the task - it returns a Job holding the graded runs:
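import asyncio
from hud.agents import create_agent
from env import fix_tests

async def main():
    agent = create_agent("claude-sonnet-4-5")
    job = await fix_tests(target="tests/").run(agent)  # Job holding the graded runs
    print(job.reward)

asyncio.run(main())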
create_agent routes any model (claude-..., gpt-..., gemini-..., grok-...) through the HUD gateway and wires the tools for whichever capabilities the environment exposes.

5. Serve and deploy

v5 served an MCP server via env.run(transport=...). v6 serves its control channel - use hud serve while iterating and hud deploy to publish (it builds and publishes in one step). await env.serve(host, port) is the in-code equivalent.

Converting with an agent

The conversion is mechanical, so the fastest path is to let your coding agent do it. Add the HUD docs to your agent - they’re available as an MCP server at docs.hud.ai/mcp, or use the Copy / Claude / ChatGPT buttons at the top of any docs page - then point it at this guide and the Environment reference and ask it to adapt your env.py. A prompt like:
Convert this v5 HUD environment to v6 using the migration guide at docs.hud.ai. Rename @env.scenario to @env.template, replace registered tools with the capability they imply (shell/files → ssh, browser → cdp, computer-use → rfb, custom tools → mcp), switch env("name", ...) to calling the template function, and fix the hud.tools imports below.
Because nearly every old import still resolves (the SDK ships shims; the import table in this guide lists the exceptions) and registered tools are auto-promoted to capabilities at serve time, your environment keeps running throughout - convert incrementally and let the DeprecationWarnings tell you what’s left.
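To see which warnings are left, you can collect them with Python's standard warnings module. This is a minimal sketch using only the standard library: collect_deprecations is a hypothetical helper, and legacy_call with its message text is an invented stand-in for whatever v5-style code you are checking.

```python
import warnings


def collect_deprecations(fn):
    """Run fn() and return the text of every DeprecationWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" bypasses the once-per-location filter so repeats are not hidden.
        warnings.simplefilter("always", DeprecationWarning)
        fn()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


def legacy_call():
    # Hypothetical stand-in for a v5-style call; the message text is invented.
    warnings.warn("env.add_tool is deprecated", DeprecationWarning, stacklevel=2)


remaining = collect_deprecations(legacy_call)
print(remaining)
```

In practice you would pass something like lambda: importlib.import_module("env") instead of legacy_call. Running Python with -W error::DeprecationWarning is a stricter alternative that fails on the first warning. Whether a given shim warns at import time or only at serve time is not stated here, so treat an empty list from an import-only check as a hint rather than proof.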

Imports to update

In v6, hud.tools was removed entirely - tools are capabilities now - but every old import still resolves with a DeprecationWarning:
| v5 import | What it resolves to now | What to do |
| --- | --- | --- |
| AgentTool, BaseTool | removed - resolve to a no-op stand-in | drop the class; expose a sub-agent as a plain function on a FastMCP server and attach it as an mcp capability - see Subagents as tools |
| Result types: EvaluationResult, ScenarioResult, SubScore, AgentAnswer, Citation | redirected to their v6 homes: hud.graders (ScenarioResult is now EvaluationResult), hud.environment (AgentAnswer is now Answer, without citations), hud.agents.types | change the import to the module the warning names |
| ContentResult | supported in v6 at hud.agents.types | from hud.agents.types import ContentResult - .to_content_blocks() builds a tool’s list[ContentBlock] from text + an optional image |
| ToolError | removed (no v6 counterpart) | return an error result (ContentResult(error=...).to_content_blocks()) or raise an ordinary exception - the loop surfaces it to the agent and continues |
| hud.server.MCPServer | removed - now plain fastmcp.FastMCP (a deprecation shim keeps the old import working and warns) | from fastmcp import FastMCP (same @server.tool / run_async); manage its lifecycle with @env.initialize / @env.shutdown |
| Shell/edit tools: BashTool, EditTool, ShellTool, ApplyPatchTool, … | removed - resolve to a marker that synthesizes an ssh capability at serve | call env.workspace(root) instead |
| Computer tools: HudComputerTool, AnthropicComputerTool, OpenAIComputerTool, GeminiComputerTool, QwenComputerTool, … | removed - resolve to a marker that synthesizes an rfb capability at serve | declare an rfb (computer-use) or cdp (browser) capability instead |
| Anything else under hud.tools: PlaywrightTool, JupyterTool, MemoryTool, filesystem tools, executors, SubmitTool, BaseHub | no-op stand-in (silently does nothing) | remove it - declare a capability (cdp for browser) or serve your own tool over mcp |
| Graders: hud.native (BashGrader, LLMJudgeGrader, exact_match, …) | aliased to hud.graders | change the import to from hud.graders import ... |
| Chat: hud.services.Chat | aliased to hud.eval.chat (re-exported as hud.Chat) | change the import to from hud import Chat |
| hud.services.ChatService | removed - the A2A executor left the SDK | copy the reference server in cookbooks/a2a-chat/server.py (a thin A2A adapter over Chat) |
| hud.shared.* (exceptions, requests, …) | merged into hud.utils (no alias - no environment imported it) | change the import to from hud.utils... import ... |
The rule of thumb: grading types move to hud.graders, tools become capabilities, and everything else under hud.tools is going away. When the deprecation log is quiet, the conversion is done.
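If you would rather find the old imports before running anything, a static scan works too. The sketch below uses Python's standard ast module to flag imports from the v5 modules named in the table. audit_imports and MODULE_HINTS are hypothetical names, the hints are paraphrased from the table rather than taken from the SDK, and the sample source is the v5 env.py header from this guide.

```python
import ast

# Replacement hints, paraphrased from the import table in this guide.
MODULE_HINTS = {
    "hud.tools": "declare a capability (ssh / cdp / rfb / mcp) instead",
    "hud.native": "import from hud.graders",
    "hud.services": "use `from hud import Chat`; ChatService left the SDK",
    "hud.shared": "import from hud.utils",
    "hud.server": "use `from fastmcp import FastMCP`",
}


def audit_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line, module, hint) for every import of a v5-only module."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            for prefix, hint in MODULE_HINTS.items():
                # Match the module itself or any submodule beneath it.
                if module == prefix or module.startswith(prefix + "."):
                    findings.append((node.lineno, module, hint))
    return sorted(findings)  # ast.walk does not guarantee line order


V5_SOURCE = '''\
from hud import Environment
from hud.tools import BashTool, EditTool
from hud.native import BashGrader
'''

report = audit_imports(V5_SOURCE)
for line, module, hint in report:
    print(f"line {line}: {module} -> {hint}")
```

To check a real file, pass pathlib.Path("env.py").read_text() as the source. A static scan only sees literal import statements, so it complements the runtime DeprecationWarnings rather than replacing them.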

Next steps

Environment reference

Define capabilities, lifecycle hooks, and tasks.

Tasks & Tasksets

Define tasks, collect tasksets, and grade runs.

Package & deploy

Publish with hud deploy and run at scale.