v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Use Task.from_v4() for quick migration or @env.scenario() for new code.
Good News: Your Code Still Works
Environment inherits from MCPServer. Same API, same behavior. Swap the import and the class name:
# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")
@mcp.tool()
def my_tool(): ...
mcp.run()
# After
from hud import Environment
env = Environment("my-env")
@env.tool()
def my_tool(): ...
env.run()
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scripts, connectors, and integrations on top.
Migration Path 1: Quick Conversion with Task.from_v4()
The fastest way to migrate existing v4 code—no changes to task definitions needed:
# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
from hud.datasets import LegacyTask
legacy_task = LegacyTask(
    prompt="Navigate to google.com",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {}},
)
# AFTER - One-line conversion
from hud.eval import Task
task = Task.from_v4(legacy_task) # Converts LegacyTask → Task
# Also works with: Task.from_v4(dict), Task.from_v4(json_string)
# Works the same with agents
from hud.agents import ClaudeAgent
agent = ClaudeAgent.create()
result = await agent.run(task)
Task.from_v4() automatically:
- Runs setup_tool at the start of evaluation
- Runs evaluate_tool at the end to compute reward
- Preserves all existing behavior
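The ordering described above can be illustrated without hud installed. The following is a minimal sketch, not the library's implementation: `call_tool`, `echo_agent`, and `run_legacy` are hypothetical stand-ins, the task is a plain dict mirroring the LegacyTask fields, and the 1.0/0.0 reward mapping is an assumption made for the example.

```python
import asyncio

calls = []  # records the order in which things happen

async def call_tool(name, **arguments):
    """Hypothetical stand-in for the environment's tool dispatcher."""
    calls.append((name, arguments))
    return True  # pretend every tool succeeds

async def echo_agent(prompt):
    """Hypothetical agent that only records that it ran."""
    calls.append(("agent", {"prompt": prompt}))
    return "done"

async def run_legacy(task, agent):
    # 1. setup_tool runs before the agent sees the prompt.
    setup = task["setup_tool"]
    await call_tool(setup["name"], **setup["arguments"])
    # 2. The agent works on the prompt.
    answer = await agent(task["prompt"])
    # 3. evaluate_tool runs last; its result is mapped to a reward
    #    (the 1.0/0.0 mapping is an assumption made for this sketch).
    evaluate = task["evaluate_tool"]
    result = await call_tool(evaluate["name"], **evaluate["arguments"])
    return answer, (1.0 if result else 0.0)

legacy = {
    "prompt": "Navigate to google.com",
    "setup_tool": {"name": "navigate", "arguments": {"url": "https://google.com"}},
    "evaluate_tool": {"name": "check_url", "arguments": {}},
}

answer, reward = asyncio.run(run_legacy(legacy, echo_agent))
print([name for name, _ in calls], reward)
```

The recorded call order is navigate, then the agent, then check_url, which is the same sequence the converted task is described as following.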
Migration Path 2: Full Scenario Migration (Recommended)
For new code or when refactoring, migrate setup_tool and evaluate_tool to @env.scenario().
The rule is simple:
- setup_tool code → before the first yield
- evaluate_tool code → after the first yield
# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
task = LegacyTask(
    prompt="What's the current URL?",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {"expected": "google.com"}},
)
# AFTER
from hud import Environment
env = Environment("browser").connect_hub("hud-evals/browser")
@env.scenario("navigate-google")
async def navigate_google():
    # ===== SETUP SECTION (replaces setup_tool) =====
    await env.call_tool("navigate", url="https://google.com")
    # ===== PROMPT (first yield) =====
    answer = yield "What's the current URL?"
    # ===== EVALUATE SECTION (replaces evaluate_tool) =====
    result = await env.call_tool("check_url", expected="google.com")
    # ===== REWARD (second yield) =====
    yield 1.0 if result else 0.0
# Create task from scenario
task = env("navigate-google")
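If the two-yield shape is unfamiliar, it may help to drive such a generator by hand. The sketch below uses only plain Python async generators; `drive` and `fake_agent` are hypothetical names, and this is an illustration of the protocol rather than how hud runs scenarios internally. The evaluation is also simplified to a string check instead of a tool call.

```python
import asyncio

log = []  # records which phase ran, in order

async def navigate_google():
    # Setup: everything before the first yield.
    log.append("setup")
    # The first yield hands out the prompt and later receives the answer.
    answer = yield "What's the current URL?"
    # Evaluation: everything after the first yield.
    log.append(f"evaluate:{answer}")
    # The second yield reports the reward.
    yield 1.0 if "google.com" in answer else 0.0

async def fake_agent(prompt):
    """Hypothetical agent that returns a fixed answer."""
    log.append("agent")
    return "https://google.com"

async def drive(scenario, agent):
    """Hypothetical driver showing the protocol; not hud's internals."""
    gen = scenario()
    prompt = await gen.__anext__()    # runs setup, stops at the first yield
    answer = await agent(prompt)
    reward = await gen.asend(answer)  # resumes, stops at the second yield
    await gen.aclose()
    return prompt, reward

prompt, reward = asyncio.run(drive(navigate_google, fake_agent))
print(log, reward)
```

Sending the answer back with `asend` is what makes `answer = yield ...` receive the agent's output, so the code after the first yield can score it.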
If you have multiple setup tools, just call them in sequence:
# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
    {"name": "go_to_page", "arguments": {"page": "settings"}},
]
# AFTER
@env.scenario("settings-test")
async def settings_test():
    # Multiple setup steps - just call them in order
    await env.call_tool("navigate", url="...")
    await env.call_tool("login", user="...")
    await env.call_tool("go_to_page", page="settings")
    answer = yield "Verify the settings page loaded correctly"
    result = await env.call_tool("check_settings")
    yield 1.0 if result else 0.0
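When a v4 task already carries a long setup_tool list, one option is to loop over it instead of transcribing each call by hand. This is a sketch under assumptions: `FakeEnv` is a hypothetical stand-in that only records calls, `run_steps` is an illustrative helper, and the URL and user values are placeholders. It relies on `call_tool` accepting keyword arguments, as in the calls shown above.

```python
import asyncio

class FakeEnv:
    """Hypothetical stand-in that records calls; swap in your real env."""

    def __init__(self):
        self.calls = []

    async def call_tool(self, name, **arguments):
        self.calls.append((name, arguments))
        return True

# Placeholder values, shaped like a v4 setup_tool list.
SETUP_STEPS = [
    {"name": "navigate", "arguments": {"url": "https://example.com"}},
    {"name": "login", "arguments": {"user": "demo"}},
    {"name": "go_to_page", "arguments": {"page": "settings"}},
]

async def run_steps(env, steps):
    # Call each tool in list order, unpacking its arguments as keywords.
    for step in steps:
        await env.call_tool(step["name"], **step.get("arguments", {}))

env = FakeEnv()
asyncio.run(run_steps(env, SETUP_STEPS))
print([name for name, _ in env.calls])
```

Inside a scenario, the same loop would sit before the first yield, in place of the individual setup calls.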
Using with Built-in Agents
Built-in agents (ClaudeAgent, OpenAIAgent, etc.) work with both patterns:
from hud.agents import ClaudeAgent
agent = ClaudeAgent.create()
# Works with Task from scenario
result = await agent.run(env("navigate-google"))
# Works with Task.from_v4() conversion
result = await agent.run(Task.from_v4(legacy_task))
Optional: Bring Your Own Agent
v5 gives you the hud.eval() context manager for maximum flexibility:
import hud
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async with hud.eval(env("checkout", product="laptop")) as ctx:
    # Use OpenAI, Anthropic, or your own agent
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools(),
    )
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)

print(ctx.reward)
The old ClaudeAgent and OperatorAgent still work—even with the new hud.eval() system. But now you’re not locked into a specific agent spec. Pair with the Gateway to use any model through one API.
Quick Reference
| v4 (deprecated, removed in v0.6.0) | v5 |
|---|---|
| LegacyTask(...) | Task.from_v4(...) (quick) or env("scenario", ...) (recommended) |
| setup_tool | Code before first yield in @env.scenario() |
| evaluate_tool | Code after first yield in @env.scenario() |
| MCPServer | Environment (drop-in replacement) |
| agent.run(task) | Still works, or use hud.eval() to bring your own agent |