v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Use Task.from_v4() for quick migration or @env.scenario() for new code.

Good News: Your Code Still Works

Environment inherits from MCPServer. Same API, same behavior. Just change the import and the class name:
# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")

@mcp.tool()
def my_tool(): ...

mcp.run()
# After
from hud import Environment
env = Environment("my-env")

@env.tool()
def my_tool(): ...

env.run()
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scripts, connectors, and integrations on top.

Migration Path 1: Quick Conversion with Task.from_v4()

The fastest way to migrate existing v4 code—no changes to task definitions needed:
# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
from hud.datasets import LegacyTask

legacy_task = LegacyTask(
    prompt="Navigate to google.com",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {}}
)

# AFTER - One-line conversion
from hud.eval import Task

task = Task.from_v4(legacy_task)  # Converts LegacyTask → Task
# Also works with: Task.from_v4(dict), Task.from_v4(json_string)

# Works the same with agents
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()
result = await agent.run(task)
Task.from_v4() automatically:
  • Runs setup_tool at the start of evaluation
  • Runs evaluate_tool at the end to compute reward
  • Preserves all existing behavior
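
The ordering described above can be pictured with a small, self-contained sketch. This is not HUD's implementation: `normalize`, `run_legacy`, `fake_call_tool`, and `fake_agent` are hypothetical names used only for illustration, and the sketch handles a single setup call and a single evaluate call.

```python
import asyncio
import json


def normalize(task):
    """Accept a dict or a JSON string and return a plain dict (hypothetical helper)."""
    return json.loads(task) if isinstance(task, str) else dict(task)


async def run_legacy(task, call_tool, agent):
    """Run setup_tool, then the agent, then evaluate_tool, in that order."""
    task = normalize(task)
    setup, evaluate = task["setup_tool"], task["evaluate_tool"]
    await call_tool(setup["name"], **setup["arguments"])      # start of evaluation
    answer = await agent(task["prompt"])                       # agent works on the prompt
    reward = await call_tool(evaluate["name"], **evaluate["arguments"])  # end of evaluation
    return answer, float(reward)


log = []


async def fake_call_tool(name, **arguments):
    # Stand-in for a real tool call; records the order of invocations.
    log.append(name)
    return 1.0 if name == "check_url" else None


async def fake_agent(prompt):
    # Stand-in for a real agent; always returns the same answer.
    log.append("agent")
    return "https://google.com"


legacy_json = json.dumps({
    "prompt": "Navigate to google.com",
    "setup_tool": {"name": "navigate", "arguments": {"url": "https://google.com"}},
    "evaluate_tool": {"name": "check_url", "arguments": {}},
})

answer, reward = asyncio.run(run_legacy(legacy_json, fake_call_tool, fake_agent))
print(log, reward)
```

The recorded log shows setup running first, the agent second, and the evaluation last, which is the behavior the conversion is said to preserve.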

Migration Path 2: Scenarios with @env.scenario()

For new code or when refactoring, migrate setup_tool and evaluate_tool to @env.scenario(). The rule is simple:
  • setup_tool code → before the first yield
  • evaluate_tool code → after the first yield
# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
task = LegacyTask(
    prompt="What's the current URL?",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {"expected": "google.com"}}
)

# AFTER
from hud import Environment

env = Environment("browser").connect_hub("hud-evals/browser")

@env.scenario("navigate-google")
async def navigate_google():
    # ===== SETUP SECTION (replaces setup_tool) =====
    await env.call_tool("navigate", url="https://google.com")
    
    # ===== PROMPT (first yield) =====
    answer = yield "What's the current URL?"
    
    # ===== EVALUATE SECTION (replaces evaluate_tool) =====
    result = await env.call_tool("check_url", expected="google.com")
    
    # ===== REWARD (second yield) =====
    yield 1.0 if result else 0.0

# Create task from scenario
task = env("navigate-google")
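
To see why the two yields split setup from evaluation, it may help to drive an async generator by hand. The sketch below uses only the standard library and assumes nothing about HUD internals: `drive` is a hypothetical stand-in for whatever runs the scenario, and the `calls` list replaces real tool calls.

```python
import asyncio


async def navigate_google(calls):
    calls.append("setup")                        # stands in for the setup tool call
    answer = yield "What's the current URL?"     # first yield: prompt out, answer in
    calls.append("evaluate")                     # stands in for the evaluate tool call
    yield 1.0 if "google.com" in answer else 0.0  # second yield: reward


async def drive(scenario, agent_answer):
    """Hypothetical driver: fetch the prompt, send the answer, collect the reward."""
    prompt = await scenario.__anext__()          # runs everything before the first yield
    reward = await scenario.asend(agent_answer)  # resumes after the first yield
    await scenario.aclose()                      # finish the generator cleanly
    return prompt, reward


calls = []
prompt, reward = asyncio.run(drive(navigate_google(calls), "https://google.com"))
print(prompt, reward, calls)
```

The value passed to `asend` becomes the result of the first `yield` expression, which is how the agent's answer reaches the evaluation code.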

Multiple setup_tool Calls

If you have multiple setup tools, just call them in sequence:
# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
    {"name": "go_to_page", "arguments": {"page": "settings"}}
]

# AFTER
@env.scenario("settings-test")
async def settings_test():
    # Multiple setup steps - just call them in order
    await env.call_tool("navigate", url="...")
    await env.call_tool("login", user="...")
    await env.call_tool("go_to_page", page="settings")
    
    answer = yield "Verify the settings page loaded correctly"
    
    result = await env.call_tool("check_settings")
    yield 1.0 if result else 0.0
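
If the legacy list is long, one option is to keep it as data and loop over it inside the scenario body. The following is a minimal sketch under stated assumptions: `call_tool` is passed in as a plain async function rather than taken from a real Environment, and the URL and user values are placeholders, not values from the original task.

```python
import asyncio

# Legacy-style setup list; the argument values are placeholders for illustration.
SETUP_STEPS = [
    {"name": "navigate", "arguments": {"url": "https://example.com"}},
    {"name": "login", "arguments": {"user": "demo"}},
    {"name": "go_to_page", "arguments": {"page": "settings"}},
]


async def settings_test(call_tool):
    # Setup: replay each legacy step in its original order.
    for step in SETUP_STEPS:
        await call_tool(step["name"], **step["arguments"])

    answer = yield "Verify the settings page loaded correctly"

    # Evaluate: runs only after the answer has been sent back in.
    result = await call_tool("check_settings")
    yield 1.0 if result else 0.0


called = []


async def fake_call_tool(name, **arguments):
    # Stand-in for a real tool call; records the call order and reports success.
    called.append(name)
    return True


async def main():
    scenario = settings_test(fake_call_tool)
    prompt = await scenario.__anext__()
    setup_order = list(called)           # snapshot taken before evaluation runs
    reward = await scenario.asend("looks fine")
    await scenario.aclose()
    return prompt, setup_order, reward


prompt, setup_order, reward = asyncio.run(main())
print(setup_order, reward)
```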

Using with Built-in Agents

Built-in agents (ClaudeAgent, OpenAIAgent, etc.) work with both patterns:
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()

# Works with Task from scenario
result = await agent.run(env("navigate-google"))

# Works with Task.from_v4() conversion (legacy_task as defined in Migration Path 1)
from hud.eval import Task

result = await agent.run(Task.from_v4(legacy_task))

Optional: Bring Your Own Agent

v5 gives you the hud.eval() context manager for maximum flexibility:
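import hud
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Assumes a "checkout" scenario is registered on env
async with hud.eval(env("checkout", product="laptop")) as ctx: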
    # Use OpenAI, Anthropic, your own agent—whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools()
    )
    
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)

print(ctx.reward)
The old ClaudeAgent and OperatorAgent still work—even with the new hud.eval() system. But now you’re not locked into a specific agent spec. Pair with the Gateway to use any model through one API.
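
The "run your agent loop" step in the example above is left open. One common shape for it is sketched below with stand-ins so that it runs without any SDK: `agent_loop`, `fake_model`, and `fake_call_tool` are hypothetical, and the message dictionaries only loosely mirror the chat format. In real code the model call and tool dispatch would go through your client and the evaluation context.

```python
import asyncio


async def agent_loop(prompt, model, call_tool, max_steps=10):
    """Call the model until it answers without requesting any tools."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = await model(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply["tool_calls"]:
            return reply["content"]              # final answer, ready to submit
        for call in reply["tool_calls"]:
            result = await call_tool(call["name"], **call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    raise RuntimeError("agent did not finish within max_steps")


async def fake_model(messages):
    # Stand-in model: asks for one tool call, then answers from the tool result.
    last = messages[-1]
    if last["role"] == "tool":
        return {"content": f"The URL is {last['content']}", "tool_calls": []}
    return {"content": None, "tool_calls": [{"name": "get_url", "arguments": {}}]}


async def fake_call_tool(name, **arguments):
    # Stand-in tool: always reports the same URL.
    return "https://google.com"


final_answer = asyncio.run(
    agent_loop("What's the current URL?", fake_model, fake_call_tool)
)
print(final_answer)
```

The `max_steps` cap is a design choice in this sketch so that a model which keeps requesting tools cannot loop forever.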

Quick Reference

v4 (deprecated in v0.5.0, removed in v0.6.0) | v5
LegacyTask(...) | Task.from_v4(...) (quick) or env("scenario", ...) (recommended)
setup_tool | Code before first yield in @env.scenario()
evaluate_tool | Code after first yield in @env.scenario()
MCPServer | Environment (drop-in replacement)
agent.run(task) | Still works, or use hud.eval() for BYOA (bring your own agent)