v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Migrate to @env.scenario() for new code.

MCPServer → Environment

Environment inherits from MCPServer. Same API, same behavior. Just change the import:
# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")

@mcp.tool()
def my_tool(): ...

mcp.run()
# After
from hud import Environment
env = Environment("my-env")

@env.tool()
def my_tool(): ...

env.run()
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scenarios, connectors, and integrations on top.
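Why the swap is safe can be illustrated with a plain-Python toy (these are not the real hud classes, just a sketch of the inheritance relationship): a subclass that preserves the parent's decorator API is a drop-in replacement, so v4-era registration code keeps working.

```python
# Toy illustration (NOT the real hud classes): Environment keeps
# MCPServer's tool API and adds scenario registration on top.
class MCPServer:
    def __init__(self, name):
        self.name = name
        self.tools = {}

    def tool(self):
        def register(fn):
            self.tools[fn.__name__] = fn
            return fn
        return register


class Environment(MCPServer):
    def __init__(self, name):
        super().__init__(name)
        self.scenarios = {}

    def scenario(self, scenario_name):
        def register(fn):
            self.scenarios[scenario_name] = fn
            return fn
        return register


env = Environment("my-env")

@env.tool()  # v4-era registration code runs unchanged on the subclass
def my_tool():
    return "ok"
```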

Migrating Tasks: Prompt Passthrough Pattern

The recommended migration uses the prompt passthrough pattern—scenario arguments become the prompt content:
from hud import Environment

env = Environment("browser").connect_hub("hud-evals/browser")

@env.scenario("web-task")
async def web_task(instruction: str, start_url: str = "https://example.com"):
    """
    The instruction arg passes through directly to the prompt.
    One scenario, infinite test cases.
    """
    # Setup phase (before yield)
    await env.call_tool("navigate", url=start_url)
    
    # Prompt - the instruction IS the prompt
    answer = yield instruction
    
    # Evaluate phase (after yield)
    result = await env.call_tool("check_completion")
    yield 1.0 if result["success"] else 0.0

# Create tasks by passing the actual prompt as an arg
task1 = env("web-task", instruction="Find the contact page and extract the support email")
task2 = env("web-task", instruction="Add a MacBook Pro to cart", start_url="https://store.example.com")
task3 = env("web-task", instruction="Fill out the signup form with test data")
This pattern:
  • Args ARE the prompt: The instruction flows directly through as the agent’s task
  • Enables parametric evaluation: Same scenario, different instructions
  • Replaces hardcoded prompts: Instead of LegacyTask(prompt="..."), pass the prompt as an arg
  • Type-safe: Arguments are validated against the scenario signature
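The two-yield shape above can be seen in isolation with a plain async generator driven by hand (a toy harness, not the hud runtime; `fake_check_completion` is a stand-in for the real `check_completion` tool):

```python
import asyncio

async def fake_check_completion(answer: str) -> dict:
    # Stand-in for env.call_tool("check_completion")
    return {"success": "@" in answer}

async def web_task(instruction: str):
    # Setup phase would run here (before the first yield)
    answer = yield instruction                    # first yield: the prompt
    result = await fake_check_completion(answer)
    yield 1.0 if result["success"] else 0.0       # second yield: the reward

async def drive(scenario, instruction, answer):
    gen = scenario(instruction)
    prompt = await gen.asend(None)    # run setup, receive the prompt
    reward = await gen.asend(answer)  # send the agent's answer, receive reward
    return prompt, reward

prompt, reward = asyncio.run(
    drive(web_task, "Find the support email", "support@example.com")
)
```

The runtime plays the role of `drive` here: it runs your setup code, hands the first yielded value to the agent as the prompt, sends the agent's answer back in, and reads the second yield as the reward.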

Before/After Comparison

# BEFORE (deprecated; removed in v0.6.0)
task = LegacyTask(
    prompt="Find all products under $50 and add the cheapest to cart",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://shop.example.com"}},
    evaluate_tool={"name": "check_cart", "arguments": {}}
)

# AFTER - Prompt passthrough pattern
@env.scenario("shopping")
async def shopping(task: str, shop_url: str):
    await env.call_tool("navigate", url=shop_url)
    
    answer = yield task  # The task arg IS the prompt
    
    result = await env.call_tool("check_cart")
    yield 1.0 if result["has_items"] else 0.0

# Now create multiple tasks with different instructions
tasks = [
    env("shopping", task="Find all products under $50 and add the cheapest to cart", shop_url="https://shop.example.com"),
    env("shopping", task="Search for 'laptop' and add the first result to cart", shop_url="https://shop.example.com"),
    env("shopping", task="Apply promo code SAVE20 at checkout", shop_url="https://shop.example.com"),
]

The Migration Rule

  • prompt → scenario arg (passthrough)
  • setup_tool → code before the first yield
  • evaluate_tool → code after the first yield
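The rule is mechanical enough to sketch as a plain transformation on the v4 dict shape (a hypothetical helper for planning a migration, not part of the hud SDK):

```python
def split_legacy_task(task: dict) -> dict:
    """Hypothetical helper: map v4 LegacyTask fields onto the three
    parts of a v5 scenario. Not part of the hud SDK."""
    setup = task.get("setup_tool") or []
    if isinstance(setup, dict):  # v4 allowed a single tool or a list
        setup = [setup]
    return {
        "prompt_arg": task["prompt"],                # -> scenario arg, yielded as-is
        "setup_calls": setup,                        # -> awaits before the first yield
        "evaluate_call": task.get("evaluate_tool"),  # -> await after the first yield
    }

parts = split_legacy_task({
    "prompt": "Find all products under $50 and add the cheapest to cart",
    "setup_tool": {"name": "navigate",
                   "arguments": {"url": "https://shop.example.com"}},
    "evaluate_tool": {"name": "check_cart", "arguments": {}},
})
```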

Multiple setup_tool Calls

If you have multiple setup tools, just call them in sequence:
# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
]

# AFTER
@env.scenario("authenticated-task")
async def authenticated_task(instruction: str, username: str):
    await env.call_tool("navigate", url="https://app.example.com")
    await env.call_tool("login", user=username)
    
    answer = yield instruction
    
    result = await env.call_tool("check_completion")
    yield 1.0 if result else 0.0

JSON Task Format (Platform Ready)

For JSON-based task definitions that can be uploaded to the HUD platform, use this format:
{
  "env": {
    "name": "hud-evals/browser"
  },
  "scenario": "web-task",
  "args": {
    "instruction": "Find the contact page and extract the support email",
    "start_url": "https://example.com"
  }
}
This maps directly to the scenario call: env("web-task", instruction="...", start_url="...").

Example: a task set for platform upload:
[
  {
    "env": { "name": "hud-ops-diagnostics-sentry" },
    "scenario": "sentry-agent:investigate",
    "args": {
      "issue_id": "PROJ-1234",
      "max_depth": 3
    }
  },
  {
    "env": { "name": "hud-evals/browser" },
    "scenario": "web-task",
    "args": {
      "instruction": "Add a MacBook Pro to cart and proceed to checkout"
    }
  }
]
The args field uses prompt passthrough—the values flow through the scenario’s parameters into its first yield.
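Reading this format back into scenario calls takes only stdlib JSON handling. A minimal sketch (a hypothetical loader; the actual platform upload path may differ), where each entry maps onto env(scenario, **args):

```python
import json

TASKS_JSON = """
[
  {"env": {"name": "hud-evals/browser"},
   "scenario": "web-task",
   "args": {"instruction": "Add a MacBook Pro to cart and proceed to checkout"}}
]
"""

def load_tasks(raw: str):
    """Hypothetical loader: each JSON entry becomes the pieces of an
    env(scenario, **args) call."""
    tasks = []
    for entry in json.loads(raw):
        tasks.append((entry["env"]["name"],
                      entry["scenario"],
                      entry.get("args", {})))
    return tasks

tasks = load_tasks(TASKS_JSON)
# tasks[0] corresponds to env("web-task", instruction="Add a MacBook Pro ...")
```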

Using with Built-in Agents

Built-in agents work with scenarios:
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()
result = await agent.run(env("web-task", instruction="Find the pricing page"))

Bring Your Own Agent

v5 gives you the hud.eval() context manager for maximum flexibility:
async with hud.eval(env("shopping", task="Add item to cart", shop_url="https://shop.example.com")) as ctx:
    # Use OpenAI, Anthropic, your own agent—whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools()
    )
    
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)

print(ctx.reward)

Quick Reference

| v4 (removed in v0.6.0) | v5 (recommended) |
| --- | --- |
| LegacyTask(prompt=...) | env("scenario", instruction=...) (prompt passthrough) |
| setup_tool | Code before first yield in @env.scenario() |
| evaluate_tool | Code after first yield in @env.scenario() |
| MCPServer | Environment (drop-in replacement) |
| JSON with mcp_config + prompt | JSON with env + scenario + args |