v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
**Deprecation notice:** `LegacyTask`, `setup_tool`, and `evaluate_tool` are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Migrate to `@env.scenario()` for new code.
## MCPServer → Environment
Environment inherits from MCPServer. Same API, same behavior. Just change the import:
```python
# Before
from hud.server import MCPServer

mcp = MCPServer("my-env")

@mcp.tool()
def my_tool(): ...

mcp.run()
```

```python
# After
from hud import Environment

env = Environment("my-env")

@env.tool()
def my_tool(): ...

env.run()
```
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scenarios, connectors, and integrations on top.
## Migrating Tasks: Prompt Passthrough Pattern
The recommended migration uses the prompt passthrough pattern—scenario arguments become the prompt content:
```python
from hud import Environment

env = Environment("browser").connect_hub("hud-evals/browser")

@env.scenario("web-task")
async def web_task(instruction: str, start_url: str = "https://example.com"):
    """
    The instruction arg passes through directly to the prompt.
    One scenario, infinite test cases.
    """
    # Setup phase (before yield)
    await env.call_tool("navigate", url=start_url)

    # Prompt - the instruction IS the prompt
    answer = yield instruction

    # Evaluate phase (after yield)
    result = await env.call_tool("check_completion")
    yield 1.0 if result["success"] else 0.0

# Create tasks by passing the actual prompt as an arg
task1 = env("web-task", instruction="Find the contact page and extract the support email")
task2 = env("web-task", instruction="Add a MacBook Pro to cart", start_url="https://store.example.com")
task3 = env("web-task", instruction="Fill out the signup form with test data")
```
This pattern:
- Args ARE the prompt: The instruction flows directly through as the agent’s task
- Enables parametric evaluation: Same scenario, different instructions
- Replaces hardcoded prompts: instead of `LegacyTask(prompt="...")`, pass the prompt as an arg
- Type-safe: Arguments are validated against the scenario signature
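Under the hood, a scenario of this shape is just a two-yield async generator: the first yield hands the prompt out, the driver sends the agent's answer back in, and the second yield is the reward. A plain-Python sketch of that protocol, with no hud dependency (the `"email" in answer` check is illustrative, not hud's grading logic):

```python
import asyncio

# Two-yield protocol sketch: prompt out, answer in, reward out.
async def scenario(instruction: str):
    # setup phase would run here (before the first yield)
    answer = yield instruction                   # prompt out, answer sent back in
    yield 1.0 if "email" in answer else 0.0      # evaluate phase: reward

async def drive():
    gen = scenario("Find the support email")
    prompt = await gen.asend(None)               # start: runs setup, returns the prompt
    reward = await gen.asend("found support@example.com in the email footer")
    return prompt, reward

prompt, reward = asyncio.run(drive())
```

This is why setup code goes before the first `yield` and evaluation code after it: the agent runs exactly in the gap between the two.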
## Before/After Comparison
```python
# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
task = LegacyTask(
    prompt="Find all products under $50 and add the cheapest to cart",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://shop.example.com"}},
    evaluate_tool={"name": "check_cart", "arguments": {}},
)
```

```python
# AFTER - prompt passthrough pattern
@env.scenario("shopping")
async def shopping(task: str, shop_url: str):
    await env.call_tool("navigate", url=shop_url)
    answer = yield task  # the task arg IS the prompt
    result = await env.call_tool("check_cart")
    yield 1.0 if result["has_items"] else 0.0

# Now create multiple tasks with different instructions
tasks = [
    env("shopping", task="Find all products under $50 and add the cheapest to cart", shop_url="https://shop.example.com"),
    env("shopping", task="Search for 'laptop' and add the first result to cart", shop_url="https://shop.example.com"),
    env("shopping", task="Apply promo code SAVE20 at checkout", shop_url="https://shop.example.com"),
]
```
## The Migration Rule
- `prompt` → scenario arg (passthrough)
- `setup_tool` → code before the first `yield`
- `evaluate_tool` → code after the first `yield`
If you have multiple setup tools, just call them in sequence:
```python
# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
]
```

```python
# AFTER
@env.scenario("authenticated-task")
async def authenticated_task(instruction: str, username: str):
    await env.call_tool("navigate", url="https://app.example.com")
    await env.call_tool("login", user=username)
    answer = yield instruction
    result = await env.call_tool("check_completion")
    yield 1.0 if result else 0.0
```
For JSON-based task definitions that can be uploaded to the HUD platform, use this format:
```json
{
  "env": {
    "name": "hud-evals/browser"
  },
  "scenario": "web-task",
  "args": {
    "instruction": "Find the contact page and extract the support email",
    "start_url": "https://example.com"
  }
}
```
This maps directly to the scenario call: `env("web-task", instruction="...", start_url="...")`.
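The mapping is mechanical: the `scenario` field names the scenario and the `args` object is splatted into keyword arguments. A minimal sketch of that translation — `task_from_entry` is a hypothetical helper, not part of the hud SDK, and `fake_env` stands in for an `Environment` so the mapping can run standalone:

```python
import json

def task_from_entry(env, entry: dict):
    # env("web-task", instruction=..., start_url=...) <- scenario name + splatted args
    return env(entry["scenario"], **entry.get("args", {}))

# Stand-in for an Environment, just to show the resulting call shape
def fake_env(scenario, **args):
    return {"scenario": scenario, "args": args}

entry = json.loads("""{
  "env": {"name": "hud-evals/browser"},
  "scenario": "web-task",
  "args": {
    "instruction": "Find the contact page and extract the support email",
    "start_url": "https://example.com"
  }
}""")

task = task_from_entry(fake_env, entry)
```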
Example: Task set for platform upload
```json
[
  {
    "env": { "name": "hud-ops-diagnostics-sentry" },
    "scenario": "sentry-agent:investigate",
    "args": {
      "issue_id": "PROJ-1234",
      "max_depth": 3
    }
  },
  {
    "env": { "name": "hud-evals/browser" },
    "scenario": "web-task",
    "args": {
      "instruction": "Add a MacBook Pro to cart and proceed to checkout"
    }
  }
]
```
The args field uses prompt passthrough—the values flow directly into the scenario’s yield statement.
## Using with Built-in Agents
Built-in agents work with scenarios:
```python
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()
result = await agent.run(env("web-task", instruction="Find the pricing page"))
```
## Bring Your Own Agent
v5 gives you the hud.eval() context manager for maximum flexibility:
```python
import hud

async with hud.eval(env("shopping", task="Add item to cart", shop_url="https://shop.example.com")) as ctx:
    # Use OpenAI, Anthropic, your own agent - whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools(),
    )
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)
    print(ctx.reward)
```
## Quick Reference
| v4 (removed in v0.6.0) | v5 (recommended) |
|---|---|
| `LegacyTask(prompt=...)` | `env("scenario", instruction=...)` (prompt passthrough) |
| `setup_tool` | code before the first `yield` in `@env.scenario()` |
| `evaluate_tool` | code after the first `yield` in `@env.scenario()` |
| `MCPServer` | `Environment` (drop-in replacement) |
| JSON with `mcp_config` + `prompt` | JSON with `env` + `scenario` + `args` |