Mock Mode
env.mock() intercepts at the tool layer. Agents only see tools, so mocking there is usually all you need to test agent logic without hitting real services.
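To make the idea concrete, here is a minimal self-contained sketch of tool-layer interception. The ToyEnv class, its tool registry, and the mock() signature are hypothetical stand-ins for illustration, not the real SDK API: the point is only that the agent calls tools by name and never sees whether the handler behind that name is real or fake.

```python
import asyncio

# Hypothetical stand-in environment; the real SDK's env object differs.
class ToyEnv:
    def __init__(self):
        self._tools = {"fetch_price": self._real_fetch_price}

    async def _real_fetch_price(self, symbol: str) -> float:
        raise RuntimeError("would hit a live service")

    def mock(self, **fakes):
        # Replace tool handlers in place; agents still call tools by
        # name, so they cannot tell a mock from the real thing.
        for name, fake in fakes.items():
            self._tools[name] = fake

    async def call_tool(self, name: str, **kwargs):
        return await self._tools[name](**kwargs)

async def fake_fetch_price(symbol: str) -> float:
    return 42.0

async def main():
    env = ToyEnv()
    env.mock(fetch_price=fake_fetch_price)
    print(await env.call_tool("fetch_price", symbol="ACME"))  # 42.0

asyncio.run(main())
```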
Testing Scenarios Directly
Scenarios are async generators. hud.eval() drives them automatically, but you can also drive the generator yourself to test the grading logic directly; hud.eval() will behave identically.
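The driving pattern can be sketched with plain asyncio. The scenario body below is invented for illustration; what matters is the two-phase shape: code before the first yield is setup and produces the prompt, and code after it grades the answer sent back in.

```python
import asyncio

# Illustrative scenario: setup before the first yield, grading after it.
async def checkout_total(price: float, qty: int):
    expected = price * qty
    answer = yield f"What is the total for {qty} items at {price} each?"
    reward = 1.0 if abs(float(answer) - expected) < 1e-6 else 0.0
    yield {"reward": reward}

async def main():
    gen = checkout_total(3.5, 2)
    prompt = await gen.asend(None)   # runs setup code, returns the prompt
    print(prompt)
    result = await gen.asend("7.0")  # runs grading code on the answer
    print(result)                    # {'reward': 1.0}

asyncio.run(main())
```

Driving the generator by hand like this lets you assert on the reward in a unit test without standing up an agent or an MCP server.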
Scenario MCP Protocol Mapping
Each scenario registers two MCP endpoints:

| Phase | MCP type | Endpoint | What it does |
|---|---|---|---|
| Setup | Prompt | get_prompt("{env}:{scenario}", args) | Runs code before the first yield, returns the prompt |
| Evaluate | Resource | read_resource("{env}:{scenario}") | Runs code after the first yield, returns {"reward": float} |
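The mapping above can be simulated end to end without an MCP server. In this sketch the registry and the get_prompt / submit / read_resource helpers are hypothetical stand-ins that mimic the protocol's call sequence: the prompt endpoint advances the scenario generator to its first yield, and the resource endpoint sends the stored answer in and collects the reward.

```python
import asyncio

_scenarios = {}  # key -> {"gen": generator, "answer": submitted answer}

async def example_scenario():
    answer = yield "Say 'ok'."
    yield {"reward": 1.0 if answer == "ok" else 0.0}

async def get_prompt(key: str):
    gen = example_scenario()
    _scenarios[key] = {"gen": gen, "answer": None}
    return await gen.asend(None)        # Setup phase: run to first yield

def submit(key: str, answer: str):
    _scenarios[key]["answer"] = answer  # held until read_resource

async def read_resource(key: str):
    state = _scenarios[key]
    return await state["gen"].asend(state["answer"])  # Evaluate phase

async def main():
    print(await get_prompt("env:example"))
    submit("env:example", "ok")
    print(await read_resource("env:example"))  # {'reward': 1.0}

asyncio.run(main())
```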
Useful Environment Properties
Common Issues
- evaluate_tool: NULL but using v5 scenarios — v5 scenarios return rewards via read_resource, not evaluate_tool. Ensure your orchestrator calls read_resource() after the agent completes.
- TypeError with complex args like list[dict] — MCP passes all arguments as strings and the SDK deserializes them. Add logging to check type(arg) at scenario entry.
- Scenario setup works but evaluate returns no reward — submit() wasn't called before read_resource(). Call await env.submit(scenario_name, answer) first.
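The missing-submit failure is easy to reproduce in isolation. The ToyScenarioEnv class below is an illustrative stand-in, not the real env object: read_resource has nothing to grade until submit stores an answer, so calling it first yields an empty result.

```python
import asyncio

# Hypothetical stand-in for the real env; shows the required ordering.
class ToyScenarioEnv:
    def __init__(self):
        self._answers = {}

    async def submit(self, scenario: str, answer: str):
        self._answers[scenario] = answer

    async def read_resource(self, scenario: str):
        if scenario not in self._answers:
            return {}  # no reward: submit() never ran for this scenario
        ok = self._answers[scenario] == "expected"
        return {"reward": 1.0 if ok else 0.0}

async def main():
    env = ToyScenarioEnv()
    print(await env.read_resource("demo"))  # {} -- forgot to submit
    await env.submit("demo", "expected")
    print(await env.read_resource("demo"))  # {'reward': 1.0}

asyncio.run(main())
```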