Why Environments?
Your production API is a single live instance with shared state—you can’t run 500 tests against it in parallel without causing chaos. Environments spin up fresh for every evaluation: isolated, deterministic, reproducible. Run thousands in parallel, each starting from the exact state you define, each generating training data.Tools
Start withhud init to scaffold an environment. Works on existing codebases too:
@env.tool() and agents can call it:
Scenarios
To evaluate an agent, you need two things: what to tell it, and how to score what it did. Scenarios capture both with twoyield statements:
Scenarios as Subagents
The first yield is more than just a prompt—it’s context management mixed with dynamic input from the scenario’s parameters. The parameters become a tool spec that other agents can call. We’ve found that agents train much better within a scenario structure than on standalone random tasks. Scenarios define boundaries: what the agent should focus on, what success looks like, and how to measure it. This structure also makes agents easier to compose—wrap a scenario withAgentTool and an orchestrator can call it as a specialized subagent.
See the Ops Diagnostics Cookbook for a complete example of hierarchical agents calling subagent scenarios.
Iterating on Your Environment
Three ways to develop and test your environment:1. Agent Loop with create_agent
Run a full agent loop locally. This mirrors exactly what happens in remote rollouts:2. MCP Server with hud dev
Spawn your environment as an MCP server that Cursor, Claude Code, or any MCP client can connect to:-w), save, and the controller reloads automatically.
The env:env syntax is like uvicorn—module:attribute. It tells hud dev to import env.py and run the env object as an MCP server.
3. Custom Agent Loop
Build your own agent loop using the format converters. See Integrations for OpenAI, Anthropic, LangChain, and more:Chat: Multi-Turn Conversations
Scenarios can also power multi-turn chat agents. Addchat=True and the scenario receives the full conversation history on every turn:
ChatService gives each user an independent session:
Connecting Your Stack
HUD wraps your existing infrastructure:What’s Next
Hosted Running
Deploy your environment to the platform
Sandboxing
Make databases and external services safe for agents
Best Practices
Patterns for reliable environments
Integrations
Connect any agent framework
Chat
Multi-turn agents and A2A serving