An environment is everything an agent can interact with—your APIs, services, and databases, wrapped as tools. It also defines how agents are evaluated through scenarios. When you deploy an environment, you’re creating a sandbox that agents can learn from at scale.

Why Environments?

Your production API is a single live instance with shared state—you can’t run 500 tests against it in parallel without causing chaos. Environments spin up fresh for every evaluation: isolated, deterministic, reproducible. Run thousands in parallel, each starting from the exact state you define, each generating training data.

Tools

Start with hud init to scaffold an environment. Works on existing codebases too:
hud init
Every tool is just a function. Decorate it with @env.tool() and agents can call it:
from hud import Environment

env = Environment("my-env")

@env.tool()
async def search(query: str) -> str:
    """Search the knowledge base."""
    return db.search(query)
Before building custom tools, check if HUD’s pre-built tools already provide what you need—computer control, shell execution, file editing, web browsing, and more.
Got a FastAPI app? One line:
env.connect_fastapi(app)
All your routes become tools.

Scenarios

To evaluate an agent, you need two things: what to tell it, and how to score what it did. Scenarios capture both with two yield statements:
@env.scenario("checkout")
async def checkout_flow(product_name: str):
    # Yield the prompt, receive the agent's final answer
    answer = yield f"Add '{product_name}' to cart and complete checkout"
    
    # Score based on environment state and/or the answer
    order_exists = await check_order_status(product_name)
    yield 1.0 if order_exists else 0.0
The agent runs between the yields. The first yield sends the prompt and returns the agent’s answer; the second checks environment state and returns a reward.
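The protocol behind the two yields can be sketched in plain Python, with no HUD imports. This is a simplified, synchronous illustration (real scenarios are async, and the harness is part of HUD); `drive` here is a hypothetical stand-in for that harness:

```python
def checkout_flow(product_name):
    # First yield: hand out the prompt, receive the agent's final answer.
    answer = yield f"Add '{product_name}' to cart and complete checkout"
    # Second yield: score the outcome.
    yield 1.0 if "order placed" in answer.lower() else 0.0

def drive(scenario, agent):
    """Stand-in for the harness: prompt out, answer in, reward out."""
    prompt = next(scenario)       # runs up to the first yield
    answer = agent(prompt)        # the agent acts between the yields
    return scenario.send(answer)  # resumes at the second yield

reward = drive(checkout_flow("socks"), lambda prompt: "Order placed!")
print(reward)  # 1.0
```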

Scenarios as Subagents

The first yield is more than just a prompt—it’s context management mixed with dynamic input from the scenario’s parameters. The parameters become a tool spec that other agents can call. We’ve found that agents train much better within a scenario structure than on standalone random tasks. Scenarios define boundaries: what the agent should focus on, what success looks like, and how to measure it. This structure also makes agents easier to compose—wrap a scenario with AgentTool and an orchestrator can call it as a specialized subagent. See the Ops Diagnostics Cookbook for a complete example of hierarchical agents calling subagent scenarios.
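To make "the parameters become a tool spec" concrete, here is a hypothetical sketch (not HUD's actual implementation) of how a scenario's signature could be turned into an OpenAI-style tool schema; `tool_spec_from_scenario` and its type mapping are illustrative assumptions:

```python
import inspect

def tool_spec_from_scenario(fn):
    """Derive a tool schema from a scenario's parameters (sketch only)."""
    type_names = {str: "string", int: "integer", float: "number", bool: "boolean"}
    sig = inspect.signature(fn)
    props = {name: {"type": type_names.get(p.annotation, "string")}
             for name, p in sig.parameters.items()}
    return {"name": fn.__name__,
            "parameters": {"type": "object",
                           "properties": props,
                           "required": list(props)}}

async def checkout_flow(product_name: str):
    answer = yield f"Add '{product_name}' to cart and complete checkout"
    yield 1.0

spec = tool_spec_from_scenario(checkout_flow)
# spec["parameters"]["properties"] == {"product_name": {"type": "string"}}
```

An orchestrator that sees this spec can call the scenario like any other tool, which is what makes subagent composition work.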

Iterating on Your Environment

Three ways to develop and test your environment:

1. Agent Loop with create_agent

Run a full agent loop locally. This mirrors exactly what happens in remote rollouts:
import hud
from hud import Environment
from hud.agents import create_agent

env = Environment("my-env")

@env.tool()
def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a letter in text."""
    return text.lower().count(letter.lower())

@env.scenario("count")
async def count_scenario(sentence: str, letter: str):
    answer = yield f"How many '{letter}' in '{sentence}'?"
    correct = str(sentence.lower().count(letter.lower()))
    yield 1.0 if correct in answer else 0.0

# Create task and run agent
task = env("count", sentence="Strawberry", letter="r")
agent = create_agent("claude-sonnet-4-5")

async with hud.eval(task) as ctx:
    result = await agent.run(ctx, max_steps=10)

print(f"Reward: {result.reward}")

2. MCP Server with hud dev

Spawn your environment as an MCP server that Cursor, Claude Code, or any MCP client can connect to:
# Python-only (no Docker, hot-reloads current directory)
hud dev env:env

# Docker mode (if Dockerfile exists)
hud dev -w env.py -w tools/    # -w enables hot-reload for those paths
Then in Cursor’s MCP settings:
{
  "my-dev-env": { "url": "http://localhost:8765/mcp" }
}
Now your coding agent can call your tools directly. Edit your environment, save, and changes apply immediately. The env:env syntax is like uvicorn—module:attribute. It tells hud dev to import env.py and run the env object as an MCP server.

3. Custom Agent Loop

Build your own agent loop using the format converters. See Integrations for OpenAI, Anthropic, LangChain, and more:
async with hud.eval(task) as ctx:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools()
    )
    
    # Handle tool calls...
    await ctx.submit(response.choices[0].message.content)
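The elided step ("Handle tool calls...") is usually a loop: execute each tool the model requested, append the results as `role="tool"` messages, and call the model again until it stops requesting tools. Below is a minimal, self-contained sketch of the dispatch step, assuming OpenAI-style tool calls represented as plain dicts; `dispatch_tool_calls` and `registry` are illustrative names, not HUD APIs:

```python
import json

def dispatch_tool_calls(tool_calls, registry):
    """Run each OpenAI-style tool call against local functions and
    return the role='tool' messages to append to the conversation."""
    messages = []
    for call in tool_calls:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(result),
        })
    return messages

# Example: the model asked to run the `search` tool once.
calls = [{"id": "call_1", "function": {
    "name": "search", "arguments": '{"query": "refund policy"}'}}]
msgs = dispatch_tool_calls(calls, {"search": lambda query: f"3 hits for {query}"})
```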

Connecting Your Stack

HUD wraps your existing infrastructure:
env.connect_fastapi(app)                                    # FastAPI → tools
env.connect_openapi("https://api.example.com/openapi.json") # OpenAPI spec → tools
env.connect_hub("hud-evals/browser")                        # HUD Hub environments
env.connect_image("my-service:v1")                          # Docker images

What’s Next