Environments
An environment is the world an agent lives in. It packages tools (what agents can do) and scenarios (how agents are evaluated) into a single deployable unit.
Tools
A tool is a function an agent can call. Decorate any function with @env.tool() and it becomes agent-callable.
Functions decorated with @env.tool() are the starting point for building an environment.
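The registration pattern can be illustrated with a small stand-in. This is not HUD's implementation: `ToyEnvironment`, `call_tool`, and `describe` are hypothetical names. The sketch only assumes that `@env.tool()` acts like a decorator factory that records the function so an agent can later invoke it by name.

```python
import inspect
from typing import Any, Callable


class ToyEnvironment:
    """Hypothetical stand-in for an environment: a named registry of tools."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Decorator factory: @env.tool() registers the function under its own name
        def register(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.tools[fn.__name__] = fn
            return fn  # the function remains directly callable as well
        return register

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        # An agent's tool call reduces to: look up by name, pass the arguments
        return self.tools[name](**kwargs)

    def describe(self) -> dict[str, str]:
        # Signatures an agent could be shown when choosing which tool to call
        return {n: str(inspect.signature(f)) for n, f in self.tools.items()}


env = ToyEnvironment("counter")
state = {"count": 0}


@env.tool()
def increment(by: int = 1) -> int:
    """Add `by` to the counter and return the new value."""
    state["count"] += by
    return state["count"]
```

Because the decorator returns the original function unchanged, the same code can be unit-tested directly or reached by an agent through the registry.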
Scenarios
A scenario defines how an agent is evaluated. It is an async generator with two yields:
| Section | Where | Purpose |
|---|---|---|
| Setup | Before first yield | Seed state, navigate to starting point |
| Prompt | The first yield | Tell the agent what to do |
| Scoring | After first yield, ending with second yield | Check results, return reward |
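To make the two-yield shape concrete, here is a plain-Python sketch with no HUD imports. `count_to` and `run_scenario` are hypothetical names, and how HUD actually registers and drives a scenario is not shown; the driver below assumes the agent's answer is sent back into the generator as the value of the first yield.

```python
import asyncio
from typing import AsyncGenerator, Union


# Hypothetical scenario following the setup / prompt / scoring layout above.
async def count_to(target: int) -> AsyncGenerator[Union[str, float], str]:
    # Setup: seed whatever state the task needs (here, just the expected value)
    expected = str(target)

    # Prompt: the first yield hands the instruction to the agent; the agent's
    # answer comes back in as the value of this yield expression
    answer = yield f"Reply with the number {target}."

    # Scoring: check the result and return a reward via the second yield
    yield 1.0 if answer.strip() == expected else 0.0


async def run_scenario(gen: AsyncGenerator, agent) -> float:
    # Hypothetical driver: get the prompt, ask the agent, send the answer back
    prompt = await gen.__anext__()
    answer = agent(prompt)
    reward = await gen.asend(answer)
    await gen.aclose()  # the generator is parked at its second yield; close it
    return reward
```

For example, `asyncio.run(run_scenario(count_to(3), lambda prompt: "3"))` evaluates to `1.0`, while an agent answering `"4"` gets `0.0`.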
Tasks
A task is a scenario instantiated with specific arguments. It is what you actually run an agent against.
How They Fit Together
- An Environment contains Tools and Scenarios
- A Scenario + arguments = a Task
- Tasks group into Tasksets
- Run a taskset → collect Traces with rewards
- Train a model on successful traces → run again → improve
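The pipeline in the list above can be sketched end to end in plain Python. Everything here is hypothetical: `Task`, `Trace`, `run_task`, and `run_taskset` are illustrative rather than HUD classes, and the "agent" is a fixed lambda standing in for a model. The point is only that a task binds a scenario to arguments, a taskset is a collection of tasks, and each run produces a trace with a reward that can be filtered for training.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncGenerator, Callable


@dataclass
class Task:
    # A task = a scenario plus the specific arguments to instantiate it with
    scenario: Callable[..., AsyncGenerator]
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class Trace:
    # What one run leaves behind: the task, the exchange, and its reward
    task: Task
    prompt: str
    answer: str
    reward: float


async def count_to(target: int) -> AsyncGenerator:
    answer = yield f"Reply with the number {target}."
    yield 1.0 if answer.strip() == str(target) else 0.0


async def run_task(task: Task, agent: Callable[[str], str]) -> Trace:
    gen = task.scenario(**task.args)  # instantiate the scenario with its args
    prompt = await gen.__anext__()    # setup + prompt
    answer = agent(prompt)
    reward = await gen.asend(answer)  # scoring
    await gen.aclose()
    return Trace(task, prompt, answer, reward)


async def run_taskset(taskset: list[Task], agent) -> list[Trace]:
    # Run every task concurrently; gather preserves the taskset's order
    return await asyncio.gather(*(run_task(t, agent) for t in taskset))


taskset = [Task(count_to, {"target": n}) for n in (1, 2, 3)]
traces = asyncio.run(run_taskset(taskset, lambda prompt: "2"))
successful = [t for t in traces if t.reward == 1.0]  # candidates for training
```

Here `successful` holds only the trace for `target=2`; in the loop described above, traces like these would feed training before the taskset is run again.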
Running an Agent Against a Task
The hud.eval() context manager is how you run any agent against a task.
create_agent() is a convenience that picks the right agent class for each model. You can also bring your own agent loop.
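The sketch below mirrors the shape described for `hud.eval()` without using it. A hypothetical async context manager (`eval_context`) runs the scenario's setup and exposes the prompt and tools, your own loop (`my_agent_loop`) does the work inside the `async with` block, and scoring is assumed to happen when the block exits. All names, including `EvalContext` and `submit`, are illustrative assumptions, and the fixed policy in the loop stands in for a model choosing tool calls.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Any, AsyncGenerator, AsyncIterator, Callable, Optional


@dataclass
class EvalContext:
    # Hypothetical handle the agent loop works with inside the context manager
    prompt: str
    tools: dict[str, Callable[..., Any]]
    answer: Optional[str] = None
    reward: Optional[float] = None

    def submit(self, answer: str) -> None:
        self.answer = answer


@asynccontextmanager
async def eval_context(scenario: AsyncGenerator, tools) -> AsyncIterator[EvalContext]:
    prompt = await scenario.__anext__()  # setup + prompt
    ctx = EvalContext(prompt=prompt, tools=tools)
    yield ctx  # the agent loop runs inside the `async with` block
    # Scoring runs when the block exits normally
    ctx.reward = await scenario.asend(ctx.answer or "")
    await scenario.aclose()


state = {"count": 0}


def increment() -> int:
    state["count"] += 1
    return state["count"]


async def reach_three() -> AsyncGenerator:
    state["count"] = 0  # setup: seed state
    answer = yield "Increment the counter until it reaches 3, then report the value."
    yield 1.0 if state["count"] == 3 and answer == "3" else 0.0


async def my_agent_loop(ctx: EvalContext, max_steps: int = 10) -> None:
    # Your own loop: a model would normally pick each tool call; this
    # fixed policy just calls `increment` until the value reaches 3
    value = 0
    for _ in range(max_steps):
        if value >= 3:
            break
        value = ctx.tools["increment"]()
    ctx.submit(str(value))


async def main() -> Optional[float]:
    async with eval_context(reach_three(), {"increment": increment}) as ctx:
        await my_agent_loop(ctx)
    return ctx.reward
```

Keeping the loop outside the context manager is what lets you swap in any agent, hand-written or framework-based, while setup and scoring stay unchanged.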
What You Don’t Need Yet
HUD has a lot of surface area. Here's what to skip on day one:
| Skip for now | What it is | When you'll need it |
|---|---|---|
| Chat scenarios | Multi-turn conversational agents | Building chat products |
| AgentTool | Hierarchical sub-agent delegation | Complex multi-agent workflows |
| Pre-built tools | Shell, browser, file editing, etc. | When your tasks need system-level capabilities |
| Framework integrations | LangChain, CrewAI, AutoGen, etc. | When using those frameworks |
| Harbor conversion | Importing external benchmarks | Migrating existing benchmarks |
| Slack integration | Running agents from Slack | Team workflows |
| REST API | Programmatic platform access | Custom integrations |
Next Steps
- Quick Start: Install and run your first environment
- Environments: Tools, scenarios, and local development
- Best Practices: Patterns for reliable environments and evals
- Tasks & Training: Run evaluations and train models