System Overview
Core Components
1. Agents (hud.agents)
Agents make decisions and call tools.
Agents can auto-create MCP clients from task.mcp_config, so no manual client setup is needed.
2. Tasks (hud.Task)
Tasks define what agents should accomplish:
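As a minimal sketch, a task can be pictured as a prompt plus an MCP configuration and two tool calls. The dataclass below is a stand-in, not the real hud.Task class: the field names follow the terms used on this page, while the server name, URL, tool names, and arguments are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TaskSketch:
    """Illustrative stand-in for a task; not the real hud.Task class."""

    prompt: str
    mcp_config: dict[str, Any]
    setup_tool: dict[str, Any] = field(default_factory=dict)
    evaluate_tool: dict[str, Any] = field(default_factory=dict)


task = TaskSketch(
    prompt="Reach the 64 tile in 2048",
    # Hypothetical server entry; a real config points at an actual MCP server.
    mcp_config={"game_env": {"url": "http://localhost:8765/mcp"}},
    # Each tool call is a tool name plus the arguments to pass to it.
    setup_tool={"name": "setup", "arguments": {"board_size": 4}},
    evaluate_tool={"name": "evaluate", "arguments": {"target": 64}},
)
```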
The name and arguments in the setup and evaluate tools correspond exactly to the tool names and parameters exposed by the MCP server.
3. MCP Clients (hud.clients)
Clients handle the MCP protocol:
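To make "handling the protocol" concrete: MCP is built on JSON-RPC 2.0, and invoking a tool is a `tools/call` request. The sketch below only builds that message. The helper name tool_call_request and the setup tool are hypothetical, and a real client in hud.clients would also take care of the transport, the initialization handshake, and responses.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call_request(name: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 message for an MCP `tools/call` request."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


message = tool_call_request("setup", {"board_size": 4})
```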
4. Environments
Environments are MCP servers that expose tools.
5. Telemetry (hud.trace)
Telemetry provides real-time observability.
Execution Flow
1. Task Definition: Create a Task with a prompt and MCP configuration.
2. Agent Initialization: The agent creates an MCP client (if needed) and connects to the environment.
3. Setup Phase: Execute setup_tool to initialize the environment state.
4. Execution Loop: The agent receives observations, makes decisions, and calls tools.
5. Evaluation: Execute evaluate_tool to score performance.
6. Telemetry: All interactions are streamed to the HUD backend for analysis.
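The flow above can be sketched end to end with plain Python. Everything here is a toy stand-in under stated assumptions: FakeEnvironment plays the role of an MCP server, run_task plays the role of the agent loop, the in-memory trace list stands in for telemetry, and the counter tools are hypothetical. None of it is the HUD API.

```python
from typing import Any, Callable


class FakeEnvironment:
    """Stand-in for an MCP server: a registry of named tools."""

    def __init__(self) -> None:
        self.counter = 0
        self.tools: dict[str, Callable[..., Any]] = {
            "setup": self._setup,
            "increment": self._increment,
            "evaluate": self._evaluate,
        }

    def _setup(self, start: int = 0) -> int:
        self.counter = start
        return self.counter

    def _increment(self) -> int:
        self.counter += 1
        return self.counter

    def _evaluate(self, target: int) -> float:
        return 1.0 if self.counter >= target else 0.0

    def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        return self.tools[name](**arguments)


def run_task(task: dict[str, Any], env: FakeEnvironment, max_steps: int = 10) -> dict[str, Any]:
    trace: list[tuple[str, Any]] = []  # stand-in for telemetry

    def call(tool: dict[str, Any]) -> Any:
        result = env.call_tool(tool["name"], tool.get("arguments", {}))
        trace.append((tool["name"], result))  # record every interaction
        return result

    observation = call(task["setup_tool"])  # setup phase
    target = task["evaluate_tool"]["arguments"]["target"]
    for _ in range(max_steps):  # execution loop
        if observation >= target:  # toy policy: stop once the goal is reached
            break
        observation = call({"name": "increment"})
    reward = call(task["evaluate_tool"])  # evaluation
    return {"reward": reward, "trace": trace}


task = {
    "prompt": "Count up to 3",
    "setup_tool": {"name": "setup", "arguments": {"start": 0}},
    "evaluate_tool": {"name": "evaluate", "arguments": {"target": 3}},
}
result = run_task(task, FakeEnvironment())
```

Because setup and evaluation are ordinary tool calls, the same loop works against any environment that exposes tools with those names, which is the point of the protocol-first design.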
Key Design Principles
- Protocol-First: Everything speaks MCP
- Composable: Mix and match agents, environments, evaluations
- Observable: Built-in telemetry for every interaction
- Testable: Reproducible evaluations with Docker
- Extensible: Easy to add new agents or environments
The MCPServer class wraps FastMCP with lifecycle management, making it easy to build Docker-based environments.