Looking for specific tool implementations? This reference covers the tool system architecture and how to build custom tools. For documentation on built-in tools, see the Tools Guide.

How Tools Work

HUD tools are async functions that:
  1. Receive structured input from agents (via MCP or native APIs)
  2. Execute actions against an environment, filesystem, or service
  3. Return ContentBlock lists — standardized MCP output (text, images, etc.)
Agent → Tool Call → BaseTool.__call__() → list[ContentBlock] → Agent
Tools integrate with providers through native specs — when Claude calls bash, it uses Anthropic’s native bash_20250124 API. When OpenAI calls shell, it uses their native format. HUD translates automatically.

BaseTool

All tools inherit from BaseTool. Implement __call__ to define behavior.
from hud.tools import BaseTool
from mcp.types import ContentBlock, TextContent

class MyTool(BaseTool):
    def __init__(self, config: str = "default"):
        super().__init__(
            name="my_tool",
            title="My Tool",
            description="Does something useful",
        )
        self.config = config
    
    async def __call__(self, query: str) -> list[ContentBlock]:
        result = await self._do_work(query)
        return [TextContent(text=result, type="text")]
    
    async def _do_work(self, query: str) -> str:
        return f"Processed: {query} with {self.config}"
Constructor Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| env | Any | Stateful context (executor, browser, etc.) | None |
| name | str | Tool name for MCP registration | Auto from class |
| title | str | Human-readable display name | Auto from class |
| description | str | Tool description for agents | Auto from docstring |
Properties:
  • mcp — FastMCP FunctionTool wrapper for server registration
  • native_specs — Dict mapping AgentType to NativeToolSpec
Registration:
from hud.server import MCPServer

mcp = MCPServer(name="my-env")
mcp.add_tool(MyTool())  # Automatically wraps with .mcp
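Because behavior lives in __call__, a tool instance is itself an async callable, which makes quick local testing easy. A minimal sketch using the MyTool class from above:
import asyncio

async def main():
    tool = MyTool(config="demo")
    blocks = await tool(query="hello")
    print(blocks[0].text)  # "Processed: hello with demo"

asyncio.run(main())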

Native Tool Specs

Tools can declare native API mappings for specific providers. This enables zero-translation tool calls for supported agents.
from hud.tools import BaseTool
from hud.tools.native_types import NativeToolSpec
from hud.types import AgentType

class BashTool(BaseTool):
    native_specs = {
        AgentType.CLAUDE: NativeToolSpec(
            api_type="bash_20250124",
            api_name="bash",
            beta="computer-use-2025-01-24",
            role="shell",
        ),
    }
NativeToolSpec Fields:
| Field | Type | Description |
|---|---|---|
| api_type | str | Provider's tool type identifier |
| api_name | str | Provider's tool name |
| beta | str \| None | Required beta header (Anthropic) |
| role | str \| None | Logical role for exclusion ("shell", "editor", "memory") |
| supported_models | list[str] \| None | Glob patterns for compatible models |
Role Exclusion: Tools that share a role are mutually exclusive per agent. When an agent accepts one of them natively, other tools with the same role are excluded for that agent, so you can register both BashTool (Claude) and ShellTool (OpenAI) and each agent sees only the one it supports natively.
# Both have role="shell" — only one registers natively
env.add_tool(BashTool())    # Claude gets this natively
env.add_tool(ShellTool())   # OpenAI gets this natively

Tool Hooks

Modify tool behavior without subclassing using @tool.before and @tool.after:
from hud.tools import BashTool
from hud.tools.types import ToolError

bash = BashTool()

@bash.before
async def validate(command: str | None = None, **kwargs):
    """Runs before execution. Raise to block, return dict to modify args."""
    if command and "rm -rf /" in command:
        raise ToolError("Blocked dangerous command")
    # Return modified kwargs, or None to proceed unchanged
    return {"command": command.strip()} if command else None

@bash.after
async def audit(command: str | None = None, result=None, **kwargs):
    """Runs after execution. Return to modify result, None to keep original."""
    print(f"Executed: {command}")
    return None  # Keep original result
@tool.before:
  • Receives all tool arguments as kwargs
  • Return dict to modify arguments before execution
  • Return None to proceed unchanged
  • Raise exception to block execution
@tool.after:
  • Receives tool arguments plus result= (the return value)
  • Return modified result to change output
  • Return None to keep original result
Hooks stack in registration order:
@bash.before
async def first_validation(**kwargs): ...

@bash.before  
async def second_validation(**kwargs): ...  # Runs after first
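When multiple before-hooks modify arguments, each returned dict naturally feeds the next hook in line. A hedged sketch of that pattern (whether the framework merges or replaces kwargs between hooks is an assumption here):
@bash.before
async def normalize(command: str | None = None, **kwargs):
    # Runs first: trim whitespace
    return {"command": command.strip()} if command else None

@bash.before
async def add_timeout(command: str | None = None, **kwargs):
    # Runs second: sees the trimmed command if hooks chain
    if command and not command.startswith("timeout"):
        return {"command": f"timeout 30 {command}"}
    return None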

Common Types

ContentBlock

MCP standard output format. Tools return list[ContentBlock].
from mcp.types import TextContent, ImageContent

# Text output
TextContent(text="Operation complete", type="text")

# Image output  
ImageContent(data="base64_data", mimeType="image/png", type="image")
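A single response can mix block types, for example a status line plus a screenshot. A short sketch (the _capture helper is hypothetical and assumed to return base64-encoded PNG data):
from hud.tools import BaseTool
from mcp.types import ContentBlock, ImageContent, TextContent

class ScreenshotTool(BaseTool):
    async def __call__(self, action: str) -> list[ContentBlock]:
        png_b64 = await self._capture()  # hypothetical: base64 PNG string
        return [
            TextContent(text=f"Performed: {action}", type="text"),
            ImageContent(data=png_b64, mimeType="image/png", type="image"),
        ]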

ContentResult

Helper for building tool outputs with multiple content types:
from hud.tools.types import ContentResult

result = ContentResult(
    output="Success message",
    error="Error details if any",
    base64_image="screenshot_data",
)

# Convert to list[ContentBlock]
blocks = result.to_content_blocks()

# For text-only output (returns list[TextContent])
text_blocks = result.to_text_blocks()

ToolError

Raise to return an error to the agent:
from hud.tools.types import ToolError

async def __call__(self, path: str) -> list[ContentBlock]:
    if not path:
        raise ToolError("path is required")
    # ...
When a tool raises ToolError, the framework catches it and returns the message to the agent as text content; it does not propagate as an exception.

EvaluationResult

For evaluation/scoring tools:
from hud.tools.types import EvaluationResult

result = EvaluationResult(
    reward=0.8,        # Score 0-1
    done=True,         # Task complete?
    content="Details", # Optional explanation
    info={"score": 80} # Metadata
)

BaseHub

Organize related tools into namespaced groups with a dispatcher pattern:
from hud.tools import BaseHub
from hud.tools.types import EvaluationResult

evaluators = BaseHub("evaluate")

@evaluators.tool("text_contains")
async def check_text(text: str, target: str) -> EvaluationResult:
    return EvaluationResult(
        reward=1.0 if target in text else 0.0,
        done=True,
    )

@evaluators.tool("url_matches")
async def check_url(url: str, expected: str) -> EvaluationResult:
    return EvaluationResult(
        reward=1.0 if url == expected else 0.0,
        done=True,
    )

# Mount on server — agents call: evaluate(name="text_contains", ...)
mcp.mount(evaluators)
Key Features:
  • Internal tools are hidden from MCP clients
  • Single dispatcher endpoint for all hub tools
  • Automatic resource catalog generation
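From the client side, every hub tool goes through the single evaluate dispatcher. A sketch using the MCP Python client (the flattened argument shape follows the mount comment above and is an assumption):
from mcp import ClientSession

async def run_eval(session: ClientSession):
    # Dispatch to the "text_contains" evaluator through the hub endpoint
    result = await session.call_tool(
        "evaluate",
        {"name": "text_contains", "text": "hello world", "target": "hello"},
    )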

AgentTool

Wrap a scenario as a callable tool for hierarchical agent systems:
from hud import Environment
from hud.tools import AgentTool

# Subagent environment
researcher = Environment(name="researcher")

@researcher.scenario("search")
async def search_web(query: str):
    yield f"Search for: {query}"
    # ... agent interaction ...

# Create orchestrator and add subagent as tool
orchestrator = Environment(name="orchestrator")

tool = AgentTool(
    researcher("search"),  # Task template
    model="gpt-4o-mini",
    name="web_search",
    description="Search the web for information",
)
orchestrator.add_tool(tool)
Constructor Parameters:
| Parameter | Type | Description |
|---|---|---|
| task | Task | Task template from env("scenario") |
| model | str | Model for subagent (via gateway) |
| agent | type[MCPAgent] | Custom agent class (alternative to model) |
| name | str | Tool name for orchestrator |
| description | str | Tool description |
Eval-Only Parameters: Scenario parameters annotated | None = None are hidden from the orchestrator but remain available for evaluation scoring:
@env.scenario("investigate")
async def investigate(
    query: str,                          # Visible to orchestrator
    expected_finding: str | None = None, # Hidden — only for eval
):
    response = yield f"Investigate: {query}"
    if expected_finding:
        yield 1.0 if expected_finding in response else 0.0

Executors

Executors provide platform-specific implementations for computer control tools.

BaseExecutor

Abstract base for all executors:
from hud.tools.executors import BaseExecutor

class MyExecutor(BaseExecutor):
    async def click(self, x: int, y: int, **kwargs) -> None: ...
    async def write(self, text: str, **kwargs) -> None: ...
    async def press(self, keys: list[str]) -> None: ...
    async def screenshot(self) -> bytes: ...
    async def get_screen_size(self) -> tuple[int, int]: ...
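A concrete executor fills in those methods for its platform. As a hedged sketch (assuming BaseExecutor requires only the methods shown above), a recording executor that logs actions instead of performing them can be handy in tests:
class RecordingExecutor(BaseExecutor):
    """Records actions instead of performing them (useful in tests)."""

    def __init__(self):
        self.actions: list[str] = []

    async def click(self, x: int, y: int, **kwargs) -> None:
        self.actions.append(f"click({x}, {y})")

    async def write(self, text: str, **kwargs) -> None:
        self.actions.append(f"write({text!r})")

    async def press(self, keys: list[str]) -> None:
        self.actions.append(f"press({keys})")

    async def screenshot(self) -> bytes:
        return b""  # nothing to capture

    async def get_screen_size(self) -> tuple[int, int]:
        return (1920, 1080)  # fixed size for tests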

Built-in Executors

| Executor | Platform | Features |
|---|---|---|
| PyAutoGUIExecutor | Cross-platform | Real mouse/keyboard, screenshots |
| XDOExecutor | Linux/X11 | Native X11, faster on Linux |
from hud.tools.executors import PyAutoGUIExecutor, XDOExecutor
from hud.tools import HudComputerTool

# Cross-platform
computer = HudComputerTool(executor=PyAutoGUIExecutor())

# Linux with specific display
computer = HudComputerTool(executor=XDOExecutor(display_num=1))

Callback Functions

Monitor and hook into tool actions:
from hud.tools import BaseTool
from mcp.types import ContentBlock, TextContent

class MyTool(BaseTool):
    def __init__(self):
        super().__init__(name="my_tool")
        self.add_callback("action_complete", self._on_complete)
    
    async def _on_complete(self, **kwargs) -> None:
        print(f"Action completed: {kwargs}")
    
    async def __call__(self, action: str) -> list[ContentBlock]:
        result = await self._do_action(action)
        await self._trigger_callbacks("action_complete", action=action)
        return result
    
    async def _do_action(self, action: str) -> list[ContentBlock]:
        return [TextContent(text=f"Did: {action}", type="text")]
Callback Methods:
  • add_callback(event_type: str, callback: Callable)
  • remove_callback(event_type: str, callback: Callable)
  • _trigger_callbacks(event_type: str, **kwargs)  # Call from tool methods
Callbacks must be async def.
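Callbacks can also be attached and detached from outside the tool, for example for temporary monitoring. A small sketch using the MyTool class above:
tool = MyTool()

async def monitor(**kwargs) -> None:
    print(f"Observed: {kwargs}")

tool.add_callback("action_complete", monitor)
# ... run actions while monitoring ...
tool.remove_callback("action_complete", monitor)  # detach when done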

Complete Example

Custom database tool with validation and logging:
from hud.tools import BaseTool
from hud.tools.types import ContentResult, ToolError
from mcp.types import ContentBlock

class DatabaseTool(BaseTool):
    def __init__(self, connection_string: str):
        super().__init__(
            name="database",
            title="Database Query",
            description="Execute read-only SQL queries",
        )
        self.conn_string = connection_string
        self._conn = None
    
    async def __call__(
        self,
        query: str,
        limit: int = 100,
    ) -> list[ContentBlock]:
        if not query.strip().upper().startswith("SELECT"):
            raise ToolError("Only SELECT queries are allowed")
        
        try:
            conn = await self._get_connection()
            results = await conn.fetch(f"{query} LIMIT {limit}")
            
            return ContentResult(
                output=self._format_results(results)
            ).to_content_blocks()
            
        except Exception as e:
            raise ToolError(f"Query failed: {e}")
    
    async def _get_connection(self):
        if not self._conn:
            import asyncpg
            self._conn = await asyncpg.connect(self.conn_string)
        return self._conn
    
    def _format_results(self, rows: list) -> str:
        if not rows:
            return "No results"
        return "\n".join(str(dict(row)) for row in rows)

# Usage with hooks
db = DatabaseTool("postgresql://...")

@db.before
async def audit_query(query: str = "", **kwargs):
    print(f"Executing: {query[:50]}...")

@db.after  
async def log_result(result=None, **kwargs):
    print(f"Returned {len(result or [])} blocks")
    return None  # Keep original result
