Skip to main content
Computer tools let agents interact with GUIs—click, type, scroll, drag, screenshot. Each provider has its own computer use API. Pick the one that matches your agent.

Quick Reference

ToolAgentDefault Resolution
AnthropicComputerToolClaude1280×720
OpenAIComputerToolOpenAI / Operator1920×1080
GeminiComputerToolGemini1280×720
HudComputerToolAny (function calling)1280×720

AnthropicComputerTool

For Claude. Uses computer_20250124 native API.
from hud.tools import AnthropicComputerTool

computer = AnthropicComputerTool(
    display_width=1280,
    display_height=720,
    rescale_images=True,
)
Actions: screenshot, click, double_click, mouse_move, left_click_drag, type, key, scroll, wait
# Screenshot
await computer(action="screenshot")

# Click
await computer(action="click", coordinate=[640, 360])

# Type text
await computer(action="type", text="Hello, World!")

# Keyboard shortcut
await computer(action="key", text="ctrl+c")

# Scroll
await computer(action="scroll", coordinate=[640, 360], scroll_direction="down", scroll_amount=3)

OpenAIComputerTool

For OpenAI and Operator. Uses computer_use_preview native API.
from hud.tools import OpenAIComputerTool

computer = OpenAIComputerTool(
    display_width=1920,
    display_height=1080,
    rescale_images=False,
)
Actions: screenshot, click, double_click, scroll, type, wait, move, keypress, drag
# Click
await computer(type="click", x=500, y=300, button="left")

# Type
await computer(type="type", text="Hello!")

# Key press
await computer(type="keypress", keys=["ctrl", "v"])

# Scroll
await computer(type="scroll", x=500, y=300, scroll_x=0, scroll_y=-100)

# Drag
await computer(type="drag", path=[{"x": 100, "y": 100}, {"x": 300, "y": 300}])

HudComputerTool

Generic computer tool for any agent via function calling. Use when you need provider-agnostic control.
from hud.tools import HudComputerTool

computer = HudComputerTool(
    platform_type="auto",  # "auto", "xdo", or "pyautogui"
    width=1280,
    height=720,
)
Actions: screenshot, click, write, press, scroll, drag, move, wait
await computer(action="screenshot")
await computer(action="click", x=100, y=200)
await computer(action="write", text="Hello", enter_after=True)
await computer(action="press", keys=["ctrl", "c"])

Executors

Computer tools use executors for the actual system interaction:
ExecutorPlatformNotes
PyAutoGUIExecutorCross-platformDefault on Windows/macOS
XDOExecutorLinux/X11Faster, uses xdotool
from hud.tools import HudComputerTool
from hud.tools.executors import XDOExecutor

executor = XDOExecutor(display_num=1)
computer = HudComputerTool(executor=executor)

Coordinate Scaling

Agent coordinates don’t have to match actual display resolution. The tool handles scaling:
# Agent works in 1280×720
computer = AnthropicComputerTool(display_width=1280, display_height=720)

# Actual display is 2560×1440
# Agent sends (640, 360) → Executor receives (1280, 720)
rescale_images=True resizes screenshots to agent dimensions. rescale_images=False keeps full resolution (coordinates still scale).

Customizing

Log all actions with hooks:
from hud.tools import AnthropicComputerTool

computer = AnthropicComputerTool()

@computer.after
async def log_action(action: str = "", result=None, **kwargs):
    print(f"Computer action: {action}")

env.add_tool(computer)
Or subclass for deeper control:
from typing import Any
from mcp.types import ContentBlock
from hud.tools import AnthropicComputerTool
from hud.tools.types import ToolError

class SafeComputerTool(AnthropicComputerTool):
    BLOCKED_ACTIONS = ["key"]  # Block keyboard shortcuts
    
    async def __call__(self, action: str, **kwargs: Any) -> list[ContentBlock]:
        if action in self.BLOCKED_ACTIONS:
            raise ToolError(f"Action '{action}' not allowed")
        return await super().__call__(action, **kwargs)
Grounding Tools — Resolve element descriptions to coordinates