Documentation Index
Fetch the complete documentation index at: https://docs.hud.ai/llms.txt
Use this file to discover all available pages before exploring further.
Computer tools let agents interact with GUIs—click, type, scroll, drag, screenshot. Each provider has its own computer use API. Pick the one that matches your agent.
Quick Reference
| Tool | Agent | Default Resolution |
|---|
AnthropicComputerTool | Claude | 1280×720 |
OpenAIComputerTool | OpenAI / Operator | 1920×1080 |
GeminiComputerTool | Gemini | 1440×900 |
GLMComputerTool | GLM-V | 1024×768 |
HudComputerTool | Any (function calling) | 1280×720 |
For Claude. Uses computer_20250124 native API.
from hud.tools import AnthropicComputerTool
computer = AnthropicComputerTool(
display_width=1280,
display_height=720,
rescale_images=True,
)
Actions: screenshot, click, double_click, mouse_move, left_click_drag, type, key, scroll, wait
# Screenshot
await computer(action="screenshot")
# Click
await computer(action="click", coordinate=[640, 360])
# Type text
await computer(action="type", text="Hello, World!")
# Keyboard shortcut
await computer(action="key", text="ctrl+c")
# Scroll
await computer(action="scroll", coordinate=[640, 360], scroll_direction="down", scroll_amount=3)
For OpenAI and Operator. Uses computer_use_preview native API.
from hud.tools import OpenAIComputerTool
computer = OpenAIComputerTool(
display_width=1920,
display_height=1080,
rescale_images=False,
)
Actions: screenshot, click, double_click, scroll, type, wait, move, keypress, drag
# Click
await computer(type="click", x=500, y=300, button="left")
# Type
await computer(type="type", text="Hello!")
# Key press
await computer(type="keypress", keys=["ctrl", "v"])
# Scroll
await computer(type="scroll", x=500, y=300, scroll_x=0, scroll_y=-100)
# Drag
await computer(type="drag", path=[{"x": 100, "y": 100}, {"x": 300, "y": 300}])
For GeminiAgent with Gemini’s native Computer Use models. Uses normalized
0–999 coordinates and returns screenshots plus URL metadata in Gemini
FunctionResponse parts.
from hud.agents.gemini import GeminiAgent
from hud.tools import GeminiComputerTool
agent = GeminiAgent.create(model="gemini-3-flash-preview")
env.add_tool(GeminiComputerTool())
Supported native models: gemini-2.5-computer-use-preview-10-2025, gemini-3-flash-preview
Actions: open_web_browser, click_at, hover_at, type_text_at, scroll_document, scroll_at, wait_5_seconds, go_back, go_forward, search, navigate, key_combination, drag_and_drop
For GLM-4.6V and later. Uses normalized 0–999 coordinates automatically rescaled to screen pixels.
from hud.tools import GLMComputerTool
computer = GLMComputerTool(width=1024, height=768, rescale_images=True)
Actions: left_click, right_click, middle_click, hover, left_double_click, left_drag, key, type, scroll, screenshot, WAIT, DONE, FAIL
# Click (start_box accepts "[x,y]" string, [x,y] list, or [[x,y]])
await computer(action="left_click", start_box="[500, 300]")
# Type text
await computer(action="type", content="Hello, World!")
# Keyboard shortcut
await computer(action="key", keys="ctrl+c")
# Scroll down 5 steps
await computer(action="scroll", start_box="[500, 300]", direction="down", step=5)
# Drag
await computer(action="left_drag", start_box="[100, 100]", end_box="[400, 400]")
# Task completion / failure signals
await computer(action="DONE")
await computer(action="FAIL")
Generic computer tool for any agent via function calling. Use when you need provider-agnostic control.
from hud.tools import HudComputerTool
computer = HudComputerTool(
platform_type="auto", # "auto", "xdo", or "pyautogui"
width=1280,
height=720,
)
Actions: screenshot, click, write, press, scroll, drag, move, wait
await computer(action="screenshot")
await computer(action="click", x=100, y=200)
await computer(action="write", text="Hello", enter_after=True)
await computer(action="press", keys=["ctrl", "c"])
Executors
Computer tools use executors for the actual system interaction:
| Executor | Platform | Notes |
|---|
PyAutoGUIExecutor | Cross-platform | Default on Windows/macOS |
XDOExecutor | Linux/X11 | Faster, uses xdotool |
from hud.tools import HudComputerTool
from hud.tools.executors import XDOExecutor
executor = XDOExecutor(display_num=1)
computer = HudComputerTool(executor=executor)
Coordinate Scaling
Agent coordinates don’t have to match actual display resolution. The tool handles scaling:
# Agent works in 1280×720
computer = AnthropicComputerTool(display_width=1280, display_height=720)
# Actual display is 2560×1440
# Agent sends (640, 360) → Executor receives (1280, 720)
rescale_images=True resizes screenshots to agent dimensions. rescale_images=False keeps full resolution (coordinates still scale).
Customizing
Log all actions with hooks:
from hud.tools import AnthropicComputerTool
computer = AnthropicComputerTool()
@computer.after
async def log_action(action: str = "", result=None, **kwargs):
print(f"Computer action: {action}")
env.add_tool(computer)
Or subclass for deeper control:
from typing import Any
from mcp.types import ContentBlock
from hud.tools import AnthropicComputerTool
from hud.tools.types import ToolError
class SafeComputerTool(AnthropicComputerTool):
BLOCKED_ACTIONS = ["key"] # Block keyboard shortcuts
async def __call__(self, action: str, **kwargs: Any) -> list[ContentBlock]:
if action in self.BLOCKED_ACTIONS:
raise ToolError(f"Action '{action}' not allowed")
return await super().__call__(action, **kwargs)
→ Grounding Tools — Resolve element descriptions to coordinates