How It Works
GroundedComputerTool
Wraps a computer tool to accept element descriptions instead of coordinates.

Supported actions: click, double_click, move, scroll, drag, type, keypress, screenshot, wait
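To make the wrapping idea concrete, here is a minimal sketch of the pattern. It is not the HUD implementation: `GroundedClicker`, `locate`, and `computer` are hypothetical names invented for illustration. `locate` stands in for a grounder (screenshot and description in, coordinates or None out), and `computer` stands in for the underlying coordinate-based tool.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


class GroundedClicker:
    """Hypothetical wrapper: takes an element description, issues a coordinate click."""

    def __init__(
        self,
        locate: Callable[[bytes, str], Optional[Point]],
        computer: Callable[..., dict],
    ) -> None:
        self._locate = locate      # stand-in for a vision-model grounder
        self._computer = computer  # stand-in for the coordinate-based computer tool

    def click(self, element_description: str) -> dict:
        # Capture a fresh screenshot so the locator sees the current UI.
        shot = self._computer(action="screenshot")["image"]
        point = self._locate(shot, element_description)
        if point is None:
            # Report a descriptive error instead of clicking blindly.
            return {"ok": False, "error": f"element not found: {element_description!r}"}
        x, y = point
        return self._computer(action="click", x=x, y=y)


# Fakes so the sketch runs without a real screen or vision model.
def fake_computer(action: str, **kwargs) -> dict:
    if action == "screenshot":
        return {"image": b"fake-png-bytes"}
    return {"ok": True, "action": action, **kwargs}


def fake_locate(image: bytes, description: str) -> Optional[Point]:
    known = {"blue submit button": (640, 480)}
    return known.get(description)


tool = GroundedClicker(fake_locate, fake_computer)
result = tool.click("blue submit button")
missing = tool.click("nonexistent widget")
```

The wrapper only translates a description into coordinates and then delegates, so the underlying tool keeps doing the actual input work.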
Grounder
The engine that locates elements using vision models.

With HUD Agents
GroundedComputerTool is typically used as a wrapper around an environment's computer tool. Register the underlying computer tool first, then issue grounded calls that refer to elements by description rather than by coordinates.
When to Use
Good for:
- Dynamic interfaces where elements move
- Natural language task descriptions
- Complex layouts with many similar elements

Less suitable for:
- Static, known positions
- High-frequency actions (grounding adds latency)
- Precision-critical actions (coordinates are more exact)
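One way to act on the two lists above is to mix both approaches: use direct coordinates for elements whose positions are fixed, and ground only the rest. The sketch below assumes a hypothetical `resolve_target` helper and a `ground` callable standing in for a grounder. Neither is part of the HUD API.

```python
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[int, int]


def resolve_target(
    target: str,
    fixed_positions: Dict[str, Point],
    ground: Callable[[str], Optional[Point]],
) -> Optional[Point]:
    """Use a known coordinate when one exists; otherwise make a grounding call."""
    if target in fixed_positions:
        return fixed_positions[target]  # static element: no model call, exact position
    return ground(target)  # dynamic element: located from its description


# Fake grounder that records how often it is called.
grounding_calls = []


def fake_ground(description: str) -> Optional[Point]:
    grounding_calls.append(description)
    return (300, 150)


fixed = {"window close button": (1900, 12)}
close_at = resolve_target("window close button", fixed, fake_ground)
result_at = resolve_target("first search result", fixed, fake_ground)
```

Only the second lookup reaches the grounder, which keeps the added latency confined to elements that actually move.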
Trade-offs
| Aspect | Grounded | Direct Coordinates |
|---|---|---|
| Flexibility | High | Low |
| Precision | Medium | High |
| Speed | Slower | Faster |
| Error handling | Descriptive | Silent failures |
Tips
- Write specific descriptions. “The blue submit button at the bottom of the form” beats “the button”.
- Always use recent screenshots. Stale images lead to wrong coordinates if the UI has changed.
- Handle None returns. Grounder returns None if it can’t find the element, so provide fallback behavior.
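The last two tips can be combined in a small retry helper. This is a sketch under assumptions rather than HUD code: `locate_with_fallback`, `locate`, and `take_screenshot` are hypothetical names, with `locate` standing in for a grounder that returns coordinates or None.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def locate_with_fallback(
    locate: Callable[[bytes, str], Optional[Point]],
    take_screenshot: Callable[[], bytes],
    descriptions: Sequence[str],
) -> Optional[Point]:
    """Try each description in order, capturing a new screenshot before every attempt."""
    for description in descriptions:
        screenshot = take_screenshot()  # fresh image per attempt, never a cached one
        point = locate(screenshot, description)
        if point is not None:
            return point
    return None  # caller decides: skip the step, ask for help, or raise


# Fakes so the sketch runs on its own.
screenshots_taken: List[str] = []


def fake_screenshot() -> bytes:
    screenshots_taken.append("shot")
    return b"fake-png-bytes"


def fake_locate(image: bytes, description: str) -> Optional[Point]:
    if description == "the blue Submit button at the bottom of the form":
        return (100, 200)
    return None


point = locate_with_fallback(
    fake_locate,
    fake_screenshot,
    ["the Send button", "the blue Submit button at the bottom of the form"],
)
not_found = locate_with_fallback(fake_locate, fake_screenshot, ["the Cancel link"])
```

Returning None when every description fails leaves the choice of fallback with the caller instead of hiding the miss.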
→ Computer Tools — Underlying computer control