Version 0.4.72 - Latest stable release
I want to evaluate agents
Test Claude, Operator, or custom agents on benchmarks like SheetBench and OSWorld
I want to build environments
Wrap any software in dockerized MCP for scalable and generalizable agent evaluation
I want to train agents
Use reinforcement learning with GRPO on evaluations to improve agent performance
What is HUD?
HUD connects AI agents to software environments using the Model Context Protocol (MCP). Whether you’re evaluating existing agents, building new environments, or training models with RL, HUD provides the infrastructure.
Why HUD?
- 🔌 MCP-native: Any agent can connect to any environment
- 📡 Live telemetry: Debug every tool call at hud.ai
- ⚡ HUD Gateway: Unified inference API for all LLMs
- 🚀 Production-ready: From local Docker to cloud scale
- 🎯 Built-in benchmarks: OSWorld-Verified, SheetBench-50, and more
- 🔧 CLI tools: Create, develop, run, and train with hud init, hud dev, hud run, hud eval, and hud rl
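To make "MCP-native" concrete: MCP messages are JSON-RPC 2.0, and an agent invokes an environment tool with a `tools/call` request. The sketch below only builds such a message. It does not use the HUD SDK, and the tool name `click` and its arguments are hypothetical placeholders, not tools that any HUD environment is known to expose.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, chosen only for illustration.
request = make_tool_call(1, "click", {"x": 120, "y": 48})

# Serialize the message as it would be sent over an MCP transport.
wire = json.dumps(request)
print(wire)
```

Because every environment accepts the same message shape, an agent that can emit this request can in principle drive any MCP server, which is the property the first bullet describes.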
3-minute quickstart
Run your first agent evaluation with zero setup
HUD Gateway
Unified inference API for OpenAI, Anthropic, Gemini, and open-source models
Add to Cursor/Claude
Give your AI assistant full knowledge of HUD docs