- Environment SDK — Create a harness with agent-callable tools. Define evaluation logic with scenarios that yield a prompt and a reward.
- Eval & Training Platform — Run evaluations at scale on hud.ai. Collect traces. Train models on successful runs.
- Model Gateway — One OpenAI-compatible endpoint at inference.hud.ai for every model: Claude, GPT, Gemini, Grok. One API key; swap the model string.
Install
The Canonical Workflow

1. Environments: Define your agent's harness

An environment wraps your code as tools that agents can call, and defines scenarios that evaluate what agents do. Each environment spins up fresh and isolated for every evaluation — no shared state, fully reproducible.
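One way to picture the scenario shape described above — "yield a prompt and a reward" — is a plain Python generator that yields the prompt, receives the agent's final answer, and then yields the reward. This is an illustrative sketch, not the SDK's actual API; the scenario name and grading rule are hypothetical.

```python
# Hypothetical sketch (not the SDK's API): a scenario as a generator
# that yields a prompt, receives the agent's answer, and yields a reward.
def checkout_scenario(item: str):
    # The harness sends this prompt to the agent and resumes the
    # generator with the agent's final answer.
    answer = yield f"Add {item} to the cart and check out."
    # Grade the run: full reward only if the agent reports success.
    yield 1.0 if "order placed" in answer.lower() else 0.0

# Driving the scenario by hand, as a harness would:
gen = checkout_scenario("a red mug")
prompt = next(gen)          # the task prompt for the agent
reward = gen.send("Order placed, confirmation #123")
```

Because each run constructs a fresh generator, state never leaks between evaluations, which mirrors the fresh-and-isolated guarantee above.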
2. Tasks & Training: Evaluate and Train
A task is a scenario with specific arguments. Group tasks into tasksets and run them across models; evaluating lets you calibrate environments and tasks against specific models. You can also train a model on a taskset to produce a model better suited to your use case.

3. Models: Any Model, One API
Out-of-the-box integrations with all major model providers. Point any OpenAI-compatible client at inference.hud.ai and use any model. Browse all available models at hud.ai/models.
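Because the gateway is OpenAI-compatible, any client that speaks the standard chat-completions protocol works. Below is a minimal sketch using only the Python standard library; it assumes the usual `/v1/chat/completions` route, and the model string and API-key placeholder are illustrative.

```python
import json
import urllib.request

# Build a standard OpenAI-style chat-completions request aimed at the
# HUD gateway. Assumptions: the /v1 route (conventional for
# OpenAI-compatible endpoints) and an illustrative model name.
payload = {
    "model": "gpt-4o",  # swap this one string to use Claude, Gemini, Grok, ...
    "messages": [{"role": "user", "content": "Say hello."}],
}
request = urllib.request.Request(
    "https://inference.hud.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_HUD_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send it; with a valid key the
# response follows the OpenAI chat-completions schema.
```

Switching providers is just a change to the `model` field — the endpoint, key, and request shape stay the same.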
Next Steps
Core Concepts
Environments, tools, scenarios, tasks — defined in one place.
Environments
Tools, scenarios, and iteration.
Tasks & Training
Evaluate and train models.
Best Practices
Patterns for reliable environments and evals.
Community
GitHub
Star the repo and contribute
Discord
Join the community