HUD
HUD is a platform for building environments. You define an environment, write tasks for that environment, and run any agent to perform those tasks, at any scale. Our SDK is an open-source Python framework for all of this. The full workflow has five steps.
Define any environment
An environment is a closed container for your agent to act in. Fundamentally, it is defined by:
- the contents of the container (Environment)
- the tasks (and their rewards) to be performed inside it (Tasks & Tasksets)
- the capabilities the agent can use to perform these tasks (Capabilities)
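To make the three parts concrete, here is a minimal plain-Python sketch of an environment as contents, capabilities, and tasks with rewards. It does not use the HUD SDK: the `Environment`, `Task`, and `make_file_env` names, the in-memory file system standing in for a container, and the `read`/`write` capabilities are all hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model of an environment's three parts; NOT the HUD SDK API.


@dataclass
class Task:
    prompt: str
    # Reward function: inspects the final contents and returns a score in [0, 1].
    reward: Callable[[Dict[str, str]], float]


@dataclass
class Environment:
    # Contents: a toy "container" modeled as an in-memory file system.
    contents: Dict[str, str] = field(default_factory=dict)
    # Capabilities: the only actions an agent may take, keyed by name.
    capabilities: Dict[str, Callable[..., str]] = field(default_factory=dict)
    # Tasks to be performed inside the container, each with its own reward.
    tasks: List[Task] = field(default_factory=list)

    def call(self, name: str, *args: str) -> str:
        # Agents act only through declared capabilities.
        if name not in self.capabilities:
            raise KeyError(f"unknown capability: {name}")
        return self.capabilities[name](*args)


def make_file_env() -> Environment:
    env = Environment()

    def write(path: str, text: str) -> str:
        env.contents[path] = text
        return "ok"

    def read(path: str) -> str:
        return env.contents.get(path, "")

    env.capabilities = {"write": write, "read": read}
    env.tasks = [
        Task(
            prompt="Create hello.txt containing 'hi'",
            reward=lambda c: 1.0 if c.get("hello.txt") == "hi" else 0.0,
        )
    ]
    return env


env = make_file_env()
env.call("write", "hello.txt", "hi")  # what an agent would do
print(env.tasks[0].reward(env.contents))  # prints 1.0
```

The agent only ever touches `contents` through `call`, which mirrors the idea that an environment exposes capabilities rather than its internals.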
Part 1: Declare your environment
The first and key part of any HUD workflow is declaring your environment
in a declaration file, env.py. The standard env.py scaffold is general on purpose: it describes any environment. A one-line shell task, a full GUI
desktop, a robot simulator: they are all just environments with their own content, tasks, and
associated capabilities. The complexity behind this file is hidden in the
HUD protocol, whose thin envelope lets any model or harness plug into any environment.
Part 2: Choose your taskset
Next, form a taskset (one or more tasks with parameters) in code, for example in tasks.py, or load one
from a file.
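A taskset is a list of tasks with parameters, so forming one in code and loading one from a file can produce the same object. The sketch below assumes a hypothetical JSON shape (`name` plus `params`) and a hypothetical `TaskSpec` type; it is not HUD's task format or loader.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical task/taskset shapes; NOT the HUD SDK API or file format.


@dataclass
class TaskSpec:
    name: str               # which task template to run
    params: Dict[str, Any]  # parameters that specialize the template


def taskset_from_json(text: str) -> List[TaskSpec]:
    """Parse a JSON list like [{"name": ..., "params": {...}}, ...]."""
    return [
        TaskSpec(name=item["name"], params=item.get("params", {}))
        for item in json.loads(text)
    ]


# 1) Form a taskset in code: one template, several parameterizations.
in_code = [
    TaskSpec("create_file", {"path": p, "text": "hi"}) for p in ("a.txt", "b.txt")
]

# 2) Or load an equivalent taskset from a file's contents.
from_file = taskset_from_json(
    '[{"name": "create_file", "params": {"path": "a.txt", "text": "hi"}},'
    ' {"name": "create_file", "params": {"path": "b.txt", "text": "hi"}}]'
)

print(in_code == from_file)  # prints True
```

Keeping tasks as plain data (a name plus parameters) is what makes the same taskset easy to store in a file and replay against any runtime.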
Spin it up anywhere
Once defined, an environment shouldn’t care where it runs; it should just work. The SDK lets you switch between running your environment locally for development, running it on Daytona, Modal, or E2B for scale, or deploying it to the HUD platform. The environment definition never changes; only the Runtime you pass does.
Part 3: Choose your runtime
There are two main ways to run your declared environments.
1. Package & deploy to the platform. Build a portable image once, push it to HUD,
and run any tasks against it from the platform: compare models on a taskset and browse
every trace, with no local infrastructure needed.
2. Run programmatically. Drive rollouts from Python by picking a
runtime; the same taskset runs against any of them.
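The value of a runtime abstraction is that evaluation code depends only on an interface, so swapping backends is a one-argument change. The sketch below assumes a hypothetical `Runtime` protocol with a single `run` method and two toy implementations (sequential and thread-pool); it does not model the SDK's actual local, Daytona, Modal, E2B, or platform runtimes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Protocol

# Hypothetical Runtime interface; NOT the HUD SDK's Runtime classes.

Rollout = Callable[[str], float]  # takes a task prompt, returns a reward


class Runtime(Protocol):
    def run(self, rollout: Rollout, tasks: List[str]) -> List[float]: ...


class InlineRuntime:
    """Runs rollouts one after another in the current process (like local dev)."""

    def run(self, rollout: Rollout, tasks: List[str]) -> List[float]:
        return [rollout(t) for t in tasks]


class PoolRuntime:
    """Runs rollouts concurrently on a thread pool (a stand-in for a scaled backend)."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, rollout: Rollout, tasks: List[str]) -> List[float]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(rollout, tasks))  # map preserves task order


def evaluate(runtime: Runtime, rollout: Rollout, tasks: List[str]) -> float:
    # The rollout and taskset never change; only the runtime argument does.
    rewards = runtime.run(rollout, tasks)
    return sum(rewards) / len(rewards)


def toy_rollout(prompt: str) -> float:
    return 1.0 if "easy" in prompt else 0.0


tasks = ["easy task", "hard task", "easy again", "hard again"]
print(evaluate(InlineRuntime(), toy_rollout, tasks))         # prints 0.5
print(evaluate(PoolRuntime(workers=2), toy_rollout, tasks))  # prints 0.5
```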
Evaluate and train any AI agent inside it
Since an environment only exposes capabilities, any agent can plug in. For standard models, the HUD inference gateway and our prebuilt harnesses let you switch between models like Claude, GPT, or Gemini just by choosing the model name. Rollouts run in parallel with full isolation out of the box. Every rollout in the job is traced on the platform, so you can see exactly what the agent did in real time and how it was graded.
Part 4: Run your agent
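Parallel rollouts with isolation mean that each rollout gets its own fresh state and its own trace, and switching models is just a different name string. The asyncio sketch below assumes a hypothetical `fake_agent` in place of a real model call and a hypothetical `Trace` record in place of platform tracing; no HUD gateway or harness API is used, and `"model-a"` is a placeholder name.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sketch; no real model is called. NOT the HUD gateway or harness API.


@dataclass
class Trace:
    model: str
    task: str
    steps: List[str] = field(default_factory=list)
    reward: float = 0.0


async def fake_agent(model: str, task: str, state: Dict[str, str], trace: Trace) -> None:
    # Stand-in for a model call selected by `model`; it always writes the target file.
    await asyncio.sleep(0)  # yield control, as a real network call would
    path = task.split()[-1]
    state[path] = "done"
    trace.steps.append(f"write {path}")


async def rollout(model: str, task: str) -> Trace:
    state: Dict[str, str] = {}  # fresh state per rollout: nothing shared between rollouts
    trace = Trace(model=model, task=task)
    await fake_agent(model, task, state, trace)
    # Grade: the target file exists and no other rollout's writes leaked in.
    ok = state.get(task.split()[-1]) == "done" and len(state) == 1
    trace.reward = 1.0 if ok else 0.0
    return trace


async def run_job(model: str, tasks: List[str]) -> List[Trace]:
    # All rollouts in the job run concurrently; results keep task order.
    return list(await asyncio.gather(*(rollout(model, t) for t in tasks)))


traces = asyncio.run(run_job("model-a", ["create a.txt", "create b.txt", "create c.txt"]))
for t in traces:
    print(t.model, t.task, t.steps, t.reward)
```

Because each `rollout` creates its own `state` dict, one rollout's writes cannot affect another's grade; calling `run_job` with a different name string would compare another (hypothetical) model on the same tasks.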
Part 5: Learn
Where to go next
To see what HUD hides under the hood, read about the Protocol. To go in depth on
each part of the workflow, start with Environments.
Build
A high-level guide on how to work with HUD.
Reference
The actual object reference: classes, objects, and abstractions.