HUD Documentation - Evaluations and RL Environments.

Cookbooks are complete, runnable examples: each one is a small project you can copy and adapt. They all live in the cookbooks/ directory of the SDK repo. The entries under Walkthroughs have a full guide here; the ones under More runnable examples are best read straight from the source.

Walkthroughs

Coding agent

Run a coding agent against a shell + files environment, graded by tests. Source: cookbooks/codex-coding.

A2A chat

Serve a chat task over the A2A protocol and talk to it from any client. Source: cookbooks/a2a-chat.

Ops diagnostics

An investigation task where the agent integrates evidence into a diagnosis.

Robot benchmark

Run a VLA policy against a containerized robot sim, graded by task success.

More runnable examples

These ship in the repo without a separate walkthrough. Read the README in each directory to run them.

RL training

On-policy RL: roll out a taskset with the current weights, train on the resulting trajectories, and serve the updated weights for the next rollout — all under one trainable model string.
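The rollout, train, serve cycle described above can be illustrated with a toy loop. This is a minimal sketch and not the cookbook's code: it does not use the HUD SDK, the "model" is a single logit, and the names `rollout`, `train`, `run_loop`, and `taskset` are hypothetical stand-ins chosen for illustration. The update is a plain REINFORCE step with a mean-reward baseline, which may differ from the algorithm the cookbook uses.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rollout(theta, taskset, rng):
    """Sample one action per task with the CURRENT weights (on-policy)."""
    p = sigmoid(theta)
    trajectories = []
    for target in taskset:
        action = 1 if rng.random() < p else 0
        reward = 1.0 if action == target else 0.0
        trajectories.append((action, reward))
    return trajectories


def train(theta, trajectories, lr=0.5):
    """One REINFORCE step with a mean-reward baseline (illustrative choice)."""
    p = sigmoid(theta)
    baseline = sum(r for _, r in trajectories) / len(trajectories)
    # For a Bernoulli policy with p = sigmoid(theta), d/dtheta log pi(a) = a - p.
    grad = sum((r - baseline) * (a - p) for a, r in trajectories) / len(trajectories)
    return theta + lr * grad


def run_loop(steps=50, seed=0):
    rng = random.Random(seed)
    taskset = [1] * 16  # hypothetical taskset: every task rewards action 1
    theta = 0.0         # initial weights
    for _ in range(steps):
        trajectories = rollout(theta, taskset, rng)  # roll out with current weights
        theta = train(theta, trajectories)           # train on those trajectories
        # The updated theta is what the next rollout samples from ("serve").
    return theta
```

Calling `run_loop()` returns the final logit. Because each rollout samples from the weights produced by the previous training step, the data is always on-policy.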

Connect Four self-play

Symmetric self-play GRPO on a 6×7 Connect Four board, training both sides from a single rollout.
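To make the idea of training both sides from one rollout concrete, here is a small sketch of how a single self-play game could yield a reward for each player, followed by GRPO-style group-relative advantages. It is an assumption-laden illustration, not the cookbook's implementation: the +1/-1/0 reward scheme and the helper names `self_play_rewards`, `samples_from_games`, and `group_relative_advantages` are hypothetical.

```python
from statistics import mean, pstdev


def self_play_rewards(winner):
    """Map a game outcome to (first_player_reward, second_player_reward).

    `winner` is "first", "second", or None for a draw. The zero-sum
    +1/-1/0 scheme is an assumption made for this sketch.
    """
    if winner == "first":
        return (1.0, -1.0)
    if winner == "second":
        return (-1.0, 1.0)
    return (0.0, 0.0)


def samples_from_games(winners):
    """Each game contributes two reward samples, one per side."""
    rewards = []
    for winner in winners:
        first, second = self_play_rewards(winner)
        rewards.extend([first, second])
    return rewards


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within a group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, `group_relative_advantages(samples_from_games(["first", "second", None]))` gives positive advantages to the winning sides, negative advantages to the losing sides, and zero to both sides of the drawn game.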

Fireworks RL training

The RL loop driven through the Fireworks Training API instead of the HUD training service.
