Prerequisites

  • HUD API key: Remote training requires authentication. Set HUD_API_KEY before running:
export HUD_API_KEY="sk-hud-..."  # get one at https://hud.ai
# Or persist it locally:
hud set HUD_API_KEY=sk-hud-...
  • Docker daemon: For local runs (using --local) or when training against a local Docker image, ensure Docker Desktop is installed and the Docker daemon is running.

Quickstart

Install and download a taskset:
uv tool install hud-python
hud get hud-evals/2048-basic

1) Simple: Train (remote by default)

hud rl 2048-basic.json
This launches training remotely and automatically provisions a vLLM server and a trainer for you. You can monitor progress on https://hud.ai. The server persists between runs, so you can rerun training or evaluate against the same endpoint. Optionally, run a baseline first (Claude or Operator):
hud eval 2048-basic.json

2) Run on your own machine/remote

Use any provider with at least 2 GPUs (one for inference, one for training). Run on your own machine with the --local flag:
uv tool install hud-python
hud get hud-evals/2048-basic
hud rl 2048-basic.json --local
  • 2× A100: quick iteration, shorter runs
  • 8× A100: higher throughput for larger tasksets
Training throughput depends on task complexity and parallelism (max_parallel_episodes).
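As a rough illustration of how parallelism bounds throughput, here is a back-of-envelope estimate. The numbers are hypothetical; real throughput depends on your tasks, model, and hardware.

```python
# Illustrative upper bound on rollout throughput. Assumes episodes run
# fully in parallel up to max_parallel_episodes and each takes
# avg_episode_seconds of wall-clock time (both numbers are made up here).

def episodes_per_hour(max_parallel_episodes: int, avg_episode_seconds: float) -> float:
    """Best-case episode throughput for one rollout worker."""
    return max_parallel_episodes * 3600 / avg_episode_seconds

# e.g. 16 parallel episodes at ~90 s each -> at most 640 episodes/hour
print(episodes_per_hour(16, 90))
```

In practice, harder tasks take longer episodes, so raising max_parallel_episodes is the main lever for keeping the trainer fed.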

3) Build your own environment (hud init)

Create a new MCP environment, develop with hot-reload, and train on a production image:
hud init my-env && cd my-env
hud dev --interactive
# When ready to run:
hud rl
Edit tasks.json to include the tasks you want to train on. See hud init for options and details.

Getting the best performance

Training a good model often takes many iterations over the trainer's parameters. Take the config generated by hud rl and vary its values to run a hyperparameter sweep. For easy launching, specify the tasks and config up front, and add --yes to automatically launch vLLM and training.
hud rl taskset.json --config rl-config.json --yes
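One way to set up such a sweep is to generate a config file per parameter combination and pass each to --config in turn. The sketch below assumes hypothetical field names (learning_rate, batch_size); use the keys that actually appear in the config hud rl generated for you (max_parallel_episodes is one the docs mention).

```python
import itertools
import json
import pathlib

# Hypothetical base config and sweep grid; replace with the real keys
# from your generated rl-config.json.
base = {"max_parallel_episodes": 16}
grid = {"learning_rate": [1e-5, 3e-5], "batch_size": [32, 64]}

out = pathlib.Path("sweep")
out.mkdir(exist_ok=True)
for i, values in enumerate(itertools.product(*grid.values())):
    cfg = {**base, **dict(zip(grid.keys(), values))}
    (out / f"rl-config-{i}.json").write_text(json.dumps(cfg, indent=2))
    # Then launch each run with:
    #   hud rl taskset.json --config sweep/rl-config-<i>.json --yes
```

Each generated file is a complete config, so individual runs can be launched (and rerun) independently.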
Additionally, it can help to run an initial analysis on the taskset to determine which tasks will be the most informative to train on. Either start with a deployed model or run hud rl without training, then:
hud eval taskset.json --full --group-size 6 --max-steps 5
This will prompt you for a model choice and produce a table of per-task accuracies. Prefer tasks that are 10%-60% accurate for training. Some general findings from our internal training runs:
  • Include as many different tasks per gradient update as possible (runs with 4+ GPUs and a batch size of 50+ are much more stable than single-GPU runs).
  • Batch size should be around 2/X, where X is the accuracy of the given task on an untrained model.
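One reading of the two heuristics above, as a sketch: keep tasks whose untrained accuracy falls in [0.10, 0.60], and suggest a batch size of roughly 2/X for each. Task names and accuracies here are made up.

```python
# Filter an accuracy table (task -> untrained accuracy as a fraction)
# down to informative tasks, and size the batch as ~2/X per the
# heuristic above.

def pick_training_tasks(accuracies: dict[str, float]) -> dict[str, int]:
    """Map each informative task (10%-60% accurate) to a suggested batch size."""
    return {
        task: round(2 / acc)
        for task, acc in accuracies.items()
        if 0.10 <= acc <= 0.60
    }

print(pick_training_tasks({"move-left": 0.05, "merge-8": 0.25, "reach-2048": 0.5}))
# {'merge-8': 8, 'reach-2048': 4}
```

Tasks below 10% give sparse reward signal and tasks above 60% are largely solved, which is why both ends are dropped.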

Pricing

Below is the pricing by GPU type. Actual prices vary; see https://hud.ai/project/billing for current rates.

vLLM GPU Pricing (2 Hosted GPUs)

GPU type     Memory   Est. price/hr
A100 80GB    80 GB    $4.95
H100 80GB    80 GB    $7.95

Training GPU Pricing

GPU type     Memory   Est. price/hr
A100 80GB    80 GB    $3.95
H100 80GB    80 GB    $5.40
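For a back-of-envelope run cost, the rates above can be combined like this. This sketch assumes the listed prices are per GPU-hour and that a run uses 2 hosted vLLM GPUs plus N training GPUs; check https://hud.ai/project/billing for current prices before relying on it.

```python
# Estimated prices from the tables above (assumed to be per GPU-hour).
VLLM_PER_GPU_HR = {"A100": 4.95, "H100": 7.95}
TRAIN_PER_GPU_HR = {"A100": 3.95, "H100": 5.40}

def estimated_cost(gpu: str, train_gpus: int, hours: float) -> float:
    """Rough total cost: 2 hosted vLLM GPUs plus train_gpus training GPUs."""
    vllm = 2 * VLLM_PER_GPU_HR[gpu] * hours
    train = train_gpus * TRAIN_PER_GPU_HR[gpu] * hours
    return round(vllm + train, 2)

print(estimated_cost("A100", 2, 3))  # 3 h with 2 training A100s -> 53.4
```

A 2x A100 setup (the "quick iteration" tier above) therefore costs on the order of $18/hour under these assumptions.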

Learn more