TrainingClient drives HUD-managed training for one model: it accumulates gradients from rewarded trajectories and advances the weights behind the model’s gateway slug in place. Inputs are Runs (sent inline) or trace_id strings (resolved server-side); the two can be mixed.
TrainingClient
| Argument | Default | Meaning |
|---|---|---|
| model | (required) | Trainable model slug or id (the gateway string you also sample). |
| api_key | settings.api_key | HUD API key. |
| base_url | settings.hud_rl_url | Training (RL) service. |
| api_url | settings.hud_api_url | Catalog API (resolves the slug → id once). |
Methods
| Method | Returns | Purpose |
|---|---|---|
| forward_backward(trajectories, *, loss_fn, loss_fn_config=None, group_size=None, reward_scale=1.0, num_substeps=1) | ForwardBackwardResult | Accumulate gradients with a built-in loss_fn. |
| optim_step(*, learning_rate, beta1=0.9, beta2=0.95, eps=1e-8, weight_decay=0.0) | OptimStepResult | Apply gradients, checkpoint, and promote the new weights. |
| step(trajectories, *, learning_rate, ...) | OptimStepResult | One forward_backward then one optim_step. |
| forward_backward_custom(trajectories, loss_fn, *, group_size=None, reward_scale=1.0) | ForwardBackwardResult | Accumulate gradients with a client-side loss (see Custom losses). |
| forward(trajectories, *, group_size=None, reward_scale=1.0) | ForwardResult | Current-policy forward pass returning per-token tensors. |
| backward(forward_id, weights, *, metrics=None) | ForwardBackwardResult | Apply caller-computed per-token gradients to a forward pass. |
| available_losses() | list[str] | Built-in loss_fn names this model’s provider supports. |
group_size is the GRPO group size; None treats the whole batch as one group. num_substeps splits the batch into that many chunks for gradient accumulation.
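To make the two knobs concrete, here is a minimal local sketch of what grouping and sub-stepping mean. It is not the service's implementation: it assumes groups are consecutive runs of group_size trajectories and that the baseline is a plain group mean (a real GRPO implementation may also divide by the group standard deviation). The helper names group_advantages and split_substeps are hypothetical.

```python
from statistics import mean


def group_advantages(rewards, group_size=None):
    """Mean-center rewards within consecutive groups of `group_size`.

    group_size=None treats the whole batch as a single group.
    Assumption: plain mean-centering over consecutive trajectories.
    """
    if not rewards:
        return []
    size = len(rewards) if group_size is None else group_size
    if size <= 0 or len(rewards) % size != 0:
        raise ValueError("batch length must be a positive multiple of group_size")
    advantages = []
    for start in range(0, len(rewards), size):
        group = rewards[start:start + size]
        baseline = mean(group)
        advantages.extend(r - baseline for r in group)
    return advantages


def split_substeps(batch, num_substeps=1):
    """Split a batch into `num_substeps` contiguous, near-equal chunks."""
    if num_substeps < 1:
        raise ValueError("num_substeps must be >= 1")
    base, extra = divmod(len(batch), num_substeps)
    chunks, start = [], 0
    for i in range(num_substeps):
        end = start + base + (1 if i < extra else 0)
        chunks.append(batch[start:end])
        start = end
    return chunks


rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(group_advantages(rewards, group_size=3))  # two groups of three
print(group_advantages(rewards))                # whole batch as one group
print(split_substeps(rewards, num_substeps=2))  # two accumulation chunks
```

Under this sketch, the second group (all rewards equal) gets zero advantage for every trajectory, so a group contributes signal only when its rewards differ.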
Inputs
A training input is either a recorded trajectory referenced by id or an inline one. Run builds the right form automatically: an inline TrajectoryPayload when it carries token-level samples (local rollout), otherwise its trace_id (remote rollout).
| Type | Fields |
|---|---|
| TrajectorySample | prompt_token_ids, output_token_ids, output_logprobs |
| TrajectoryPayload | samples: list[TrajectorySample], reward, trace_id=None |
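The shape of the two input forms can be sketched with local stand-ins. The dataclasses below only mirror the field names in the table above and are not the library's own classes; the field types, the helper to_training_input, and its one-logprob-per-output-token check are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TrajectorySample:
    # Local stand-in; field names follow the table, types are assumed.
    prompt_token_ids: List[int]
    output_token_ids: List[int]
    output_logprobs: List[float]


@dataclass
class TrajectoryPayload:
    # Local stand-in; trace_id defaults to None as in the table.
    samples: List[TrajectorySample]
    reward: float
    trace_id: Optional[str] = None


def to_training_input(samples, reward, trace_id=None) -> Union[TrajectoryPayload, str]:
    """Hypothetical helper: inline payload if token-level samples exist, else the trace_id."""
    if samples:
        for s in samples:
            # Assumption: exactly one logprob per sampled output token.
            if len(s.output_token_ids) != len(s.output_logprobs):
                raise ValueError("expected one logprob per output token")
        return TrajectoryPayload(samples=list(samples), reward=reward, trace_id=trace_id)
    if trace_id is None:
        raise ValueError("need either token-level samples or a trace_id")
    return trace_id


sample = TrajectorySample([1, 2], [3, 4], [-0.1, -0.2])
print(to_training_input([sample], reward=1.0))                 # inline form
print(to_training_input([], reward=0.0, trace_id="trace-123"))  # id form
```

Because both forms are valid training inputs, a single batch can mix inline payloads and trace_id strings.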
Built-in losses
loss_fn is an open string validated against the model’s provider; discover the set with await trainer.available_losses(). BuiltinLoss lists the common Tinker names (each is a str):
| BuiltinLoss | Value | Use |
|---|---|---|
| CROSS_ENTROPY | cross_entropy | Supervised: imitate sampled tokens. |
| IMPORTANCE_SAMPLING | importance_sampling | On-policy PG, rollout-logprob ratio. |
| PPO | ppo | Clipped-surrogate PG. |
| CISPO | cispo | Clipped IS policy optimization. |
| DRO | dro | Direct reward optimization. |
loss_fn_config forwards hyperparameters to the loss (e.g. {"epsilon": 0.2} for the ppo clip).
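As an illustration of what the epsilon hyperparameter controls, the following sketch computes the textbook clipped-surrogate loss for a single token. It is the standard PPO formulation, not necessarily what the provider runs server-side, and the function name ppo_token_loss is hypothetical.

```python
import math


def ppo_token_loss(logprob, sampling_logprob, advantage, epsilon=0.2):
    """Textbook PPO clipped-surrogate loss for one token (negated objective)."""
    # Probability ratio between the current policy and the rollout policy.
    ratio = math.exp(logprob - sampling_logprob)
    # Clip the ratio into [1 - epsilon, 1 + epsilon].
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    # Pessimistic (minimum) of the unclipped and clipped objectives.
    return -min(ratio * advantage, clipped * advantage)


loss_fn_config = {"epsilon": 0.2}
# The ratio here is exp(0.5), well above 1 + epsilon, so the positive
# advantage is capped at the clip boundary.
print(ppo_token_loss(-0.5, -1.0, advantage=1.0, **loss_fn_config))
```

A smaller epsilon keeps each update closer to the rollout policy; a larger one lets a single batch move the policy further.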
Custom losses
forward_backward_custom runs the current-policy forward pass server-side, hands you per-token tensors, runs your loss locally (torch autograd), and ships the per-token gradients back. Requires torch (pip install 'hud-python[train]').
logprobs[i] holds the current-policy (π_θ) log-probabilities for datum i as a differentiable leaf. Everything else on the matching DatumTensors is a constant:
| DatumTensors | Meaning |
|---|---|
| logprobs | Current-policy π_θ, per token (the differentiable leaf). |
| sampling_logprobs | Rollout policy q, per token. |
| mask | 1.0 on action tokens, 0.0 on observation tokens. |
| reward, traj_idx, group_idx | Trajectory reward, source trajectory, GRPO group (or None). |
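forward returns a ForwardResult (forward_id plus data: list[DatumTensors]). backward(forward_id, weights) applies caller-computed per-token weights to that forward pass, where weights[d][t] = -dC/dlogprobs[d][t] for datum d, token t, and your loss C.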
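The sign convention is easy to get wrong, so here is a worked sketch that derives the weights by hand for one simple loss. It assumes an importance-sampling loss C = -Σ_t mask_t · exp(logprobs_t - sampling_logprobs_t) · A, uses the raw reward as the advantage A, and represents a datum as a plain dict with the DatumTensors field names; none of this is the library's own code, and with torch you would let autograd compute the same derivative.

```python
import math


def importance_sampling_weights(datum, advantage):
    """Return weights[t] = -dC/dlogprobs[t] for C = -sum_t mask_t * ratio_t * advantage."""
    weights = []
    for logp, q, m in zip(datum["logprobs"], datum["sampling_logprobs"], datum["mask"]):
        ratio = math.exp(logp - q)
        # d(ratio)/d(logp) = ratio, so dC/dlogp = -mask * ratio * advantage.
        d_c_d_logp = -m * ratio * advantage
        # backward expects the negated derivative.
        weights.append(-d_c_d_logp)
    return weights


# Plain-dict stand-in for one DatumTensors (hypothetical values).
datum = {
    "logprobs": [-1.0, -0.5, -2.0],
    "sampling_logprobs": [-1.0, -0.7, -2.0],
    "mask": [0.0, 1.0, 1.0],
    "reward": 1.0,
}
# One inner list per datum, matching the weights[d][t] layout.
weights = [importance_sampling_weights(datum, advantage=datum["reward"])]
print(weights)
```

Observation tokens (mask 0.0) receive zero weight, and action tokens with positive advantage receive positive weight, which pushes their log-probability up.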
Results
| Type | Fields |
|---|---|
| ForwardBackwardResult | metrics: dict[str, float], num_datums |
| OptimStepResult | step, checkpoint_id, sampler_path, state_path, model |
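The call pattern that produces these results can be sketched offline. StubTrainer below is a stand-in written for this example: it copies only the documented method names, keyword names, and a few result fields, returns placeholder values, and assumes the training methods are awaitable in the same way as available_losses. Swap in a real TrainingClient to run against the service.

```python
import asyncio
from types import SimpleNamespace


class StubTrainer:
    """Offline stand-in mimicking documented method names and result fields only."""

    def __init__(self):
        self.steps = 0

    async def forward_backward(self, trajectories, *, loss_fn, group_size=None):
        # Placeholder values: the real num_datums may differ from the
        # trajectory count, since a trajectory can carry several samples.
        return SimpleNamespace(metrics={"loss": 0.0}, num_datums=len(trajectories))

    async def optim_step(self, *, learning_rate):
        self.steps += 1
        return SimpleNamespace(step=self.steps, checkpoint_id=f"ckpt-{self.steps}")


async def train_once(trainer, batches, learning_rate):
    """Accumulate gradients over several batches, then apply them once."""
    seen = 0
    for batch in batches:
        fb = await trainer.forward_backward(
            batch, loss_fn="importance_sampling", group_size=2
        )
        seen += fb.num_datums
    result = await trainer.optim_step(learning_rate=learning_rate)
    return seen, result


batches = [["trace-1", "trace-2"], ["trace-3", "trace-4"]]  # hypothetical trace ids
seen, result = asyncio.run(train_once(StubTrainer(), batches, learning_rate=1e-5))
print(seen, result.step, result.checkpoint_id)
```

Calling forward_backward several times before a single optim_step is the manual form of gradient accumulation; step covers the common case of one of each.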
hud models CLI
Manage trainable models from the shell:
| Command | Purpose |
|---|---|
| hud models list | List gateway models. |
| hud models fork <model> --name <slug> | Fork a team-owned trainable model from an existing one. |
| hud models checkpoints <model> | List the checkpoint tree (▶ marks the active head). |
| hud models head <model> [--set <checkpoint-id>] | Show the active checkpoint, or set it (rollback/select). |
See also
Train on rewards: the end-to-end training how-to.
Designing tasks for signal: produce within-group reward spread so training has signal.