
Tasks format

HUD tasksets can be provided in either of two formats:
  1. A single JSON file containing a list of task objects (recommended)
[
  {
    "id": "browser_2048_128",
    "prompt": "Reach 128 in 2048.",
    "mcp_config": {
      "hud": {
        "url": "https://mcp.hud.ai/v3/mcp",
        "headers": {
          "Authorization": "Bearer ${HUD_API_KEY}",
          "Mcp-Image": "hudevals/hud-browser:0.1.3"
        }
      }
    },
    "setup_tool": {"name": "launch_app", "arguments": {"app_name": "2048"}},
    "evaluate_tool": {"name": "evaluate", "arguments": {"name": "game_2048_max_number", "arguments": {"target": 128}}}
  }
]
Save this as 2048-basic.json, then evaluate an agent on it or train on it with:
hud eval 2048-basic.json
hud rl 2048-basic.json
  2. A JSONL file with one task object per line

Each task object, in either format, has the following fields:
  • prompt: instruction for the agent
  • mcp_config: where to run the environment (a local Docker container or a remote MCP server)
  • setup_tool (optional): a tool call (name and arguments) that prepares the environment
  • evaluate_tool: a tool call (name and arguments) that computes the reward
  • system_prompt (optional): extra guidance for the agent
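Before running a taskset, it can help to check it locally. The Python sketch below loads either format and flags missing fields. It is not part of the HUD SDK or CLI: load_tasks, expand_placeholders, and check_task are hypothetical helpers. It assumes that every field not marked (optional) above is required, and that ${VAR} placeholders such as ${HUD_API_KEY} are meant to be filled from environment variables; how HUD itself resolves them is not shown here.

```python
import json
import os
from pathlib import Path
from string import Template

# Hypothetical helpers for illustration; not part of the HUD SDK or CLI.
# Assumption: fields not marked (optional) in the field list are required.
REQUIRED_FIELDS = ("prompt", "mcp_config", "evaluate_tool")


def load_tasks(path):
    """Load a .json taskset (a list of objects) or a .jsonl taskset (one object per line)."""
    text = Path(path).read_text(encoding="utf-8")
    if str(path).endswith(".jsonl"):
        # Split on "\n" only and skip blank lines such as a trailing newline.
        return [json.loads(line) for line in text.split("\n") if line.strip()]
    tasks = json.loads(text)
    if not isinstance(tasks, list):
        raise ValueError("a .json taskset must contain a list of task objects")
    return tasks


def expand_placeholders(value, env=None):
    """Recursively replace ${VAR} placeholders in strings.

    Assumption: placeholders are filled from environment variables.
    Unknown variables are left as-is (safe_substitute).
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        return Template(value).safe_substitute(env)
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    return value


def check_task(task):
    """Return a list of problems found in one task object (empty if none)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in task]
    for key in ("setup_tool", "evaluate_tool"):
        call = task.get(key)
        if call is None:
            continue  # absence of evaluate_tool is already reported above
        if not isinstance(call, dict) or "name" not in call:
            problems.append(f"{key} must be an object with a 'name'")
        elif not isinstance(call.get("arguments", {}), dict):
            problems.append(f"{key}.arguments must be an object")
    return problems
```

For example, load_tasks("2048-basic.json") on the file above returns a one-element list, and check_task on that task returns an empty list.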

Hosting on Hugging Face

You can host tasksets on the Hugging Face Hub and fetch them with:
hud get hud-evals/2048-basic
The command downloads the JSONL task file into your project directory. You can then run the full taskset or train on it with:
hud eval hud-evals/2048-basic
hud rl hud-evals/2048-basic
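Since the Hub copy is a JSONL file, a taskset written in the single-JSON format needs converting before upload. A minimal Python sketch of that conversion follows; write_jsonl and json_to_jsonl are hypothetical helpers, and the upload itself (for example with the huggingface_hub library) is not shown.

```python
import json
from pathlib import Path


def write_jsonl(tasks, path):
    """Write task objects to a JSONL file, one compact JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for task in tasks:
            # json.dumps escapes newlines (and, by default, all non-ASCII
            # characters) inside strings, so each task stays on one line.
            f.write(json.dumps(task) + "\n")


def json_to_jsonl(src, dst):
    """Convert a .json taskset (a list of task objects) to .jsonl; return the task count."""
    tasks = json.loads(Path(src).read_text(encoding="utf-8"))
    write_jsonl(tasks, dst)
    return len(tasks)
```

For example, json_to_jsonl("2048-basic.json", "2048-basic.jsonl") produces a one-line file for the single-task example above.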

Tips

  • Keep tasks self-contained; use setup_tool to open apps or load data
  • Ensure evaluate_tool returns a numeric reward per episode
  • Start with a small number of tasks to iterate quickly; scale up once runs are stable
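The last two tips can be supported with small local checks. The Python sketch below assumes a reward should be a finite int or float, and draws a reproducible subset from a larger .json taskset for quick iteration. is_numeric_reward, sample_tasks, and write_subset are hypothetical names, not HUD APIs, and how the reward is read out of the evaluate_tool result depends on the environment and is not shown.

```python
import json
import math
import random
from pathlib import Path


def is_numeric_reward(value):
    """True for a finite int or float; bool is rejected even though it subclasses int."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and math.isfinite(value)


def sample_tasks(tasks, n, seed=0):
    """Pick up to n tasks with a fixed seed, so repeated quick runs reuse the same subset."""
    rng = random.Random(seed)
    return rng.sample(tasks, min(n, len(tasks)))


def write_subset(src, dst, n, seed=0):
    """Save a small .json taskset drawn from a larger .json taskset."""
    tasks = json.loads(Path(src).read_text(encoding="utf-8"))
    subset = sample_tasks(tasks, n, seed)
    Path(dst).write_text(json.dumps(subset, indent=2), encoding="utf-8")
```

The resulting file is a JSON list of task objects, so it can be passed to hud eval the same way as 2048-basic.json.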