Overview
Navigate to Tasksets to see two tabs:
- Leaderboards — Public benchmarks with ranked results
- My Tasksets — Your personal task collections

Leaderboards
The Leaderboards tab shows public benchmarks from the community:
- Dataset cards — Each card shows a benchmark with ranked entries
- Metrics — Average score, Best@3, Best@5
- Filter by organization — Focus on specific providers
- Search — Find specific benchmarks
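This page does not define how the leaderboard metrics are computed. As a rough sketch only, the code below assumes that Best@k means "the best score among a task's first k runs, averaged over tasks" and that the average score is the mean over all runs. The function names and sample scores are hypothetical and are not part of the product.

```python
from statistics import mean


def average_score(scores_per_task: dict[str, list[float]]) -> float:
    """Mean of every run score across all tasks (assumed definition)."""
    all_scores = [s for runs in scores_per_task.values() for s in runs]
    return mean(all_scores) if all_scores else 0.0


def best_at_k(scores_per_task: dict[str, list[float]], k: int) -> float:
    """Mean over tasks of the best score among each task's first k runs.

    This is an assumed reading of Best@k, not a documented formula.
    """
    bests = [max(runs[:k]) for runs in scores_per_task.values() if runs]
    return mean(bests) if bests else 0.0


# Hypothetical scores: five runs for each of two tasks.
runs = {
    "task-a": [0.0, 1.0, 0.5, 1.0, 0.0],
    "task-b": [0.0, 0.0, 0.0, 1.0, 0.0],
}

print(average_score(runs))
print(best_at_k(runs, 3))
print(best_at_k(runs, 5))
```

Under this reading, Best@5 can never be lower than Best@3, because looking at more runs can only raise the per-task maximum.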
Creating a Taskset
Click New Taskset to create one:
- Enter a name for your taskset
- Click Create
- You’re taken to your new (empty) taskset
Adding Tasks
Once you have a taskset, you can add tasks in two ways.
From an Environment’s Scenarios:
- Go to an environment’s Scenarios tab
- Click on a scenario
- Create tasks with specific arguments
- Select your taskset as the destination
By uploading JSON:
- Open your taskset
- Click Upload Tasks in the header
- Paste a JSON array of task configurations
- The modal validates your tasks before upload
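Before pasting, it can help to check the JSON array locally. The sketch below builds an array and runs a few structural checks on it. It assumes that each task is an object with the fields listed under Task Configuration and that `env.name` is nested as `{"env": {"name": ...}}`. The exact schema the modal enforces is not documented here, and the scenario, environment, and argument names are hypothetical.

```python
import json

# Assumed required fields; "prompt" is treated as optional.
REQUIRED_KEYS = ("scenario", "args", "env")


def validate_tasks(tasks):
    """Return a list of problems found; an empty list means no issues were detected."""
    if not isinstance(tasks, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for i, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {i}: must be an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in task:
                problems.append(f"task {i}: missing '{key}'")
        if "scenario" in task and not isinstance(task["scenario"], str):
            problems.append(f"task {i}: 'scenario' must be a string")
        if "args" in task and not isinstance(task["args"], dict):
            problems.append(f"task {i}: 'args' must be an object")
        if "env" in task:
            env = task["env"]
            if not (isinstance(env, dict) and isinstance(env.get("name"), str)):
                problems.append(f"task {i}: 'env' must be an object with a string 'name'")
        if "prompt" in task and not isinstance(task["prompt"], str):
            problems.append(f"task {i}: 'prompt' must be a string")
    return problems


# Hypothetical tasks; replace the names with ones from your own environment.
tasks = [
    {"scenario": "checkout", "args": {"item": "laptop"}, "env": {"name": "my-shop-env"}},
    {"scenario": "checkout", "args": {"item": "phone"}, "env": {"name": "my-shop-env"},
     "prompt": "Buy one phone."},
]

problems = validate_tasks(tasks)
payload = json.dumps(tasks, indent=2)  # text to paste into the upload modal
print(problems or payload)
```

A local check like this does not replace the modal's own validation. It only catches obvious structural mistakes earlier.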

Taskset Details
Click on a taskset to see its detail page.
Leaderboard Tab
Shows aggregated results for this taskset:
- Agent rankings — Performance by agent/model
- Metrics — Success rate, average score
- Trends — Performance over time
Tasks Tab
Lists all tasks in the taskset:
- Grid/List view — Toggle between compact and detailed views
- Filters — By status, tags, scenario
- Bulk actions — Select multiple tasks to run or delete
- Task details — Click a task to see its configuration:
  - Scenario name and arguments
  - Run history (success/fail indicators)
  - Tags for organization
Agents Tab
Compare agent performance across all tasks:
- Agent matrix — Side-by-side comparison
- Per-task breakdown — See where agents succeed or fail
- Drill down — Click to view specific runs
Jobs Tab
Background jobs for this taskset:
- Batch runs — Evaluation jobs in progress
- Status — Queued, running, completed
- Results — Click to see outcomes
Settings Tab
Configure your taskset:
- Name — Edit the display name
Running a Taskset
Run all tasks in a taskset with one click:
- Open your taskset
- Click Run Taskset in the header
- Select model and configuration
- Jobs are queued and run in parallel
Task Configuration
Tasks are defined with:
- scenario — The scenario name to run
- args — Arguments passed to the scenario
- env.name — The environment containing the scenario
- prompt — (Optional) Override the scenario’s default prompt
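To make the fields above concrete, here is a minimal sketch of a helper that builds one task configuration. It assumes `env.name` denotes a `name` key nested inside an `env` object, which this page does not state explicitly. `make_task` and all example values are hypothetical.

```python
import json
from typing import Any, Optional


def make_task(
    scenario: str,
    args: dict[str, Any],
    env_name: str,
    prompt: Optional[str] = None,
) -> dict[str, Any]:
    """Build one task configuration; 'prompt' is included only when given."""
    task: dict[str, Any] = {
        "scenario": scenario,       # scenario name to run
        "args": args,               # arguments passed to the scenario
        "env": {"name": env_name},  # assumed nesting for env.name
    }
    if prompt is not None:
        task["prompt"] = prompt     # overrides the scenario's default prompt
    return task


# Hypothetical values for illustration only.
default_prompt_task = make_task("checkout", {"item": "laptop", "quantity": 1}, "my-shop-env")
custom_prompt_task = make_task(
    "checkout",
    {"item": "laptop", "quantity": 1},
    "my-shop-env",
    prompt="Buy exactly one laptop and stop at the confirmation page.",
)

print(json.dumps([default_prompt_task, custom_prompt_task], indent=2))
```

Omitting `prompt` rather than sending an empty string lets the scenario's default prompt apply, which matches its description as an optional override.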