TASKS · pilot subset
28 prompts across 20 categories
Each task carries a verbatim natural-language prompt, a canonical reference STEP file (its SHA-256 hash appears in the listing), numerical ground-truth quantities (volume, surface area, Euler characteristic χ, genus, named features), and a difficulty class from 1 to 5. Click a task to see the held-out reference, candidate output viewers, and metric scores per agent.
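
As a rough illustration of how the per-task fields relate to one another, here is a minimal Python sketch. The `TaskRecord` class, its field names, and the helper functions are hypothetical and are not the benchmark's actual schema. The genus check assumes the reference is bounded by a single closed, connected, orientable surface, for which χ = 2 − 2g; references with several shells or internal voids would need a different rule. The placeholder bytes stand in for a real STEP file, and the torus values are illustrative only.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical record layout; names are illustrative, not the benchmark's schema."""
    prompt: str
    reference_sha256: str   # hex digest of the canonical reference STEP file
    volume: float
    surface_area: float
    euler_chi: int
    genus: int
    difficulty: int         # expected to lie in 1..5


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def genus_from_euler(chi: int) -> int:
    """Genus g of one closed, connected, orientable surface, from chi = 2 - 2g."""
    if chi > 2 or (2 - chi) % 2 != 0:
        raise ValueError(f"chi={chi} is not valid for a single closed orientable surface")
    return (2 - chi) // 2


def check_record(record: TaskRecord, step_bytes: bytes) -> list[str]:
    """Collect human-readable consistency problems; an empty list means none found."""
    problems = []
    if sha256_hex(step_bytes) != record.reference_sha256:
        problems.append("reference hash mismatch")
    try:
        expected_genus = genus_from_euler(record.euler_chi)
    except ValueError:
        problems.append("euler_chi not valid for one closed orientable surface")
    else:
        if expected_genus != record.genus:
            problems.append("genus inconsistent with euler_chi")
    if not 1 <= record.difficulty <= 5:
        problems.append("difficulty outside 1 to 5")
    return problems


# Illustrative example: a torus (major radius R, minor radius r) has chi = 0, genus 1.
R, r = 2.0, 1.0
placeholder_step = b"placeholder bytes standing in for a STEP file"
torus = TaskRecord(
    prompt="A torus with major radius 2 and minor radius 1.",
    reference_sha256=sha256_hex(placeholder_step),
    volume=2 * math.pi**2 * R * r**2,
    surface_area=4 * math.pi**2 * R * r,
    euler_chi=0,
    genus=1,
    difficulty=1,
)
print(check_record(torus, placeholder_step))
```

Storing both χ and the genus is redundant under the single-surface assumption, which is what makes the cross-check possible: a record where the two disagree points to a data-entry error or to a reference with more than one boundary shell.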