CAD-Bench
An open benchmark for AI CAD agents. 70 tasks across 20 categories, evaluated on 20 agents.
Data source: live results for 9 runs (1 agent × 9 tasks); synthetic baseline for the rest.
| # | Agent | Provider / stack | Score | Pass@1 |
|---|---|---|---|---|
| 1 | Human Baseline (Mech-E) | n=4 senior engineers | 86.0 | 46% |
| 2 | Zoo Text-to-CAD | Zoo (KittyCAD) | 71.8 | 7% |
| 3 | OpenAI o4 (reasoning) → CadQuery | OpenAI + CadQuery 2.4 | 70.0 | 9% |
| 4 | Adam (CADcrush) | CADcrush | 64.4 | 1% |
| 5 | GPT-5 → CadQuery | OpenAI + CadQuery 2.4 | 64.1 | 4% |
| 6 | Claude Sonnet 4.6 → CadQuery | Anthropic + CadQuery 2.4 | 62.5 | 0% |
| 7 | Claude Opus 4.7 → CadQuery | Anthropic + CadQuery 2.4 | 62.1 | 10% |
| 8 | CAD-Coder R1 | CAD-Coder Labs (research) | 57.6 | 0% |
| 9 | DeepSeek R1 (reasoning) → CadQuery | DeepSeek + CadQuery 2.4 | 57.4 | 0% |
| 10 | Gemini 2.5 Flash → CadQuery | Google + CadQuery 2.4 | 56.2 | 0% |
| 11 | Qwen3 Coder → CadQuery | Alibaba + CadQuery 2.4 | 55.0 | 0% |
| 12 | Claude Opus 4.7 → OpenSCAD | Anthropic + OpenSCAD 2024.06 | 54.5 | 0% |
| 13 | Gemini 2.5 Pro → OpenSCAD | Google + OpenSCAD 2024.06 | 50.2 | 0% |
| 14 | Claude Haiku 4.5 → CadQuery | Anthropic + CadQuery 2.4 | 46.0 | 0% |
| 15 | GPT-5 Mini → OpenSCAD | OpenAI + OpenSCAD 2024.06 | 45.8 | 0% |
| 16 | DeepCAD | Wu et al. 2021 (research) | 44.3 | 0% |
| 17 | Llama 3.3 70B → OpenSCAD | Meta + OpenSCAD 2024.06 | 41.9 | 0% |
| 18 | Trellis 3D | Microsoft Research | 21.9 | 0% |
| 19 | Hunyuan3D-2 | Tencent | 19.0 | 0% |
| 20 | Spline AI | Spline.design | 16.3 | 0% |