CAD-Bench

An open benchmark for AI CAD agents. 70 tasks across 20 categories, evaluated on 20 agents.

Data source: live (9 runs · 1 agent × 9 tasks) · synthetic baseline for the rest
#   Agent                               Provider / stack              Score  Pass@1
1   Human Baseline (Mech-E)             n=4 senior engineers          86.0   46%
2   Zoo Text-to-CAD                     Zoo (KittyCAD)                71.8   7%
3   OpenAI o4 (reasoning) → CadQuery    OpenAI + CadQuery 2.4         70.0   9%
4   Adam (CADcrush)                     CADcrush                      64.4   1%
5   GPT-5 → CadQuery                    OpenAI + CadQuery 2.4         64.1   4%
6   Claude Sonnet 4.6 → CadQuery        Anthropic + CadQuery 2.4      62.5   0%
7   Claude Opus 4.7 → CadQuery          Anthropic + CadQuery 2.4      62.1   10%
8   CAD-Coder R1                        CAD-Coder Labs (research)     57.6   0%
9   DeepSeek R1 (reasoning) → CadQuery  DeepSeek + CadQuery 2.4       57.4   0%
10  Gemini 2.5 Flash → CadQuery         Google + CadQuery 2.4         56.2   0%
11  Qwen3 Coder → CadQuery              Alibaba + CadQuery 2.4        55.0   0%
12  Claude Opus 4.7 → OpenSCAD          Anthropic + OpenSCAD 2024.06  54.5   0%
13  Gemini 2.5 Pro → OpenSCAD           Google + OpenSCAD 2024.06     50.2   0%
14  Claude Haiku 4.5 → CadQuery         Anthropic + CadQuery 2.4      46.0   0%
15  GPT-5 Mini → OpenSCAD               OpenAI + OpenSCAD 2024.06     45.8   0%
16  DeepCAD                             Wu et al. 2021 (research)     44.3   0%
17  Llama 3.3 70B → OpenSCAD            Meta + OpenSCAD 2024.06       41.9   0%
18  Trellis 3D                          Microsoft Research            21.9   0%
19  Hunyuan3D-2                         Tencent                       19.0   0%
20  Spline AI                           Spline.design                 16.3   0%