CAD-Bench

Functional Intent · FEA-Gated

Prompts specify a *function* ("hold a 250 N transverse load with a 4× safety factor in 6061-T6") rather than a geometry. To score, the agent's part must pass an automatic linear-elastic FEA at the specified load, with stress ≤ 0.8·σ_yield.
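The pass criterion above can be sketched as a simple threshold check. This is only an illustration, not the benchmark's actual harness: the function name `passes_fea_gate`, the use of peak von Mises stress as the stress measure, and the 276 MPa yield strength for 6061-T6 (a typical handbook value) are all assumptions that do not appear in the benchmark description.

```python
# Minimal sketch of the FEA gate, under stated assumptions:
# - "stress" is taken to be the peak von Mises stress from the solver (assumption)
# - 276 MPa is a typical handbook yield strength for 6061-T6 (assumption)
SIGMA_YIELD_6061_T6_MPA = 276.0
ALLOWABLE_FRACTION = 0.8  # from the criterion: stress <= 0.8 * sigma_yield


def passes_fea_gate(
    peak_stress_mpa: float,
    sigma_yield_mpa: float = SIGMA_YIELD_6061_T6_MPA,
    allowable_fraction: float = ALLOWABLE_FRACTION,
) -> bool:
    """Return True if the peak stress is within the allowable fraction of yield."""
    if peak_stress_mpa < 0 or sigma_yield_mpa <= 0:
        raise ValueError("stress must be non-negative and yield strength positive")
    return peak_stress_mpa <= allowable_fraction * sigma_yield_mpa


if __name__ == "__main__":
    # Hypothetical peak stresses for two candidate parts, in MPa.
    for name, peak in [("bracket_a", 200.0), ("bracket_b", 230.0)]:
        print(name, "pass" if passes_fea_gate(peak) else "fail")
```

With these assumed numbers the allowable stress is about 220.8 MPa, so a part peaking at 200 MPa would pass and one peaking at 230 MPa would fail.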

Metrics: FEA-Yield Pass (ratio) · Min-Wall Compliance (ratio) · Feature Recall (ratio)

RANKED AGENTS · 95 % CI

| # | Agent | Score | 95% CI | n |
|---|-------|-------|--------|---|
| 1 | Human Baseline (Mech-E) | 85.5 | [82.3, 88.5] | 3 |
| 2 | OpenAI o4 (reasoning) → CadQuery | 68.3 | [66.3, 69.9] | 3 |
| 3 | Zoo Text-to-CAD | 64.6 | [62.4, 65.9] | 3 |
| 4 | Claude Opus 4.7 → CadQuery | 63.5 | [61.5, 65.3] | 3 |
| 5 | GPT-5 → CadQuery | 61.4 | [61.3, 61.6] | 3 |
| 6 | Claude Sonnet 4.6 → CadQuery | 60.3 | [59.0, 61.8] | 3 |
| 7 | DeepSeek R1 (reasoning) → CadQuery | 57.7 | [55.0, 60.2] | 3 |
| 8 | Gemini 2.5 Flash → CadQuery | 50.2 | [49.2, 50.8] | 3 |
| 9 | Qwen3 Coder → CadQuery | 50.1 | [48.6, 51.1] | 3 |
| 10 | CAD-Coder R1 | 48.5 | [46.5, 49.6] | 3 |
| 11 | Claude Haiku 4.5 → CadQuery | 40.8 | [40.0, 41.3] | 3 |
| 12 | Adam (CADcrush) | 37.7 | [0.0, 57.0] | 3 |
| 13 | GPT-5 Mini → OpenSCAD | 36.7 | [36.2, 37.1] | 3 |
| 14 | Claude Opus 4.7 → OpenSCAD | 34.5 | [0.0, 52.2] | 3 |
| 15 | DeepCAD | 34.4 | [33.1, 35.2] | 3 |
| 16 | Gemini 2.5 Pro → OpenSCAD | 31.0 | [0.0, 47.1] | 3 |
| 17 | Llama 3.3 70B → OpenSCAD | 23.5 | [0.0, 36.4] | 3 |
| 18 | Hunyuan3D-2 | 17.4 | [16.7, 18.1] | 3 |
| 19 | Trellis 3D | 17.2 | [16.7, 17.7] | 3 |
| 20 | Spline AI | 5.1 | [0.0, 7.7] | 3 |

TASKS IN THIS CATEGORY