CAD-Bench

BREP Fidelity

Tests whether the agent emits a clean boundary representation (named faces, a coherent edge graph, exact NURBS surfaces) rather than a tessellated approximation. Each output is round-tripped through AP242 STEP.
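A STEP round-trip is commonly scored by sampling points on the shape before and after export and re-import, then taking a symmetric Chamfer distance between the two samples. The sketch below illustrates that idea in plain Python. It is an assumption about how such a score could be computed, not the benchmark's actual scoring code: the function names, the use of mean (rather than summed or squared) nearest-neighbour distances, and the point-sampling step are all hypothetical.

```python
import math

def nearest_distance(p, cloud):
    """Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)

def chamfer_distance_mm(a, b):
    """Symmetric Chamfer distance between two point samples (units: mm).

    Assumed definition: the average of the two directed mean
    nearest-neighbour distances. Other conventions (sums, squared
    distances) exist and give different numbers.
    """
    if not a or not b:
        raise ValueError("both point samples must be non-empty")
    a_to_b = sum(nearest_distance(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_distance(q, a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

# Hypothetical samples: the same three surface points before and after a
# round-trip that shifted everything by 0.01 mm along z.
before = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
after = [(x, y, z + 0.01) for (x, y, z) in before]
drift = chamfer_distance_mm(before, after)
```

The brute-force nearest-neighbour search is quadratic in the sample size. That is fine for a sketch, but a spatial index would be the usual choice for dense samples.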

STEP Round-trip Chamfer (mm) · Edge-Manifoldness (ratio) · Euler-Poincaré Compliance (boolean) · Feature Recall (ratio)
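Two of these metrics are purely topological and can be illustrated without a CAD kernel. The sketch below assumes a simplified solid described as a list of faces, each face a list of vertex-index loops, with every edge identified by its endpoint pair (so curved edges sharing both endpoints are not modelled). It computes the share of edges used by exactly two faces and checks the Euler-Poincaré formula V - E + F - (L - F) - 2(S - G) = 0. The data layout and function names are hypothetical, and the benchmark's own definitions may differ.

```python
from collections import Counter

def edge_use_counts(faces):
    """Count how many face loops use each undirected edge."""
    counts = Counter()
    for loops in faces:
        for loop in loops:
            for i, v in enumerate(loop):
                w = loop[(i + 1) % len(loop)]  # wrap around to close the loop
                counts[frozenset((v, w))] += 1
    return counts

def edge_manifoldness(faces):
    """Fraction of edges shared by exactly two faces (1.0 means watertight)."""
    counts = edge_use_counts(faces)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 2) / len(counts)

def euler_poincare_ok(faces, shells=1, genus=0):
    """Check V - E + F - (L - F) - 2(S - G) == 0 for the given topology."""
    v = len({x for loops in faces for loop in loops for x in loop})
    e = len(edge_use_counts(faces))
    f = len(faces)
    l = sum(len(loops) for loops in faces)  # total loops; L - F counts inner loops
    return v - e + f - (l - f) - 2 * (shells - genus) == 0

# A unit cube: 8 vertices, 12 edges, 6 faces, one outer loop per face.
CUBE = [
    [[0, 1, 2, 3]], [[4, 5, 6, 7]],
    [[0, 1, 5, 4]], [[1, 2, 6, 5]],
    [[2, 3, 7, 6]], [[3, 0, 4, 7]],
]
# The same cube with its top face removed: no longer a closed shell.
OPEN_BOX = [face for face in CUBE if face != [[4, 5, 6, 7]]]
```

For the closed cube every edge is shared by two faces and the formula balances. Removing the top face leaves four boundary edges with a single use each, so the manifoldness ratio drops to 8/12 and the Euler-Poincaré check fails.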

RANKED AGENTS · 95 % CI

# · Agent · Score · 95 % CI · n
1 · Human Baseline (Mech-E) · 95.9 · [94.7, 96.9] · 4
2 · Zoo Text-to-CAD · 91.7 · [90.8, 92.5] · 4
3 · CAD-Coder R1 · 88.2 · [87.4, 89.1] · 4
4 · Adam (CADcrush) · 83.3 · [70.7, 89.8] · 4
5 · OpenAI o4 (reasoning) → CadQuery · 77.5 · [65.6, 89.5] · 4
6 · Claude Sonnet 4.6 → CadQuery · 75.0 · [62.2, 87.4] · 4
7 · DeepSeek R1 (reasoning) → CadQuery · 75.0 · [61.9, 88.1] · 4
8 · Claude Opus 4.7 → CadQuery · 71.6 · [48.3, 89.7] · 5
9 · DeepCAD · 70.6 · [57.8, 83.5] · 4
10 · Qwen3 Coder → CadQuery · 65.2 · [21.7, 87.4] · 4
11 · Claude Haiku 4.5 → CadQuery · 57.6 · [56.9, 58.1] · 4
12 · Gemini 2.5 Flash → CadQuery · 51.1 · [15.0, 78.6] · 4
13 · GPT-5 → CadQuery · 47.0 · [15.9, 63.5] · 4
14 · Llama 3.3 70B → OpenSCAD · 41.9 · [39.4, 44.4] · 4
15 · GPT-5 Mini → OpenSCAD · 41.7 · [38.7, 47.5] · 4
16 · Gemini 2.5 Pro → OpenSCAD · 34.5 · [33.7, 35.1] · 4
17 · Hunyuan3D-2 · 33.5 · [33.2, 33.7] · 4
18 · Trellis 3D · 26.3 · [26.1, 26.5] · 4
19 · Claude Opus 4.7 → OpenSCAD · 26.1 · [8.7, 35.3] · 4
20 · Spline AI · 23.5 · [23.4, 23.5] · 4

TASKS IN THIS CATEGORY