Constraint Solving & Editability
Probes whether the agent exposes a working parametric graph. After the part is built, we issue downstream parameter edits (length +30 %, hole diameter → M8) and check that the model re-evaluates without topological breakage.
Parametric Edit Accuracy · ratio · ↑
Parametric Range Integrity · ratio · ↑
Constraint Solve Rate · ratio · ↑
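The edit probe described above can be illustrated with a minimal sketch. It does not reproduce the benchmark's harness or its metric definitions. `PlateParams`, `rebuild`, and `edit_success_ratio` are hypothetical names, the plate-with-hole part is invented for illustration, and `rebuild` is a pure-Python stand-in for re-evaluation in a CAD kernel. M8 is taken as an 8 mm nominal diameter, which is also an assumption about how the edit is interpreted.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlateParams:
    """Hypothetical parameter set: a rectangular plate with one through-hole (mm)."""
    length: float
    width: float
    hole_dia: float
    edge_margin: float = 2.0  # assumed minimum wall left around the hole


def rebuild(p: PlateParams) -> dict:
    """Stand-in for kernel re-evaluation; raises ValueError when the part breaks."""
    if min(p.length, p.width, p.hole_dia) <= 0:
        raise ValueError("non-positive dimension")
    if p.hole_dia + 2 * p.edge_margin > min(p.length, p.width):
        raise ValueError("hole no longer fits inside the plate")
    return {"length": p.length, "width": p.width, "hole_dia": p.hole_dia}


def edit_success_ratio(base: PlateParams, edits: list) -> float:
    """Apply each edit to a fresh copy of base; return the fraction that rebuild correctly."""
    passed = 0
    for name, target in edits:
        edited = replace(base, **{name: target})
        try:
            measured = rebuild(edited)[name]
        except ValueError:
            continue  # rebuild failed: counts as a broken edit
        if math.isclose(measured, target, rel_tol=1e-6):
            passed += 1
    return passed / len(edits)


base = PlateParams(length=100.0, width=30.0, hole_dia=6.0)
edits = [
    ("length", base.length * 1.30),  # length +30 %
    ("hole_dia", 8.0),               # M8 taken as an 8 mm nominal diameter (assumption)
    ("hole_dia", 40.0),              # deliberately infeasible: wider than the plate
]
print(edit_success_ratio(base, edits))
```

In this toy setup the first two edits rebuild cleanly and the third is rejected, so two of the three edits count as successful. A real harness would replace `rebuild` with a call into the agent's generated CadQuery or OpenSCAD model and measure the resulting geometry.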
RANKED AGENTS · 95 % CI
| # | Agent | Score [95 % CI] · n |
|---|---|---|
| 1 | Human Baseline (Mech-E) | 86.6 [83.3, 89.0] · n=3 |
| 2 | OpenAI o4 (reasoning) → CadQuery | 80.2 [79.3, 80.7] · n=3 |
| 3 | Claude Opus 4.7 → CadQuery | 77.9 [74.5, 80.8] · n=3 |
| 4 | Adam (CADcrush) | 75.0 [72.3, 77.3] · n=3 |
| 5 | Claude Sonnet 4.6 → CadQuery | 72.9 [71.6, 73.8] · n=3 |
| 6 | GPT-5 → CadQuery | 72.3 [71.1, 73.7] · n=3 |
| 7 | DeepSeek R1 (reasoning) → CadQuery | 70.9 [68.9, 72.7] · n=3 |
| 8 | Zoo Text-to-CAD | 68.0 [66.0, 69.9] · n=3 |
| 9 | Qwen3 Coder → CadQuery | 62.4 [61.0, 63.7] · n=3 |
| 10 | CAD-Coder R1 | 57.6 [56.2, 59.1] · n=3 |
| 11 | Gemini 2.5 Pro → OpenSCAD | 56.1 [54.0, 57.6] · n=3 |
| 12 | Claude Haiku 4.5 → CadQuery | 48.9 [47.7, 50.0] · n=3 |
| 13 | GPT-5 Mini → OpenSCAD | 44.2 [43.3, 44.7] · n=3 |
| 14 | Llama 3.3 70B → OpenSCAD | 40.4 [39.2, 41.9] · n=3 |
| 15 | Gemini 2.5 Flash → CadQuery | 39.1 [0.0, 60.7] · n=3 |
| 16 | Claude Opus 4.7 → OpenSCAD | 38.5 [0.0, 58.5] · n=3 |
| 17 | DeepCAD | 27.2 [26.4, 28.7] · n=3 |
| 18 | Hunyuan3D-2 | 4.3 [4.2, 4.4] · n=3 |
| 19 | Spline AI | 3.8 [3.7, 3.9] · n=3 |
| 20 | Trellis 3D | 3.3 [0.0, 5.0] · n=3 |