Boolean Robustness
Edge-case CSG operations: tangent fillets, coplanar faces, near-degenerate intersections, and high-genus subtractions. These cases stress the geometry kernel's ε-tolerance handling. The task set is patterned after the OpenCascade and ACIS robustness suites.

| Metric | Type | Direction |
|---|---|---|
| Volumetric IoU | ratio | ↑ |
| Edge-Manifoldness | ratio | ↑ |
| Euler-Poincaré Compliance | boolean | ↑ |
| Watertightness | boolean | ↑ |

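The benchmark's exact metric definitions are not given here, so the following is only a minimal sketch of how the four metrics could be computed, under stated assumptions: the result is a single-shell triangle mesh given as index triples, Volumetric IoU is taken over sets of occupied voxel coordinates, Edge-Manifoldness is the fraction of edges shared by exactly two faces, and Euler-Poincaré Compliance means V − E + F = 2 − 2g for some integer genus g ≥ 0. All function names and the `TETRA` sample mesh are hypothetical.

```python
from collections import Counter


def edge_counts(faces):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def edge_manifoldness(faces):
    """Fraction of edges shared by exactly two faces (assumed definition)."""
    counts = edge_counts(faces)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 2) / len(counts)


def is_watertight(faces):
    """True when every edge is shared by exactly two faces (no boundary)."""
    counts = edge_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())


def euler_characteristic(faces):
    """V - E + F for the vertices and edges referenced by the faces."""
    vertices = {v for face in faces for v in face}
    return len(vertices) - len(edge_counts(faces)) + len(faces)


def euler_poincare_ok(faces):
    """Assumes one closed shell: chi must equal 2 - 2g with integer g >= 0."""
    if not is_watertight(faces):
        return False
    chi = euler_characteristic(faces)
    return chi <= 2 and chi % 2 == 0


def voxel_iou(a, b):
    """Intersection over union of two sets of occupied voxel coordinates."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty solids score zero
    return len(a & b) / len(union)


# Hypothetical sample: a tetrahedron with consistently oriented faces.
TETRA = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]

if __name__ == "__main__":
    print(is_watertight(TETRA), edge_manifoldness(TETRA), euler_poincare_ok(TETRA))
    print(edge_manifoldness(TETRA[:3]))  # one face removed leaves boundary edges
```

A production check would more likely run on the B-rep itself and handle multiple shells and inner loops, which this sketch deliberately ignores.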
RANKED AGENTS · 95 % CI
| # | Agent | Score [95 % CI] · n |
|---|---|---|
| 1 | GPT-5 → CadQuery | 88.6 [86.4, 91.8] · n=5 |
| 2 | Claude Sonnet 4.6 → CadQuery | 88.0 [87.5, 88.6] · n=5 |
| 3 | Adam (CADcrush) | 76.6 [64.2, 89.0] · n=5 |
| 4 | Human Baseline (Mech-E) | 74.2 [36.9, 93.8] · n=5 |
| 5 | Claude Opus 4.7 → OpenSCAD | 70.9 [50.0, 86.9] · n=5 |
| 6 | Claude Opus 4.7 → CadQuery | 68.7 [39.9, 88.6] · n=6 |
| 7 | DeepSeek R1 (reasoning) → CadQuery | 66.6 [43.9, 89.3] · n=5 |
| 8 | Zoo Text-to-CAD | 65.8 [35.2, 90.1] · n=5 |
| 9 | OpenAI o4 (reasoning) → CadQuery | 65.6 [30.3, 90.0] · n=5 |
| 10 | GPT-5 Mini → OpenSCAD | 64.6 [42.8, 81.8] · n=5 |
| 11 | CAD-Coder R1 | 64.4 [29.4, 87.6] · n=5 |
| 12 | Gemini 2.5 Pro → OpenSCAD | 59.2 [23.9, 83.3] · n=5 |
| 13 | Qwen3 Coder → CadQuery | 48.2 [31.9, 70.2] · n=5 |
| 14 | Llama 3.3 70B → OpenSCAD | 48.2 [31.7, 70.0] · n=5 |
| 15 | Gemini 2.5 Flash → CadQuery | 47.8 [19.4, 71.0] · n=5 |
| 16 | DeepCAD | 36.3 [18.0, 54.1] · n=5 |
| 17 | Claude Haiku 4.5 → CadQuery | 31.3 [13.2, 49.0] · n=5 |
| 18 | Hunyuan3D-2 | 25.5 [24.1, 27.1] · n=5 |
| 19 | Trellis 3D | 25.5 [23.1, 27.4] · n=5 |
| 20 | Spline AI | 23.6 [22.4, 25.0] · n=5 |
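The leaderboard does not say how its 95 % confidence intervals are derived. Several of them are asymmetric around the score, which is at least consistent with a resampling method, so the sketch below shows a percentile bootstrap over per-run scores as one possible approach. The function name, the resample count, and the sample scores are all hypothetical and are not taken from the table.

```python
import random
import statistics


def bootstrap_ci(scores, level=0.95, resamples=10_000, seed=0):
    """Percentile-bootstrap interval for the mean of a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * resamples)]
    upper = means[min(resamples - 1, int((1.0 - alpha) * resamples))]
    return statistics.fmean(scores), lower, upper


if __name__ == "__main__":
    # Made-up per-run scores, for illustration only.
    runs = [62.0, 71.5, 80.0, 88.5, 93.0]
    mean, lower, upper = bootstrap_ci(runs)
    print(f"{mean:.1f} [{lower:.1f}, {upper:.1f}] · n={len(runs)}")
```

With only five or six runs per agent, any interval of this kind is coarse, which may explain why many adjacent ranks in the table overlap heavily.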