MECH-018 · Parametric Mechanical Parts · difficulty 3/5
Heat-set insert boss array (4× M3)
sha256:10ae9f0bcd220e31…
§1 Prompt verbatim
Plate 60 × 60 × 8 mm with four heat-set-insert bosses on a 40 × 40 mm square pitch. Each boss: outer Ø 6.5 mm, inner Ø 4.5 mm × 6 mm deep, 0.5 × 30° lead-in chamfer at the top. Conform to McMaster 94459A205 insert spec.
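The layout implied by the prompt can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 40 × 40 mm boss array is centred on the plate (the prompt does not say so explicitly), and the names `boss_centres`, `wall_thickness`, and `edge_margin` are hypothetical helpers, not part of the benchmark.

```python
from itertools import product

# Dimensions taken from the prompt, in mm.
PLATE_X, PLATE_Y, PLATE_Z = 60.0, 60.0, 8.0
PITCH = 40.0
BOSS_OD, BOSS_ID = 6.5, 4.5


def boss_centres(pitch: float = PITCH) -> list[tuple[float, float]]:
    """Centres of the 2 x 2 boss array.

    Assumption: the array is centred on the plate origin.
    """
    half = pitch / 2.0
    return [(sx * half, sy * half) for sx, sy in product((-1, 1), repeat=2)]


def wall_thickness(outer_d: float = BOSS_OD, inner_d: float = BOSS_ID) -> float:
    """Radial wall left between the insert bore and the boss outer diameter."""
    return (outer_d - inner_d) / 2.0


def edge_margin(plate: float = PLATE_X, pitch: float = PITCH,
                outer_d: float = BOSS_OD) -> float:
    """Distance from the boss outer diameter to the nearest plate edge."""
    return plate / 2.0 - pitch / 2.0 - outer_d / 2.0


print(boss_centres())
print(wall_thickness())  # 1.0
print(edge_margin())     # 6.75
```

Under that centring assumption each boss has a 1.0 mm radial wall and sits 6.75 mm clear of the nearest plate edge.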
§2 Ground-truth spec
shells: 1
watertight: true
manifold: true
acceptance ε: ±0.05 mm
features: heatset_boss_x4, lead_in_chamfer_x4
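One way to read the acceptance ε is as a per-dimension tolerance band around each nominal value. The following is a minimal sketch under that assumption. The dimension names (`boss_od`, `bore_d`, `bore_depth`) and the function names are hypothetical, and the benchmark's actual acceptance check may differ.

```python
ACCEPTANCE_EPS_MM = 0.05  # acceptance ε from the ground-truth spec


def within_tolerance(measured: float, nominal: float,
                     eps: float = ACCEPTANCE_EPS_MM) -> bool:
    """True when the measured value lies inside nominal ± eps."""
    return abs(measured - nominal) <= eps


def check_dimensions(measured: dict[str, float], nominal: dict[str, float],
                     eps: float = ACCEPTANCE_EPS_MM) -> dict[str, bool]:
    """Per-dimension pass/fail; a dimension missing from `measured` fails."""
    return {
        name: name in measured and within_tolerance(measured[name], value, eps)
        for name, value in nominal.items()
    }


# Hypothetical dimension names; nominal values come from the prompt.
nominal = {"boss_od": 6.5, "bore_d": 4.5, "bore_depth": 6.0}
measured = {"boss_od": 6.52, "bore_d": 4.6}
print(check_dimensions(measured, nominal))
```

Here `boss_od` passes, `bore_d` fails by 0.1 mm, and `bore_depth` fails because it was never measured.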
§3 Reference render
Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
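Because the page shows only a truncated fingerprint, a local copy of the reference file can be compared against the visible prefix but not fully verified. The sketch below uses Python's standard `hashlib`. The file name `reference.step` is a placeholder, and `sha256_hex` and `matches_fingerprint` are hypothetical helper names.

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path: str, published_prefix: str) -> bool:
    """Compare a file's digest with a (possibly truncated) published prefix."""
    return sha256_hex(path).startswith(published_prefix.strip().lower())


# Usage, assuming a local copy exists at this placeholder path:
# matches_fingerprint("reference.step", "10ae9f0bcd220e31")
```

A prefix match is weaker than a full-digest match. It only rules out files whose hash differs within the first 16 hex characters.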
§4 Per-agent renders
reference + 20 agent outputs · scored against the held-out STEP
| Agent | Source | Vol IoU | BREP | Manifold |
|---|---|---|---|---|
| REFERENCE | canonical · ground truth | 1.000 | 100 | ✓ |
| Human Baseline (Mech-E) | n=4 senior engineers | 0.821 | 10 | ✓ |
| Claude Opus 4.7 → CadQuery | Anthropic + CadQuery 2.4 | 0.773 | 10 | ✓ |
| GPT-5 → CadQuery | OpenAI + CadQuery 2.4 | 0.728 | 11 | ✓ |
| Zoo Text-to-CAD | Zoo (KittyCAD) | 0.714 | 8 | ✓ |
| CAD-Coder R1 | CAD-Coder Labs (research) | 0.639 | 11 | ✗ |
| OpenAI o4 (reasoning) → CadQuery | OpenAI + CadQuery 2.4 | 0.615 | 11 | ✓ |
| Claude Sonnet 4.6 → CadQuery | Anthropic + CadQuery 2.4 | 0.605 | 9 | ✗ |
| Adam (CADcrush) | CADcrush | 0.595 | 12 | ✗ |
| Qwen3 Coder → CadQuery | Alibaba + CadQuery 2.4 | 0.583 | 10 | ✗ |
| Claude Opus 4.7 → OpenSCAD | Anthropic + OpenSCAD 2024.06 | 0.538 | 0 | ✗ |
| Gemini 2.5 Flash → CadQuery | Google + CadQuery 2.4 | 0.521 | 10 | ✗ |
| Claude Haiku 4.5 → CadQuery | Anthropic + CadQuery 2.4 | 0.471 | 13 | ✗ |
| Gemini 2.5 Pro → OpenSCAD | Google + OpenSCAD 2024.06 | 0.445 | 0 | ✗ |
| DeepSeek R1 (reasoning) → CadQuery | DeepSeek + CadQuery 2.4 | 0.402 | 13 | ✗ |
| Llama 3.3 70B → OpenSCAD | Meta + OpenSCAD 2024.06 | 0.365 | 18 | ✗ |
| GPT-5 Mini → OpenSCAD | OpenAI + OpenSCAD 2024.06 | 0.352 | 19 | ✗ |
| DeepCAD | Wu et al. 2021 (research) | 0.349 | 17 | ✗ |
| Trellis 3D | Microsoft Research | 0.134 | 0 | ✗ |
| Hunyuan3D-2 | Tencent | 0.072 | 66 | ✗ |
| Spline AI | Spline.design | 0.000 | 0 | ✗ |
Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.
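The Vol IoU figures on the tiles are intersection-over-union scores between an agent's solid and the reference. The page does not describe how the benchmark computes them, so the sketch below is only an illustration of the standard definition, |A ∩ B| / |A ∪ B|, on a coarse 1 mm voxel grid. The helper names `box_voxels` and `vol_iou` are hypothetical.

```python
Voxel = tuple[int, int, int]


def box_voxels(x0: int, y0: int, z0: int,
               x1: int, y1: int, z1: int) -> set[Voxel]:
    """Unit voxels filling the axis-aligned box [x0, x1) x [y0, y1) x [z0, z1)."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


def vol_iou(a: set[Voxel], b: set[Voxel]) -> float:
    """Volumetric IoU of two voxel sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty solids count as identical
    return len(a & b) / len(union)


# Toy case: the 60 x 60 x 8 mm plate versus a candidate that is only 6 mm thick.
reference = box_voxels(0, 0, 0, 60, 60, 8)
candidate = box_voxels(0, 0, 0, 60, 60, 6)
print(vol_iou(reference, candidate))  # 0.75
```

A real scorer would voxelise or boolean-intersect the actual STEP geometry at a much finer resolution. The toy case only shows how a missing 2 mm of plate thickness alone pulls the score down to 0.75.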
§5 Per-agent metrics
ranked by Vol IoU · same data as the leaderboard, restricted to this task
| Agent | Watertight | Manifold | Named-Dimension RMSE | GD&T Compliance | FeatRec | P@1 | p50 | latency | cost |
|---|---|---|---|---|---|---|---|---|---|
| Human Baseline (Mech-E) | ✓ | 0.979 | 0.096 | 0.825 | 0.862 | 1.000 | — | 743.9s | $6.343 |
| Claude Opus 4.7 → CadQuery | ✓ | 0.959 | 0.184 | 0.619 | 0.672 | 1.000 | — | 36.9s | $0.285 |
| GPT-5 → CadQuery | ✓ | 0.964 | 0.205 | 0.606 | 0.701 | 0.000 | — | 33.8s | $0.176 |
| Zoo Text-to-CAD | ✓ | 0.960 | 0.175 | 0.673 | 0.733 | 0.000 | — | 7.3s | $0.175 |
| CAD-Coder R1 | ✓ | 0.940 | 0.298 | 0.428 | 0.673 | 0.000 | — | 4.8s | $0.006 |
| OpenAI o4 (reasoning) → CadQuery | ✓ | 0.950 | 0.181 | 0.665 | 0.775 | 0.000 | — | 96.1s | $0.945 |
| Claude Sonnet 4.6 → CadQuery | ✓ | 0.937 | 0.254 | 0.569 | 0.680 | 0.000 | — | 20.3s | $0.060 |
| Adam (CADcrush) | ✓ | 0.934 | 0.235 | 0.634 | 0.645 | 0.000 | — | 11.2s | $0.261 |
| Qwen3 Coder → CadQuery | ✓ | 0.935 | 0.285 | 0.443 | 0.582 | 0.000 | — | 16.7s | $0.034 |
| Claude Opus 4.7 → OpenSCAD | ✓ | 0.927 | 0.222 | 0.429 | 0.601 | 0.000 | — | 27.5s | $0.294 |
| Gemini 2.5 Flash → CadQuery | ✓ | 0.928 | 0.317 | 0.482 | 0.597 | 0.000 | — | 10.8s | $0.023 |
| Claude Haiku 4.5 → CadQuery | ✓ | 0.919 | 0.339 | 0.351 | 0.519 | 0.000 | — | 9.7s | $0.016 |
| Gemini 2.5 Pro → OpenSCAD | ✓ | 0.923 | 0.291 | 0.446 | 0.506 | 0.000 | — | 34.7s | $0.105 |
| DeepSeek R1 (reasoning) → CadQuery | × | 0.910 | 0.317 | 0.514 | 0.664 | 0.000 | — | 121.1s | $0.039 |
| Llama 3.3 70B → OpenSCAD | × | 0.904 | 0.331 | 0.295 | 0.421 | 0.000 | — | 23.8s | $0.023 |
| GPT-5 Mini → OpenSCAD | × | 0.900 | 0.346 | 0.265 | 0.419 | 0.000 | — | 13.9s | $0.012 |
| DeepCAD | × | 0.905 | 0.344 | 0.372 | 0.458 | 0.000 | — | 4.6s | $0.023 |
| Trellis 3D | × | 0.870 | 0.547 | 0.059 | 0.213 | 0.000 | — | 11.7s | $0.056 |
| Hunyuan3D-2 | × | 0.861 | 0.516 | 0.052 | 0.194 | 0.000 | — | 31.0s | $0.063 |
| Spline AI | × | 0.850 | 0.565 | 0.027 | 0.093 | 0.000 | — | 7.5s | $0.047 |
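The Named-Dimension RMSE column in the table above is not defined on this page. A plausible reading, offered here only as an assumption, is the root-mean-square error between each named dimension measured on the agent's output and its nominal value. The function name `named_dimension_rmse` and the dimension keys are hypothetical.

```python
import math


def named_dimension_rmse(measured: dict[str, float],
                         nominal: dict[str, float]) -> float:
    """RMSE over the named dimensions present in both dictionaries.

    Assumption: dimensions absent from `measured` are skipped rather than
    penalised; the benchmark may handle missing features differently.
    """
    errors = [measured[name] - value
              for name, value in nominal.items() if name in measured]
    if not errors:
        raise ValueError("no shared dimensions to compare")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Nominal values from the prompt; measured values are made up for illustration.
nominal = {"boss_od": 6.5, "bore_d": 4.5, "bore_depth": 6.0}
measured = {"boss_od": 6.7, "bore_d": 4.3, "bore_depth": 6.2}
print(round(named_dimension_rmse(measured, nominal), 3))
```

Skipping missing dimensions keeps the metric defined for partial outputs, at the cost of not punishing absent features. A feature-recall score such as the FeatRec column would have to capture that separately.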