MECH-031 · Parametric Mechanical Parts · difficulty 5/5
Threaded cap with diamond knurl
sha256:7caf01eb22ddc041…
§1 Prompt verbatim
Cylindrical cap Ø 35 × 18 mm tall, internal M30 × 1.5 thread depth 14 mm, exterior diamond knurl pitch 0.8 mm, knurl height 0.3 mm, covering the central 12 mm of the height. Top face flat, bottom face open. ISO 261 thread tolerance 6H.
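The prompt implies several dimensions it does not state outright. The sketch below derives them with plain arithmetic. It assumes the ISO metric basic profile for the internal minor diameter (D1 = D − 1.0825·P), assumes the knurl pitch is measured along the outer circumference, and rounds the ridge count to a whole number; none of these choices is confirmed by the task page, and all function and key names are hypothetical.

```python
import math

# Values copied from the prompt (all in mm).
CAP_DIAMETER = 35.0
CAP_HEIGHT = 18.0
THREAD_MAJOR = 30.0   # nominal M30
THREAD_PITCH = 1.5
THREAD_DEPTH = 14.0
KNURL_PITCH = 0.8
KNURL_BAND = 12.0     # knurl covers the central 12 mm of the height


def derived_dimensions():
    """Return dimensions implied by the prompt (assumptions noted inline)."""
    # Assumption: ISO metric basic profile, H = sqrt(3)/2 * P,
    # internal minor diameter D1 = D - 1.25 * H.
    fundamental_height = math.sqrt(3) / 2 * THREAD_PITCH
    minor_diameter = THREAD_MAJOR - 1.25 * fundamental_height

    # Assumption: pitch is circumferential, ridge count must be an integer.
    circumference = math.pi * CAP_DIAMETER
    ridge_count = round(circumference / KNURL_PITCH)

    # "Central 12 mm" leaves an equal plain margin at top and bottom.
    margin = (CAP_HEIGHT - KNURL_BAND) / 2

    return {
        "thread_minor_diameter_mm": minor_diameter,
        "thread_turns": THREAD_DEPTH / THREAD_PITCH,
        "knurl_ridge_count": ridge_count,
        "knurl_band_z_mm": (margin, CAP_HEIGHT - margin),
        "wall_at_thread_major_mm": (CAP_DIAMETER - THREAD_MAJOR) / 2,
    }


if __name__ == "__main__":
    for name, value in derived_dimensions().items():
        print(name, value)
```

Under these assumptions the nominal 0.8 mm pitch does not divide the circumference evenly, so a generator has to pick a ridge count and accept a slightly adjusted effective pitch.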
§2 Ground-truth spec
shells: 1
watertight: true
manifold: true
acceptance ε: ±0.05 mm
features: thread_M30x1.5, knurl_diamond_0.8
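A minimal sketch of how a candidate could be checked against the spec fields above. It assumes ε applies independently to each named dimension and that the candidate arrives as a plain dictionary; the report layout, the dimension names, and the function name are hypothetical and not taken from the benchmark harness.

```python
EPSILON_MM = 0.05  # acceptance tolerance from the spec
REQUIRED_FEATURES = {"thread_M30x1.5", "knurl_diamond_0.8"}


def check_against_spec(report, nominal_dims, eps=EPSILON_MM):
    """Return a list of failure strings; an empty list means the part passes.

    `report` is a hypothetical dict with keys: shells, watertight,
    manifold, features, dims.
    """
    failures = []
    if report["shells"] != 1:
        failures.append("shell count is not 1")
    if not report["watertight"]:
        failures.append("not watertight")
    if not report["manifold"]:
        failures.append("not manifold")

    for name in sorted(REQUIRED_FEATURES - set(report["features"])):
        failures.append(f"missing feature: {name}")

    for name, nominal in nominal_dims.items():
        measured = report["dims"].get(name)
        # Small slack guards against floating-point noise at the boundary.
        if measured is None or abs(measured - nominal) > eps + 1e-9:
            failures.append(f"dimension out of tolerance: {name}")
    return failures


if __name__ == "__main__":
    nominal = {"outer_diameter": 35.0, "height": 18.0, "thread_depth": 14.0}
    candidate = {
        "shells": 1,
        "watertight": True,
        "manifold": True,
        "features": ["thread_M30x1.5"],  # knurl emitted as texture only
        "dims": {"outer_diameter": 35.02, "height": 18.2, "thread_depth": 14.0},
    }
    print(check_against_spec(candidate, nominal))
```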
Knurls are the canonical 'looks easy, isn't' surface — most LLMs emit a flat texture map, not actual geometric ridges.
§3 Reference render
Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
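For readers who hold a copy of a reference file, a fingerprint like the one above can be compared with a few lines of standard-library Python. This is a sketch only: the page shows a truncated digest, so the comparison is by prefix, and the file path and function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, expected_prefix):
    """True if the file's digest starts with the (possibly truncated) prefix."""
    return sha256_of_file(path).startswith(expected_prefix.lower())


if __name__ == "__main__":
    import sys

    # Usage: python check_fingerprint.py <step-file> <digest-prefix>
    # e.g. the truncated prefix shown on this page: 7caf01eb22ddc041
    ok = matches_fingerprint(sys.argv[1], sys.argv[2])
    print("match" if ok else "mismatch")
```

A prefix match against a truncated digest is weaker evidence than a full-digest match, so it should be treated as a sanity check rather than proof of identity.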
§4 Per-agent renders
reference + 10 agent outputs · scored against the held-out STEP
| Agent | Provider / toolchain | Vol IoU | BREP | Manifold |
|---|---|---|---|---|
| REFERENCE | canonical · ground truth | 1.000 | 100 | ✓ |
| Human Baseline (Mech-E) | n=4 senior engineers | 0.740 | 7 | ✓ |
| Claude Opus 4.7 → CadQuery | Anthropic + CadQuery 2.4 | 0.646 | 8 | ✓ |
| CAD-Coder R1 | CAD-Coder Labs (research) | 0.646 | 9 | ✗ |
| DeepSeek R1 (reasoning) → CadQuery | DeepSeek + CadQuery 2.4 | 0.644 | 9 | ✗ |
| Adam (CADcrush) | CADcrush | 0.629 | 8 | ✗ |
| Zoo Text-to-CAD | Zoo (KittyCAD) | 0.611 | 12 | ✗ |
| OpenAI o4 (reasoning) → CadQuery | OpenAI + CadQuery 2.4 | 0.603 | 9 | ✗ |
| Claude Opus 4.7 → OpenSCAD | Anthropic + OpenSCAD 2024.06 | 0.555 | 0 | ✗ |
| Claude Sonnet 4.6 → CadQuery | Anthropic + CadQuery 2.4 | 0.516 | 13 | ✗ |
| Claude Haiku 4.5 → CadQuery | Anthropic + CadQuery 2.4 | 0.448 | 12 | ✗ |
| Qwen3 Coder → CadQuery | Alibaba + CadQuery 2.4 | 0.426 | 13 | ✗ |
| Gemini 2.5 Flash → CadQuery | Google + CadQuery 2.4 | 0.415 | 12 | ✗ |
| Gemini 2.5 Pro → OpenSCAD | Google + OpenSCAD 2024.06 | 0.394 | 0 | ✗ |
| GPT-5 → CadQuery | OpenAI + CadQuery 2.4 | 0.384 | 15 | ✗ |
| DeepCAD | Wu et al. 2021 (research) | 0.340 | 14 | ✗ |
| Llama 3.3 70B → OpenSCAD | Meta + OpenSCAD 2024.06 | 0.301 | 18 | ✗ |
| GPT-5 Mini → OpenSCAD | OpenAI + OpenSCAD 2024.06 | 0.268 | 22 | ✗ |
| Hunyuan3D-2 | Tencent | 0.095 | 54 | ✗ |
| Trellis 3D | Microsoft Research | 0.000 | 0 | ✗ |
| Spline AI | Spline.design | 0.000 | 0 | ✗ |
Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.
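The tiles are ranked by volumetric IoU. As an illustration of that metric only, the sketch below estimates IoU by sampling two solids on a regular voxel grid. It is not the benchmark's scorer: the solids are stand-in plain cylinders defined by hypothetical `inside(x, y, z)` predicates, and the grid step is an arbitrary choice.

```python
def cylinder(radius, height):
    """Occupancy predicate for a solid cylinder standing on z = 0."""
    def inside(x, y, z):
        return x * x + y * y <= radius * radius and 0.0 <= z <= height
    return inside


def voxel_iou(inside_a, inside_b, bounds, step):
    """Estimate |A ∩ B| / |A ∪ B| by sampling voxel centres inside `bounds`."""
    (x0, x1), (y0, y1), (z0, z1) = bounds
    nx = int(round((x1 - x0) / step))
    ny = int(round((y1 - y0) / step))
    nz = int(round((z1 - z0) / step))
    intersection = union = 0
    for i in range(nx):
        x = x0 + (i + 0.5) * step
        for j in range(ny):
            y = y0 + (j + 0.5) * step
            for k in range(nz):
                z = z0 + (k + 0.5) * step
                a = inside_a(x, y, z)
                b = inside_b(x, y, z)
                intersection += a and b
                union += a or b
    # Convention chosen here: two empty solids score 0.0.
    return intersection / union if union else 0.0


if __name__ == "__main__":
    reference = cylinder(radius=17.5, height=18.0)   # Ø 35 × 18 mm envelope
    candidate = cylinder(radius=16.0, height=18.0)   # undersized stand-in
    box = ((-20.0, 20.0), (-20.0, 20.0), (0.0, 20.0))
    print(round(voxel_iou(reference, candidate, box, step=0.5), 3))
```

For two concentric cylinders of equal height the exact value is the ratio of squared radii, which gives a quick way to judge how much error the chosen grid step introduces.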
§5 Per-agent metrics
ranked by Vol IoU · same data as the leaderboard, restricted to this task
| Agent | Watert. | Manif. | Named-Dimension RMSE | GD&T Compliance | FeatRec | P@1 | p50 | latency | cost |
|---|---|---|---|---|---|---|---|---|---|
| Human Baseline (Mech-E) | ✓ | 0.968 | 0.138 | 0.827 | 0.945 | 0.000 | — | 790.9s | $6.933 |
| Claude Opus 4.7 → CadQuery | ✓ | 0.956 | 0.200 | 0.616 | 0.698 | 0.000 | — | 49.3s | $0.334 |
| CAD-Coder R1 | ✓ | 0.949 | 0.268 | 0.468 | 0.706 | 0.000 | — | 6.4s | $0.005 |
| DeepSeek R1 (reasoning) → CadQuery | ✓ | 0.947 | 0.259 | 0.506 | 0.624 | 0.000 | — | 87.9s | $0.038 |
| Adam (CADcrush) | ✓ | 0.942 | 0.222 | 0.677 | 0.653 | 0.000 | — | 10.0s | $0.319 |
| Zoo Text-to-CAD | ✓ | 0.942 | 0.231 | 0.692 | 0.804 | 0.000 | — | 8.0s | $0.154 |
| OpenAI o4 (reasoning) → CadQuery | ✓ | 0.948 | 0.195 | 0.655 | 0.725 | 0.000 | — | 103.1s | $1.049 |
| Claude Opus 4.7 → OpenSCAD | ✓ | 0.935 | 0.223 | 0.434 | 0.600 | 0.000 | — | 33.7s | $0.289 |
| Claude Sonnet 4.6 → CadQuery | ✓ | 0.928 | 0.205 | 0.620 | 0.692 | 0.000 | — | 19.3s | $0.066 |
| Claude Haiku 4.5 → CadQuery | ✓ | 0.915 | 0.361 | 0.354 | 0.535 | 0.000 | — | 7.9s | $0.019 |
| Qwen3 Coder → CadQuery | ✓ | 0.915 | 0.334 | 0.393 | 0.562 | 0.000 | — | 16.8s | $0.035 |
| Gemini 2.5 Flash → CadQuery | × | 0.913 | 0.238 | 0.444 | 0.610 | 0.000 | — | 9.5s | $0.019 |
| Gemini 2.5 Pro → OpenSCAD | × | 0.909 | 0.304 | 0.427 | 0.508 | 0.000 | — | 31.3s | $0.107 |
| GPT-5 → CadQuery | × | 0.912 | 0.220 | 0.540 | 0.673 | 0.000 | — | 28.7s | $0.185 |
| DeepCAD | × | 0.901 | 0.372 | 0.384 | 0.477 | 0.000 | — | 5.3s | $0.021 |
| Llama 3.3 70B → OpenSCAD | × | 0.893 | 0.396 | 0.268 | 0.449 | 0.000 | — | 27.7s | $0.017 |
| GPT-5 Mini → OpenSCAD | × | 0.889 | 0.307 | 0.267 | 0.409 | 0.000 | — | 10.3s | $0.009 |
| Hunyuan3D-2 | × | 0.864 | 0.541 | 0.055 | 0.213 | 0.000 | — | 29.7s | $0.075 |
| Trellis 3D | × | 0.850 | 0.473 | 0.055 | 0.211 | 0.000 | — | 10.2s | $0.059 |
| Spline AI | × | 0.850 | 0.514 | 0.027 | 0.091 | 0.000 | — | 7.8s | $0.034 |
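The Named-Dimension RMSE column is not defined on this page. One plausible reading, sketched below, is the root-mean-square error over dimensions that appear by name in both the reference and the candidate. How the benchmark handles missing dimensions is unknown, so this version simply ignores them; the dimension names and the function name are hypothetical.

```python
import math


def named_dimension_rmse(reference, candidate):
    """RMSE (same units as the inputs) over dimension names present in both dicts."""
    shared = sorted(set(reference) & set(candidate))
    if not shared:
        raise ValueError("no shared named dimensions to compare")
    squared_error = sum((candidate[name] - reference[name]) ** 2 for name in shared)
    return math.sqrt(squared_error / len(shared))


if __name__ == "__main__":
    reference = {"outer_diameter": 35.0, "height": 18.0, "thread_depth": 14.0}
    candidate = {"outer_diameter": 35.3, "height": 17.6}  # thread depth not reported
    print(named_dimension_rmse(reference, candidate))
```

Ignoring missing dimensions flatters candidates that omit hard features, so a stricter variant might substitute a fixed penalty for each absent name.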