CAD-Bench
STD-005 · Standards Compliance · difficulty 3/5

ISO 8734 dowel pin Ø6 m6 × 30

sha256:00b41ef02ac9d301

§1 Prompt verbatim

Cylindrical dowel pin per ISO 8734 type A: Ø 6 m6 (+0.012 / +0.004), length 30 mm, both ends spherically radiused R 0.6 mm. Surface finish Ra ≤ 0.4 µm.
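
The m6 deviations quoted in the prompt translate into a concrete window for the pin diameter. The following is a minimal sketch of that arithmetic, working in integer micrometres to avoid floating-point rounding. The names `diameter_limits_um` and `diameter_in_tolerance` are illustrative and not part of the benchmark.

```python
# Hypothetical helper names; the values come from the prompt: Ø 6 m6 (+0.012 / +0.004).
NOMINAL_UM = 6000    # nominal diameter, 6 mm expressed in micrometres
LOWER_DEV_UM = 4     # lower deviation, +0.004 mm
UPPER_DEV_UM = 12    # upper deviation, +0.012 mm


def diameter_limits_um():
    """Return the (min, max) permitted diameter in micrometres."""
    return NOMINAL_UM + LOWER_DEV_UM, NOMINAL_UM + UPPER_DEV_UM


def diameter_in_tolerance(measured_um):
    """True if a measured diameter (in micrometres) lies inside the m6 band."""
    low, high = diameter_limits_um()
    return low <= measured_um <= high


if __name__ == "__main__":
    print(diameter_limits_um())          # (6004, 6012)
    print(diameter_in_tolerance(6008))   # True
    print(diameter_in_tolerance(6000))   # False: nominal size lies below the band
```

Because both m6 deviations are positive, a pin modelled at exactly 6.000 mm falls outside the band.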

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.005 mm
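
The watertight and manifold flags are presumably evaluated on the B-rep by the CAD kernel. As a mesh-level analogue only, the sketch below checks a triangle mesh by counting how many triangles use each edge: a closed surface has no edge used by a single triangle, and an edge-manifold surface has no edge shared by more than two. The function names and the tetrahedron test mesh are assumptions made for illustration.

```python
from collections import Counter


def edge_use_counts(triangles):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_watertight(triangles):
    """No boundary edges: every edge is used by at least two triangles."""
    return all(n >= 2 for n in edge_use_counts(triangles).values())


def is_edge_manifold(triangles):
    """No edge is shared by more than two triangles."""
    return all(n <= 2 for n in edge_use_counts(triangles).values())


# A tetrahedron is the smallest closed triangle mesh (vertex indices 0..3).
TETRA = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

if __name__ == "__main__":
    print(is_watertight(TETRA), is_edge_manifold(TETRA))
    # Dropping one face opens a hole, so the mesh is no longer watertight.
    print(is_watertight(TETRA[:3]))
```

A mesh that passes both checks has every edge shared by exactly two triangles. The sketch does not count shells or test vertex-level manifoldness.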

§3 Reference render


Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
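
The fingerprint shown on this page has 16 hex digits, whereas a full SHA-256 digest has 64, so it is presumably a truncated prefix. Under that assumption, a local copy of a file could be compared against such a fingerprint roughly as follows. The helper names are hypothetical, and because the reference STEP file is held out, arbitrary bytes stand in for it here.

```python
import hashlib

PREFIX = "sha256:"


def short_fingerprint(data, hex_chars=16):
    """Return 'sha256:' plus the first `hex_chars` hex digits of the digest."""
    return PREFIX + hashlib.sha256(data).hexdigest()[:hex_chars]


def matches_fingerprint(data, expected):
    """Compare raw bytes against a published short fingerprint.

    Assumes `expected` is the digest prefix, formatted as 'sha256:<hex>'.
    """
    hex_chars = len(expected) - len(PREFIX)
    return short_fingerprint(data, hex_chars) == expected


if __name__ == "__main__":
    # Placeholder input: the empty byte string, not the real STEP file.
    print(short_fingerprint(b""))
```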

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP
vol IoU · BREP · manifold
| Agent | Source / toolchain | Vol IoU | BREP |
|---|---|---|---|
| REFERENCE | canonical · ground truth | 1.000 | 100 |
| OpenAI o4 (reasoning) → CadQuery | OpenAI + CadQuery 2.4 | 0.866 | 10 |
| Human Baseline (Mech-E) | n=4 senior engineers | 0.759 | 8 |
| Zoo Text-to-CAD | Zoo (KittyCAD) | 0.695 | 9 |
| Gemini 2.5 Flash → CadQuery | Google + CadQuery 2.4 | 0.625 | 10 |
| Claude Sonnet 4.6 → CadQuery | Anthropic + CadQuery 2.4 | 0.612 | 9 |
| GPT-5 → CadQuery | OpenAI + CadQuery 2.4 | 0.610 | 11 |
| Claude Opus 4.7 → CadQuery | Anthropic + CadQuery 2.4 | 0.595 | 12 |
| Adam (CADcrush) | CADcrush | 0.581 | 9 |
| Claude Haiku 4.5 → CadQuery | Anthropic + CadQuery 2.4 | 0.494 | 13 |
| Claude Opus 4.7 → OpenSCAD | Anthropic + OpenSCAD 2024.06 | 0.478 | 0 |
| Llama 3.3 70B → OpenSCAD | Meta + OpenSCAD 2024.06 | 0.418 | 15 |
| Qwen3 Coder → CadQuery | Alibaba + CadQuery 2.4 | 0.403 | 14 |
| CAD-Coder R1 | CAD-Coder Labs (research) | 0.391 | 16 |
| DeepSeek R1 (reasoning) → CadQuery | DeepSeek + CadQuery 2.4 | 0.368 | 15 |
| Gemini 2.5 Pro → OpenSCAD | Google + OpenSCAD 2024.06 | 0.324 | 0 |
| DeepCAD | Wu et al. 2021 (research) | 0.239 | 23 |
| GPT-5 Mini → OpenSCAD | OpenAI + OpenSCAD 2024.06 | 0.200 | 23 |
| Trellis 3D | Microsoft Research | 0.109 | 0 |
| Spline AI | Spline.design | no manifold solid produced | 2 |
| Hunyuan3D-2 | Tencent | 0.000 | 101 |

Each tile is rebuilt from the canonical parametric description and then degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity: the geometry is not a manifold solid even when the render looks clean.

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task
| Agent | Watert. | Manif. | GD&T Compliance | Standards Compliance | P@1 | p50 latency | cost |
|---|---|---|---|---|---|---|---|
| OpenAI o4 (reasoning) → CadQuery | | 0.973 | 0.623 | 0.771 | 1.000 | 106.0s | $0.880 |
| Human Baseline (Mech-E) | | 0.961 | 0.833 | 0.900 | 1.000 | 628.0s | $5.331 |
| Zoo Text-to-CAD | | 0.950 | 0.722 | 0.744 | 0.000 | 4.4s | $0.152 |
| Gemini 2.5 Flash → CadQuery | | 0.938 | 0.463 | 0.583 | 0.000 | 12.7s | $0.024 |
| Claude Sonnet 4.6 → CadQuery | | 0.938 | 0.588 | 0.680 | 0.000 | 13.3s | $0.074 |
| GPT-5 → CadQuery | | 0.948 | 0.606 | 0.736 | 0.000 | 45.6s | $0.224 |
| Claude Opus 4.7 → CadQuery | | 0.936 | 0.622 | 0.735 | 0.000 | 47.9s | $0.315 |
| Adam (CADcrush) | | 0.938 | 0.692 | 0.716 | 0.000 | 7.5s | $0.281 |
| Claude Haiku 4.5 → CadQuery | | 0.922 | 0.394 | 0.469 | 0.000 | 10.3s | $0.020 |
| Claude Opus 4.7 → OpenSCAD | | 0.927 | 0.430 | 0.557 | 0.000 | 24.2s | $0.309 |
| Llama 3.3 70B → OpenSCAD | × | 0.913 | 0.283 | 0.386 | 0.000 | 15.4s | $0.021 |
| Qwen3 Coder → CadQuery | × | 0.911 | 0.424 | 0.520 | 0.000 | 13.9s | $0.033 |
| CAD-Coder R1 | × | 0.913 | 0.468 | 0.491 | 0.000 | 5.2s | $0.005 |
| DeepSeek R1 (reasoning) → CadQuery | × | 0.902 | 0.526 | 0.533 | 0.000 | 104.6s | $0.035 |
| Gemini 2.5 Pro → OpenSCAD | × | 0.895 | 0.443 | 0.479 | 0.000 | 22.1s | $0.095 |
| DeepCAD | × | 0.885 | 0.373 | 0.281 | 0.000 | 3.5s | $0.021 |
| GPT-5 Mini → OpenSCAD | × | 0.881 | 0.269 | 0.379 | 0.000 | 10.6s | $0.010 |
| Trellis 3D | × | 0.868 | 0.059 | 0.047 | 0.000 | 13.4s | $0.041 |
| Hunyuan3D-2 | × | 0.850 | 0.056 | 0.043 | 0.000 | 38.8s | $0.064 |

Spline AI (kernel error: BRepCheck_NotClosed): × · 0.000 · 0.000 · 8.5s · $0.034
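
The ranking metric, Vol IoU, is the volume of the intersection of two solids divided by the volume of their union. The benchmark's own implementation is not shown on this page, so the following is only a worked illustration for the simplest case relevant to this task: two cylinders that share the same axis and midpoint, with the pin's end radii ignored. The function names are hypothetical.

```python
import math


def cylinder_volume(radius, length):
    """Volume of a right circular cylinder."""
    return math.pi * radius ** 2 * length


def coaxial_cylinder_iou(r_a, len_a, r_b, len_b):
    """Vol IoU of two cylinders sharing the same axis and the same midpoint.

    Under that assumption the intersection is itself a cylinder with the
    smaller radius and the shorter length.
    """
    inter = cylinder_volume(min(r_a, r_b), min(len_a, len_b))
    union = cylinder_volume(r_a, len_a) + cylinder_volume(r_b, len_b) - inter
    return inter / union


if __name__ == "__main__":
    # Reference pin approximated as a plain Ø6 x 30 cylinder (radius 3 mm).
    print(coaxial_cylinder_iou(3.0, 30.0, 3.0, 30.0))  # identical solids
    print(coaxial_cylinder_iou(3.0, 30.0, 3.0, 15.0))  # candidate at half length
```

Under these assumptions a candidate modelled at half the reference length scores about 0.5, which shows how quickly a purely dimensional error lowers Vol IoU even when the shape is otherwise correct.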