CAD-Bench
BOOL-001 · Boolean Robustness · difficulty 4/5

Tangent cylinder onto cube (line-of-contact)

sha256:a7c33b5db00e1f01

§1 Prompt verbatim

Place a Ø 20 × 40 mm cylinder on the +Z face of a 60 × 60 × 60 mm cube so the cylinder is internally tangent to one cube edge along its full length. Union into a single watertight body. The shared seam is exactly one straight edge.

§2 Ground-truth spec

shells: 1
V−E+F: 2
genus: 0
watertight: true
manifold: true
acceptance ε: ±0.02 mm
Tangent contact stresses the kernel's ε-handling: many kernels emit a sliver face along the seam.
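
The ground-truth values above are consistent with the Euler-Poincaré relation V − E + F = 2(S − G) for S closed shells of total genus G: one shell of genus 0 gives 2. The following is a minimal sketch of such a check, assuming the solid has been exported as a triangle mesh with shared vertex indices. The helper names `mesh_topology` and `genus` are hypothetical, and the benchmark's own checker is not shown on this page.

```python
from collections import Counter


def mesh_topology(triangles):
    """Return (V, E, F, chi, watertight) for a mesh given as vertex-index triples.

    Assumption: coincident vertices have already been merged, so that
    shared edges really do share the same pair of indices.
    """
    vertices = set()
    edge_use = Counter()
    for a, b, c in triangles:
        vertices.update((a, b, c))
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge undirected so both adjacent faces hit the same key.
            edge_use[(min(u, v), max(u, v))] += 1
    n_v, n_e, n_f = len(vertices), len(edge_use), len(triangles)
    chi = n_v - n_e + n_f
    # Closed and edge-manifold: every edge is used by exactly two faces.
    # (This does not detect non-manifold vertices.)
    watertight = all(count == 2 for count in edge_use.values())
    return n_v, n_e, n_f, chi, watertight


def genus(chi, shells=1):
    """Solve chi = 2 * (shells - genus) for genus (closed orientable surface)."""
    return shells - chi // 2


# A tetrahedron is the smallest closed mesh: V=4, E=6, F=4.
TETRA = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(mesh_topology(TETRA), genus(2))
```

A sliver face left along the tangent seam typically shows up here as an edge used once or three times, which flips `watertight` to false even when the render looks clean.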

§3 Reference render

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP
vol IoU · BREP · manifold
REFERENCE · canonical · ground truth · 1.000 · 100
Human Baseline (Mech-E) · n=4 senior engineers · 0.810 · 7
GPT-5 → CadQuery · OpenAI + CadQuery 2.4 · 0.617 · 9
Claude Opus 4.7 → CadQuery · Anthropic + CadQuery 2.4 · 0.603 · 8
OpenAI o4 (reasoning) → CadQuery · OpenAI + CadQuery 2.4 · 0.596 · 9
Gemini 2.5 Pro → OpenSCAD · Google + OpenSCAD 2024.06 · 0.569 · 0
Claude Sonnet 4.6 → CadQuery · Anthropic + CadQuery 2.4 · 0.557 · 13
DeepSeek R1 (reasoning) → CadQuery · DeepSeek + CadQuery 2.4 · 0.532 · 10
CAD-Coder R1 · CAD-Coder Labs (research) · 0.531 · 11
Llama 3.3 70B → OpenSCAD · Meta + OpenSCAD 2024.06 · 0.522 · 11
Claude Opus 4.7 → OpenSCAD · Anthropic + OpenSCAD 2024.06 · 0.519 · 0
Adam (CADcrush) · CADcrush · 0.514 · 9
Gemini 2.5 Flash → CadQuery · Google + CadQuery 2.4 · 0.492 · 11
DeepCAD · Wu et al. 2021 (research) · 0.474 · 12
Zoo Text-to-CAD · Zoo (KittyCAD) · 0.459 · 13
Claude Haiku 4.5 → CadQuery · Anthropic + CadQuery 2.4 · 0.417 · 16
Qwen3 Coder → CadQuery · Alibaba + CadQuery 2.4 · 0.396 · 11
GPT-5 Mini → OpenSCAD · OpenAI + OpenSCAD 2024.06 · 0.369 · 17
Hunyuan3D-2 · Tencent · 0.231 · 22
Spline AI · Spline.design · 0.058 · 0
Trellis 3D · Microsoft Research · 0.000 · 0

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task
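| Agent | Vol IoU | Watert. | Manif. | Euler-Poincaré Compliance | P@1 | p50 latency | cost |
|---|---|---|---|---|---|---|---|
| Human Baseline (Mech-E) | 0.810 | | 0.966 | | 1.000 | 735.2s | $6.952 |
| GPT-5 → CadQuery | 0.617 | | 0.942 | | 0.000 | 28.9s | $0.212 |
| Claude Opus 4.7 → CadQuery | 0.603 | | 0.947 | | 0.000 | 33.7s | $0.350 |
| OpenAI o4 (reasoning) → CadQuery | 0.596 | | 0.940 | | 0.000 | 86.4s | $1.124 |
| Gemini 2.5 Pro → OpenSCAD | 0.569 | | 0.941 | | 0.000 | 32.5s | $0.093 |
| Claude Sonnet 4.6 → CadQuery | 0.557 | | 0.931 | | 0.000 | 22.2s | $0.073 |
| DeepSeek R1 (reasoning) → CadQuery | 0.532 | | 0.933 | | 0.000 | 102.6s | $0.039 |
| CAD-Coder R1 | 0.531 | | 0.937 | | 0.000 | 6.2s | $0.006 |
| Llama 3.3 70B → OpenSCAD | 0.522 | | 0.929 | | 0.000 | 20.6s | $0.019 |
| Claude Opus 4.7 → OpenSCAD | 0.519 | | 0.935 | | 0.000 | 37.5s | $0.338 |
| Adam (CADcrush) | 0.514 | | 0.931 | | 0.000 | 6.7s | $0.302 |
| Gemini 2.5 Flash → CadQuery | 0.492 | | 0.926 | × | 0.000 | 12.2s | $0.022 |
| DeepCAD | 0.474 | | 0.925 | × | 0.000 | 5.4s | $0.018 |
| Zoo Text-to-CAD | 0.459 | | 0.925 | × | 0.000 | 7.5s | $0.210 |
| Claude Haiku 4.5 → CadQuery | 0.417 | × | 0.912 | × | 0.000 | 7.2s | $0.017 |
| Qwen3 Coder → CadQuery | 0.396 | × | 0.913 | × | 0.000 | 22.8s | $0.030 |
| GPT-5 Mini → OpenSCAD | 0.369 | × | 0.908 | × | 0.000 | 16.4s | $0.012 |
| Hunyuan3D-2 | 0.231 | × | 0.885 | × | 0.000 | 42.1s | $0.075 |
| Spline AI | 0.058 | × | 0.858 | × | 0.000 | 7.2s | $0.040 |
| Trellis 3D | 0.000 | × | 0.850 | × | 0.000 | 11.4s | $0.047 |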
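
The ranking metric can be illustrated with a small sketch. It assumes that Vol IoU means volumetric intersection-over-union between the agent's solid and the reference, and it estimates that ratio by sampling a regular grid. The `box` and `volume_iou` helpers are hypothetical stand-ins: a real scorer would query the STEP solids rather than axis-aligned boxes, and the benchmark's actual method is not described on this page.

```python
def box(lo, hi):
    """Return an inside-test for an axis-aligned box (stand-in for a real solid)."""
    def inside(p):
        return all(l <= x <= h for l, x, h in zip(lo, p, hi))
    return inside


def volume_iou(inside_a, inside_b, lo, hi, n=30):
    """Estimate IoU by testing the cell centres of an n*n*n grid over [lo, hi].

    lo/hi must bound both solids; accuracy improves with larger n.
    """
    steps = [(h - l) / n for l, h in zip(lo, hi)]
    both = either = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = tuple(l + (idx + 0.5) * s
                          for l, idx, s in zip(lo, (i, j, k), steps))
                a, b = inside_a(p), inside_b(p)
                both += a and b      # point lies in the intersection
                either += a or b     # point lies in the union
    return both / either if either else 0.0


# Two 2x2x2 boxes offset by 1 along X: intersection 4, union 12.
a = box((0, 0, 0), (2, 2, 2))
b = box((1, 0, 0), (3, 2, 2))
print(volume_iou(a, b, (0, 0, 0), (3, 2, 2)))
```

Because IoU only compares occupied volume, a shape can score moderately here and still fail the watertight and manifold checks, which is why the table reports those columns separately.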