CAD-Bench
BOOL-005 · Boolean Robustness · difficulty 5/5

ε-offset extrusion (sliver-face stress)

sha256:7a1cb09001ddf0f4

§1 Prompt verbatim

Cube 30 × 30 × 30 mm. Subtract from it a second cube of the same size, translated by (+0.005, +0.005, 0) mm. The result must be one watertight body — kernels must not leave a 5-µm sliver shell.
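For orientation, the exact set difference described in the prompt can be worked out by hand. The sketch below is not part of the benchmark. It assumes the second cube is the first one translated by (+0.005, +0.005, 0) mm, so the overlap is an axis-aligned box; the function and constant names are hypothetical.

```python
SIDE = 30.0     # cube edge length in mm, from the prompt
OFFSET = 0.005  # translation along +x and +y in mm, from the prompt


def remaining_volume(side: float, dx: float, dy: float) -> float:
    """Volume of cube A minus cube B, where B is A translated by (dx, dy, 0).

    Assumes 0 <= dx, dy <= side, so the overlap of A and B is a box of
    size (side - dx) x (side - dy) x side.
    """
    overlap = (side - dx) * (side - dy) * side
    return side ** 3 - overlap


volume = remaining_volume(SIDE, OFFSET, OFFSET)
print(f"remaining volume: {volume:.5f} mm^3")
```

Under these assumptions the remainder is an L-shaped prism of roughly 9 mm³ whose two legs are each 5 µm thick and 30 mm tall.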

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.001 mm
The 5 µm offset is below most kernels' default ε, so tasks at this scale separate ACIS-grade kernels from naive CSG.
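As a toy illustration of why the working tolerance matters at this scale, the sketch below models only a coincidence test between two parallel face planes. It is not how any particular kernel works, and both tolerance values are hypothetical.

```python
def coincident(a: float, b: float, eps: float) -> bool:
    """Treat two parallel plane positions (in mm) as one plane if within eps."""
    return abs(a - b) <= eps


FACE_A = 0.0    # x = 0 face of the first cube, mm
FACE_B = 0.005  # corresponding face of the translated cube, mm

COARSE_EPS = 0.01  # hypothetical loose tolerance, mm (larger than the offset)
FINE_EPS = 1e-6    # hypothetical tight tolerance, mm (smaller than the offset)

merged_coarse = coincident(FACE_A, FACE_B, COARSE_EPS)  # faces collapse into one
merged_fine = coincident(FACE_A, FACE_B, FINE_EPS)      # faces stay distinct
```

With the loose tolerance the two faces are indistinguishable and the 5 µm wall cannot be represented; with the tight one the wall survives as real geometry.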

§3 Reference render

[Figure: canonical reference render]

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
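The page does not describe how the scorer computes volumetric IoU against the reference. As a minimal sketch of the metric itself, the example below computes intersection over union for two axis-aligned boxes. Boxes are used only to keep the example short; the `Box` type, the function names, and the oversized candidate are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner), in mm


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box; zero if any extent is negative."""
    lo, hi = box
    volume = 1.0
    for a, b in zip(lo, hi):
        volume *= max(0.0, b - a)
    return volume


def box_intersection(p: Box, q: Box) -> Box:
    """Overlap of two boxes; may have negative extents if they are disjoint."""
    lo = tuple(max(a, b) for a, b in zip(p[0], q[0]))
    hi = tuple(min(a, b) for a, b in zip(p[1], q[1]))
    return lo, hi


def vol_iou(p: Box, q: Box) -> float:
    """Intersection volume divided by union volume."""
    inter = box_volume(box_intersection(p, q))
    union = box_volume(p) + box_volume(q) - inter
    return inter / union if union > 0 else 0.0


reference = ((0.0, 0.0, 0.0), (30.0, 30.0, 30.0))
oversized = ((0.0, 0.0, 0.0), (33.0, 33.0, 33.0))  # hypothetical 10% scale error
iou = vol_iou(reference, oversized)
```

A real scorer would need to evaluate the intersection and union on the actual solids, for example through boolean operations in a B-rep kernel or by voxel sampling.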

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP
vol IoU · BREP · manifold
| Tile | Source | Vol IoU | BREP |
|---|---|---|---|
| REFERENCE | canonical · ground truth | 1.000 | 100 |
| Human Baseline (Mech-E) | n=4 senior engineers | 0.795 | 9 |
| Zoo Text-to-CAD | Zoo (KittyCAD) | 0.660 | 7 |
| Claude Sonnet 4.6 → CadQuery | Anthropic + CadQuery 2.4 | 0.602 | 11 |
| Gemini 2.5 Flash → CadQuery | Google + CadQuery 2.4 | 0.580 | 10 |
| GPT-5 → CadQuery | OpenAI + CadQuery 2.4 | 0.511 | 9 |
| CAD-Coder R1 | CAD-Coder Labs (research) | 0.492 | 11 |
| Claude Opus 4.7 → CadQuery | Anthropic + CadQuery 2.4 | 0.478 | 11 |
| Gemini 2.5 Pro → OpenSCAD | Google + OpenSCAD 2024.06 | 0.473 | 0 |
| OpenAI o4 (reasoning) → CadQuery | OpenAI + CadQuery 2.4 | 0.448 | 12 |
| Claude Opus 4.7 → OpenSCAD | Anthropic + OpenSCAD 2024.06 | 0.443 | 0 |
| DeepSeek R1 (reasoning) → CadQuery | DeepSeek + CadQuery 2.4 | 0.435 | 13 |
| Qwen3 Coder → CadQuery | Alibaba + CadQuery 2.4 | 0.423 | 13 |
| Adam (CADcrush) | CADcrush | 0.421 | 12 |
| GPT-5 Mini → OpenSCAD | OpenAI + OpenSCAD 2024.06 | 0.415 | 14 |
| Claude Haiku 4.5 → CadQuery | Anthropic + CadQuery 2.4 | 0.401 | 12 |
| Llama 3.3 70B → OpenSCAD | Meta + OpenSCAD 2024.06 | 0.378 | 14 |
| DeepCAD | Wu et al. 2021 (research) | 0.304 | 15 |
| Hunyuan3D-2 | Tencent | 0.124 | 37 |
| Trellis 3D | Microsoft Research | 0.108 | 0 |
| Spline AI | Spline.design | 0.100 | 0 |

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task
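| Agent | Vol IoU | Watert. | Manif. | Euler-Poincaré Compliance | P@1 | p50 latency | cost |
|---|---|---|---|---|---|---|---|
| Human Baseline (Mech-E) | 0.795 |  | 0.966 |  | 1.000 | 900.2s | $5.883 |
| Zoo Text-to-CAD | 0.660 |  | 0.947 |  | 0.000 | 5.2s | $0.213 |
| Claude Sonnet 4.6 → CadQuery | 0.602 |  | 0.936 |  | 0.000 | 16.6s | $0.059 |
| Gemini 2.5 Flash → CadQuery | 0.580 |  | 0.932 |  | 0.000 | 9.2s | $0.016 |
| GPT-5 → CadQuery | 0.511 |  | 0.934 |  | 0.000 | 41.5s | $0.202 |
| CAD-Coder R1 | 0.492 |  | 0.921 | × | 0.000 | 4.6s | $0.006 |
| Claude Opus 4.7 → CadQuery | 0.478 |  | 0.928 |  | 0.000 | 28.3s | $0.388 |
| Gemini 2.5 Pro → OpenSCAD | 0.473 |  | 0.918 | × | 0.000 | 33.3s | $0.077 |
| OpenAI o4 (reasoning) → CadQuery | 0.448 |  | 0.918 | × | 0.000 | 93.4s | $1.166 |
| Claude Opus 4.7 → OpenSCAD | 0.443 | × | 0.912 | × | 0.000 | 30.2s | $0.324 |
| DeepSeek R1 (reasoning) → CadQuery | 0.435 | × | 0.912 | × | 0.000 | 85.1s | $0.045 |
| Qwen3 Coder → CadQuery | 0.423 |  | 0.916 | × | 0.000 | 21.9s | $0.033 |
| Adam (CADcrush) | 0.421 |  | 0.917 | × | 0.000 | 7.5s | $0.290 |
| GPT-5 Mini → OpenSCAD | 0.415 |  | 0.914 | × | 0.000 | 12.4s | $0.011 |
| Claude Haiku 4.5 → CadQuery | 0.401 | × | 0.907 | × | 0.000 | 8.4s | $0.017 |
| Llama 3.3 70B → OpenSCAD | 0.378 | × | 0.911 | × | 0.000 | 15.8s | $0.020 |
| DeepCAD | 0.304 | × | 0.896 | × | 0.000 | 5.3s | $0.017 |
| Hunyuan3D-2 | 0.124 | × | 0.870 | × | 0.000 | 33.2s | $0.076 |
| Trellis 3D | 0.108 | × | 0.867 | × | 0.000 | 9.9s | $0.044 |
| Spline AI | 0.100 | × | 0.865 | × | 0.000 | 8.6s | $0.035 |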
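The table header names an Euler-Poincaré compliance column, but the page does not say how that value is derived. As a rough sketch of the underlying identity only, the textbook Euler-Poincaré relation for a closed manifold B-rep is V − E + F − R = 2(S − G), where R counts inner loops on faces, S shells, and G genus. The function name and the example topology counts below are assumptions chosen for illustration.

```python
def euler_poincare_residual(
    vertices: int,
    edges: int,
    faces: int,
    rings: int = 0,   # inner loops (holes) in faces
    shells: int = 1,
    genus: int = 0,
) -> int:
    """Return V - E + F - R - 2(S - G); zero when the identity holds."""
    return vertices - edges + faces - rings - 2 * (shells - genus)


# An L-shaped prism (6-vertex L profile extruded along z), assumed here as
# the shape left by the subtraction in the prompt:
# 12 vertices, 18 edges, 8 faces, one shell, genus 0.
residual = euler_poincare_residual(12, 18, 8)
print("compliant" if residual == 0 else f"violated by {residual}")
```

A nonzero residual signals inconsistent topology, such as a missing face or a stray edge; a zero residual is necessary but not sufficient for a valid solid.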