BOOL-005 · Boolean Robustness · difficulty 5/5

ε-offset extrusion (sliver-face stress)

sha256:7a1cb09001ddf0f4…

§1 Prompt verbatim

Cube 30 × 30 × 30 mm. Subtract from it a second cube of the same size, translated by (+0.005, +0.005, 0) mm. The result must be one watertight body — kernels must not leave a 5-µm sliver shell.
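Read literally as an exact set difference, the prompt keeps only the part of the first cube that the translated copy does not cover. The sketch below estimates how much material that is. It assumes an axis-aligned cube with one corner at the origin, and `Box` and `difference_volume` are hypothetical helpers written for this illustration, not part of any CAD kernel or of the benchmark's scorer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its min and max corners, in mm."""
    lo: tuple
    hi: tuple

    def volume(self) -> float:
        # Clamp each extent at zero so an empty box has zero volume.
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= max(0.0, b - a)
        return v

    def translated(self, d):
        return Box(tuple(a + t for a, t in zip(self.lo, d)),
                   tuple(b + t for b, t in zip(self.hi, d)))

    def intersection(self, other):
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        return Box(lo, hi)


def difference_volume(a: Box, b: Box) -> float:
    """Volume of a minus b, using vol(a) - vol(a ∩ b)."""
    return a.volume() - a.intersection(b).volume()


base = Box((0.0, 0.0, 0.0), (30.0, 30.0, 30.0))   # assumed placement
tool = base.translated((0.005, 0.005, 0.0))        # offset from the prompt
remainder_mm3 = difference_volume(base, tool)      # about 8.99925 mm^3
```

Under these assumptions the remainder is roughly 9 mm³ out of 27 000 mm³, spread over two walls only 0.005 mm thick, which is why the task stresses tolerance handling.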

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.001 mm

The 5 µm (0.005 mm) offset is below most kernels' default ε, so tasks at this scale separate ACIS-grade kernels from naive CSG.
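The shell count and watertightness entries of the spec can be checked on a triangulated export with plain edge bookkeeping. The following is a minimal sketch, assuming the body is available as a list of vertex-index triples; the function names are hypothetical and the benchmark's own checker is not documented here. It tests edge-manifoldness only (every edge shared by exactly two triangles) and does not detect non-manifold vertices.

```python
from collections import defaultdict


def edge_face_map(triangles):
    """Map each undirected edge to the indices of the triangles using it."""
    edges = defaultdict(list)
    for f, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edges[(min(u, v), max(u, v))].append(f)
    return edges


def is_watertight(triangles):
    """True when every edge is shared by exactly two triangles."""
    return all(len(fs) == 2 for fs in edge_face_map(triangles).values())


def count_shells(triangles):
    """Count groups of triangles connected through shared edges."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for faces in edge_face_map(triangles).values():
        for other in faces[1:]:
            parent[find(other)] = find(faces[0])
    return len({find(i) for i in range(len(triangles))})


# A tetrahedron: the smallest closed triangle mesh.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
spec_ok = is_watertight(tetra) and count_shells(tetra) == 1
```

A result that leaves a detached sliver would show up here as `count_shells(...) > 1`, and a missing or duplicated face as `is_watertight(...) == False`.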

§3 Reference render

canonical reference

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
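Someone holding a copy of the reference file could compare it with the published fingerprint along these lines. This is a sketch only: the page shows a truncated digest, so the check below compares the visible prefix and leaves the elided digits unknown; `matches_fingerprint` is a hypothetical helper name.

```python
import hashlib

# Prefix as shown on the page; the remaining digits are elided there.
PUBLISHED_PREFIX = "7a1cb09001ddf0f4"


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def matches_fingerprint(data: bytes, published_prefix: str) -> bool:
    """True when the digest of `data` starts with the published prefix."""
    return sha256_hex(data).startswith(published_prefix.lower())
```

A prefix match is weaker evidence than a full-digest match, so a complete fingerprint should be preferred whenever it is available.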

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP

Tile	Source	vol IoU	BREP	manifold
REFERENCE	canonical · ground truth	1.000	100	✓
Human Baseline (Mech-E)	not shown	not shown	not shown	not shown
Claude Sonnet 4.6 → CadQuery	Anthropic + CadQuery 2.4	0.602	11	✗
Gemini 2.5 Flash → CadQuery	Google + CadQuery 2.4	0.580	10	✗
GPT-5 → CadQuery	OpenAI + CadQuery 2.4	0.511	9	✗
CAD-Coder R1	CAD-Coder Labs (research)	0.492	11	✗
Claude Opus 4.7 → CadQuery	Anthropic + CadQuery 2.4	0.478	11	✗
Gemini 2.5 Pro → OpenSCAD	Google + OpenSCAD 2024.06	0.473	0	✗
OpenAI o4 (reasoning) → CadQuery	OpenAI + CadQuery 2.4	0.448	12	✗
Claude Opus 4.7 → OpenSCAD	Anthropic + OpenSCAD 2024.06	0.443	0	✗
DeepSeek R1 (reasoning) → CadQuery	DeepSeek + CadQuery 2.4	0.435	13	✗
Qwen3 Coder → CadQuery	Alibaba + CadQuery 2.4	not shown	not shown	not shown
GPT-5 Mini → OpenSCAD	OpenAI + OpenSCAD 2024.06	0.415	14	✗
Claude Haiku 4.5 → CadQuery	Anthropic + CadQuery 2.4	0.401	12	✗
Llama 3.3 70B → OpenSCAD	Meta + OpenSCAD 2024.06	0.378	14	✗
DeepCAD	Wu et al. 2021 (research)	not shown	not shown	not shown

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.
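The vol IoU figure attached to each tile is, by its usual definition, the volume of the intersection of two solids divided by the volume of their union. How the benchmark evaluates it on STEP geometry is not described here; the sketch below assumes axis-aligned boxes given as `(lo, hi)` corner tuples so the ratio can be computed exactly, and `box_volume` and `box_iou` are hypothetical helper names.

```python
def box_volume(box):
    """Volume of an axis-aligned box ((x0, y0, z0), (x1, y1, z1))."""
    lo, hi = box
    v = 1.0
    for a, b in zip(lo, hi):
        v *= max(0.0, b - a)  # empty extents contribute zero
    return v


def box_iou(a, b):
    """Volumetric intersection-over-union of two axis-aligned boxes."""
    lo = tuple(max(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(min(p, q) for p, q in zip(a[1], b[1]))
    inter = box_volume((lo, hi))
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0.0 else 0.0


reference = ((0.0, 0.0, 0.0), (30.0, 30.0, 30.0))
# Illustrative candidate: the same cube with a 1 % scale jitter.
jittered = ((0.0, 0.0, 0.0), (30.3, 30.3, 30.3))
iou = box_iou(reference, jittered)
```

For thin-walled results such as this task's, even a small absolute error in wall position removes a large share of the intersection, which is one plausible reason the scores here sit well below 1.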

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task

Agent	Vol IoU	Watertight	Manifold	Euler-Poincaré Compliance	P@1	p50	latency	cost
Human Baseline (Mech-E)	0.795	✓	0.966	✓	1.000	—	900.2s	$5.883
Zoo Text-to-CAD	0.660	✓	0.947	✓	0.000	—	5.2s	$0.213
Claude Sonnet 4.6 → CadQuery	0.602	✓	0.936	✓	0.000	—	16.6s	$0.059
Gemini 2.5 Flash → CadQuery	0.580	✓	0.932	✓	0.000	—	9.2s	$0.016
GPT-5 → CadQuery	0.511	✓	0.934	✓	0.000	—	41.5s	$0.202
CAD-Coder R1	0.492	✓	0.921	×	0.000	—	4.6s	$0.006
Claude Opus 4.7 → CadQuery	0.478	✓	0.928	✓	0.000	—	28.3s	$0.388
Gemini 2.5 Pro → OpenSCAD	0.473	✓	0.918	×	0.000	—	33.3s	$0.077
OpenAI o4 (reasoning) → CadQuery	0.448	✓	0.918	×	0.000	—	93.4s	$1.166
Claude Opus 4.7 → OpenSCAD	0.443	×	0.912	×	0.000	—	30.2s	$0.324
DeepSeek R1 (reasoning) → CadQuery	0.435	×	0.912	×	0.000	—	85.1s	$0.045
Qwen3 Coder → CadQuery	0.423	✓	0.916	×	0.000	—	21.9s	$0.033
Adam (CADcrush)	0.421	✓	0.917	×	0.000	—	7.5s	$0.290
GPT-5 Mini → OpenSCAD	0.415	✓	0.914	×	0.000	—	12.4s	$0.011
Claude Haiku 4.5 → CadQuery	0.401	×	0.907	×	0.000	—	8.4s	$0.017
Llama 3.3 70B → OpenSCAD	0.378	×	0.911	×	0.000	—	15.8s	$0.020
DeepCAD	0.304	×	0.896	×	0.000	—	5.3s	$0.017
Hunyuan3D-2	0.124	×	0.870	×	0.000	—	33.2s	$0.076
Trellis 3D	0.108	×	0.867	×	0.000	—	9.9s	$0.044
Spline AI	0.100	×	0.865	×	0.000	—	8.6s	$0.035
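The Euler-Poincaré column refers to the standard topological identity for boundary representations, V − E + F − (L − F) − 2(S − G) = 0, where V, E, F, L, S and G count vertices, edges, faces, loops, shells and genus. The exact test the benchmark applies is not documented in this section; the following is a minimal sketch of the identity itself, with a hypothetical function name and hand-counted example solids.

```python
def euler_poincare_residual(v, e, f, loops=None, shells=1, genus=0):
    """Return V - E + F - (L - F) - 2(S - G); zero for a valid closed solid."""
    if loops is None:
        loops = f  # assume one boundary loop per face (no inner loops)
    return v - e + f - (loops - f) - 2 * (shells - genus)


# Plain cube: 8 vertices, 12 edges, 6 faces.
cube = euler_poincare_residual(8, 12, 6)

# Prism over an L-shaped profile (6 profile corners): 12 vertices,
# 18 edges, 8 faces. This is the topology of an exact A - B remainder.
l_prism = euler_poincare_residual(12, 18, 8)

# Cube with a square through-hole: top and bottom faces carry an extra
# inner loop each, so L = 12 for F = 10, and the genus is 1.
holed = euler_poincare_residual(16, 24, 10, loops=12, genus=1)
```

A nonzero residual means the counts cannot describe a closed solid with the stated number of shells and handles, though a zero residual alone does not prove the geometry is valid.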