CAM-005 · CAM Toolpath Validity · difficulty 4/5

T-slot pocket array (Ø 8 + Ø 4 endmills)

sha256:be01fc02d3a09e11…

§1 Prompt verbatim

Plate 80 × 80 × 14 mm with three parallel T-slots (top width 12 mm, bottom width 16 mm, total depth 10 mm, length 60 mm) on 25 mm pitch. The T-slot bottom must be machinable with a Ø 4 mm endmill on a 3-axis VMC (i.e. all undercuts reachable).
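The dimensions in the prompt imply several derived clearances that are easy to check by hand. The sketch below does that arithmetic. It assumes the three slots are centred on the plate, spaced along X and running along Y; the prompt does not state this, and the `TSlotPlate` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TSlotPlate:
    # Values taken from the prompt (all in mm).
    plate_x: float = 80.0
    plate_y: float = 80.0
    plate_z: float = 14.0
    top_w: float = 12.0      # neck width at the top face
    bottom_w: float = 16.0   # head width at the slot bottom
    depth: float = 10.0      # total slot depth
    length: float = 60.0
    pitch: float = 25.0
    count: int = 3

    def slot_centres(self):
        # ASSUMPTION: slots are placed symmetrically about the plate centre.
        mid = (self.count - 1) / 2
        return [(i - mid) * self.pitch for i in range(self.count)]

    def report(self):
        outer = max(abs(c) for c in self.slot_centres()) + self.bottom_w / 2
        return {
            "undercut_per_side": (self.bottom_w - self.top_w) / 2,
            "web_between_heads": self.pitch - self.bottom_w,
            "edge_margin_x": self.plate_x / 2 - outer,
            "end_margin_y": (self.plate_y - self.length) / 2,
            "floor_thickness": self.plate_z - self.depth,
        }


report = TSlotPlate().report()
for name, value in report.items():
    print(f"{name}: {value:.1f} mm")
```

Under these assumptions every margin is positive: a 2 mm undercut per side, 9 mm of material between adjacent slot heads, 7 mm to the plate edge, 10 mm beyond each slot end, and a 4 mm floor.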

§2 Ground-truth spec

shells: 1

watertight: true

manifold: true

acceptance ε: ±0.05 mm

features: t_slot_x3

§3 Reference render

canonical reference

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP

vol IoU · BREP · manifold

REFERENCE

canonical · ground truth

1.000 · 100 · ✓

Human Baseline (Mech-E)

n=4 senior engineers

0.690 · 7 · ✓

OpenAI o4 (reasoning) → CadQuery

OpenAI + CadQuery 2.4

0.650 · 10 · ✗

Claude Sonnet 4.6 → CadQuery

Anthropic + CadQuery 2.4

Claude Opus 4.7 → CadQuery

Anthropic + CadQuery 2.4

Gemini 2.5 Flash → CadQuery

Google + CadQuery 2.4

0.488 · 13 · ✗

GPT-5 → CadQuery

OpenAI + CadQuery 2.4

0.463 · 12 · ✗

Claude Haiku 4.5 → CadQuery

Anthropic + CadQuery 2.4

0.425 · 12 · ✗

Claude Opus 4.7 → OpenSCAD

Anthropic + OpenSCAD 2024.06

0.411 · 0 · ✗

Qwen3 Coder → CadQuery

Alibaba + CadQuery 2.4

0.395 · 15 · ✗

DeepSeek R1 (reasoning) → CadQuery

DeepSeek + CadQuery 2.4

0.369 · 15 · ✗

Gemini 2.5 Pro → OpenSCAD

Google + OpenSCAD 2024.06

0.353 · 0 · ✗

CAD-Coder R1

CAD-Coder Labs (research)

0.350 · 13 · ✗

Llama 3.3 70B → OpenSCAD

Meta + OpenSCAD 2024.06

0.331 · 17 · ✗

GPT-5 Mini → OpenSCAD

OpenAI + OpenSCAD 2024.06

0.272 · 19 · ✗

DeepCAD

Wu et al. 2021 (research)

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity: the geometry is not a manifold solid even when the render looks clean.
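To make the effect of a "missing features" degradation on volume IoU concrete, here is a minimal point-sampling sketch. It is not the benchmark's scorer, which works on the held-out STEP file. It compares the T-slot plate against a variant whose undercuts are omitted (straight 12 mm slots). The 6 mm neck / 4 mm head split, the centred slot placement, and all function names are assumptions; the prompt gives only the 10 mm total depth.

```python
# Hypothetical frame: plate spans 0..80 (x), 0..80 (y), 0..14 (z) mm; slots run along y.
SLOT_CENTRES_X = (15.0, 40.0, 65.0)  # 25 mm pitch, centred on the plate (assumed)
NECK_DEPTH = 6.0                     # ASSUMPTION: not given in the prompt
TOTAL_DEPTH = 10.0
PLATE = (80.0, 80.0, 14.0)


def in_slot(x, y, z, head_width):
    """True if the point lies in material removed by a slot with this head width."""
    if not (10.0 < y < 70.0):            # 60 mm slot length, centred (assumed)
        return False
    if z > PLATE[2] - NECK_DEPTH:        # neck region, 12 mm wide
        width = 12.0
    elif z > PLATE[2] - TOTAL_DEPTH:     # head region below the neck
        width = head_width
    else:
        return False                     # solid floor under the slot
    return any(abs(x - cx) < width / 2 for cx in SLOT_CENTRES_X)


def reference(x, y, z):
    return not in_slot(x, y, z, head_width=16.0)


def no_undercut(x, y, z):
    # Degraded candidate: head as narrow as the neck, so no undercut exists.
    return not in_slot(x, y, z, head_width=12.0)


def vol_iou(solid_a, solid_b, step=1.0):
    """Volume IoU estimated by sampling cell midpoints inside the plate's bounding box."""
    inter = union = 0
    nx, ny, nz = (int(d / step) for d in PLATE)
    for i in range(nx):
        x = (i + 0.5) * step
        for j in range(ny):
            y = (j + 0.5) * step
            for k in range(nz):
                z = (k + 0.5) * step
                a, b = solid_a(x, y, z), solid_b(x, y, z)
                inter += a and b
                union += a or b
    return inter / union


iou = vol_iou(reference, no_undercut)
print(f"vol IoU without undercuts: {iou:.3f}")
```

Under these assumptions, dropping every undercut still leaves an IoU of about 0.958, so a volume metric on its own barely registers the missing feature.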

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task

Agent	Watert.	Manif.	FeatRec	CAM Reachability	p50	latency	cost
Human Baseline (Mech-E)	✓	0.961	0.851	0.839	—	778.3s	$6.309
OpenAI o4 (reasoning) → CadQuery	✓	0.946	0.749	0.642	—	119.2s	$1.185
Claude Sonnet 4.6 → CadQuery	✓	0.941	0.720	0.609	—	23.1s	$0.062
Zoo Text-to-CAD	✓	0.933	0.735	0.685	—	6.1s	$0.173
Claude Opus 4.7 → CadQuery	✓	0.937	0.707	0.668	—	32.3s	$0.401
Adam (CADcrush)	✓	0.931	0.721	0.656	—	9.2s	$0.283
Gemini 2.5 Flash → CadQuery	✓	0.923	0.609	0.518	—	14.2s	$0.021
GPT-5 → CadQuery	✓	0.923	0.735	0.680	—	44.4s	$0.191
Claude Haiku 4.5 → CadQuery	×	0.910	0.498	0.420	—	9.2s	$0.018
Claude Opus 4.7 → OpenSCAD	×	0.911	0.548	0.529	—	23.9s	$0.260
Qwen3 Coder → CadQuery	×	0.907	0.637	0.543	—	16.5s	$0.033
DeepSeek R1 (reasoning) → CadQuery	×	0.903	0.610	0.585	—	89.4s	$0.043
Gemini 2.5 Pro → OpenSCAD	×	0.907	0.531	0.483	—	34.7s	$0.079
CAD-Coder R1	×	0.904	0.702	0.517	—	7.3s	$0.006
Llama 3.3 70B → OpenSCAD	×	0.897	0.442	0.351	—	18.1s	$0.022
GPT-5 Mini → OpenSCAD	×	0.890	0.385	0.376	—	16.0s	$0.011
DeepCAD	×	0.890	0.476	0.355	—	3.4s	$0.022
Trellis 3D	×	0.857	0.194	0.088	—	13.7s	$0.050
Hunyuan3D-2	×	0.852	0.217	0.095	—	27.5s	$0.073
Spline AI	×	0.850	0.093	0.049	—	10.1s	$0.034
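The table is ordered by Vol IoU, which is not among its columns, so re-ranking by another metric changes the order. The sketch below parses a few rows copied from the table and sorts them by CAM Reachability. It assumes the values map to the header names positionally, as the tab-separated layout suggests.

```python
import csv
import io

# Header plus four rows copied from the table above (tab-separated).
ROWS = (
    "Agent\tWatert.\tManif.\tFeatRec\tCAM Reachability\tp50\tlatency\tcost\n"
    "Human Baseline (Mech-E)\t✓\t0.961\t0.851\t0.839\t—\t778.3s\t$6.309\n"
    "Claude Sonnet 4.6 → CadQuery\t✓\t0.941\t0.720\t0.609\t—\t23.1s\t$0.062\n"
    "Zoo Text-to-CAD\t✓\t0.933\t0.735\t0.685\t—\t6.1s\t$0.173\n"
    "GPT-5 → CadQuery\t✓\t0.923\t0.735\t0.680\t—\t44.4s\t$0.191\n"
)


def parse(text):
    """Turn tab-separated rows into dicts with numeric fields converted."""
    records = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        records.append({
            "agent": row["Agent"],
            "cam_reachability": float(row["CAM Reachability"]),
            "latency_s": float(row["latency"].rstrip("s")),
            "cost_usd": float(row["cost"].lstrip("$")),
        })
    return records


ranked = sorted(parse(ROWS), key=lambda r: r["cam_reachability"], reverse=True)
for r in ranked:
    print(f'{r["cam_reachability"]:.3f}  {r["agent"]}')
```

For these four rows the human baseline stays first, while Zoo Text-to-CAD and GPT-5 → CadQuery move ahead of Claude Sonnet 4.6 → CadQuery.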