CAD-Bench
CAM-005 · CAM Toolpath Validity · difficulty 4/5

T-slot pocket array (Ø 8 + Ø 4 endmills)

sha256:be01fc02d3a09e11

§1 Prompt verbatim

Plate 80 × 80 × 14 mm with three parallel T-slots (top width 12 mm, bottom width 16 mm, total depth 10 mm, length 60 mm) on 25 mm pitch. The T-slot bottom must be machinable with a Ø 4 mm endmill on a 3-axis VMC (i.e. all undercuts reachable).
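The dimensions in the prompt fix most of the clearances on the part, and it can help to work them out before modelling. The sketch below does that arithmetic in plain Python. It assumes the slot array is centred on the plate and that the slots are centred along their length; the prompt states neither, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TSlotPlate:
    """Nominal dimensions from the prompt, all in millimetres."""
    plate_x: float = 80.0       # plate size across the slots
    plate_y: float = 80.0       # plate size along the slots
    plate_z: float = 14.0       # plate thickness
    top_width: float = 12.0     # slot width at the top face
    bottom_width: float = 16.0  # slot width at the bottom
    depth: float = 10.0         # total slot depth
    length: float = 60.0        # slot length
    pitch: float = 25.0         # centre-to-centre spacing
    count: int = 3              # number of slots


def slot_centres(p: TSlotPlate) -> list[float]:
    # ASSUMPTION: the array is centred on the plate (not stated in the prompt).
    first = p.plate_x / 2 - p.pitch * (p.count - 1) / 2
    return [first + i * p.pitch for i in range(p.count)]


def clearances(p: TSlotPlate) -> dict[str, float]:
    centres = slot_centres(p)
    return {
        # Overhang on each side that a tool entering from the top must reach under.
        "undercut_per_side": (p.bottom_width - p.top_width) / 2,
        # Material left between the wide sections of neighbouring slots.
        "wall_between_slots": p.pitch - p.bottom_width,
        # Material between the outermost slot and the plate edge.
        "edge_margin": centres[0] - p.bottom_width / 2,
        # ASSUMPTION: slots are centred along their length.
        "end_margin": (p.plate_y - p.length) / 2,
        # Material left under each slot.
        "floor_thickness": p.plate_z - p.depth,
    }


plate = TSlotPlate()
print(slot_centres(plate))
print(clearances(plate))
```

Under these assumptions the slot centres fall at 15, 40 and 65 mm, with a 2 mm undercut per side, 9 mm walls between slots and a 4 mm floor.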

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.05 mm
features: t_slot_x3
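One possible reading of the acceptance ε is a symmetric ±0.05 mm band around each nominal dimension. The page does not say how the scorer applies it, so the sketch below is an assumption: it compares hypothetical measured values against the nominal dimensions from the prompt and reports the ones that fall outside the band.

```python
EPS_MM = 0.05  # acceptance tolerance from the spec

# Nominal dimensions taken from the prompt (mm).
NOMINAL_MM = {
    "top_width": 12.0,
    "bottom_width": 16.0,
    "depth": 10.0,
    "length": 60.0,
    "pitch": 25.0,
}


def out_of_tolerance(measured: dict[str, float],
                     nominal: dict[str, float] = NOMINAL_MM,
                     eps: float = EPS_MM) -> dict[str, float]:
    """Return {dimension: deviation} for every dimension outside ±eps.

    ASSUMPTION: per-dimension symmetric band; the benchmark's real
    acceptance rule is not described on this page.
    """
    slack = 1e-9  # absorbs floating-point noise exactly at the band edge
    failures = {}
    for name, target in nominal.items():
        deviation = measured[name] - target
        if abs(deviation) > eps + slack:
            failures[name] = round(deviation, 6)
    return failures


# Hypothetical measurement: bottom width is 0.08 mm oversize, depth is within band.
sample = dict(NOMINAL_MM, bottom_width=16.08, depth=10.04)
print(out_of_tolerance(sample))
```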

§3 Reference render


Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
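The page does not describe how volumetric IoU is computed against the reference. As a rough illustration only, the sketch below estimates it by sampling two solids on a 1 mm grid and dividing the shared volume by the combined volume. The solids are simple analytic stand-ins for the plate: the 4 mm height of the wide bottom section, the centred slot layout, and the function names are all assumptions, not values from the task.

```python
from typing import Callable

Solid = Callable[[float, float, float], bool]


def make_plate(bottom_width: float = 16.0, top_width: float = 12.0,
               head_height: float = 4.0) -> Solid:
    """Point-membership test for an 80 x 80 x 14 plate with three T-slots.

    ASSUMPTIONS: slot centres at x = 15, 40, 65; slots span y = 10..70;
    the wide section is head_height tall (hypothetical, not in the prompt).
    """
    centres = (15.0, 40.0, 65.0)
    floor = 14.0 - 10.0  # material left under the 10 mm deep slots

    def inside(x: float, y: float, z: float) -> bool:
        if not (0.0 <= x <= 80.0 and 0.0 <= y <= 80.0 and 0.0 <= z <= 14.0):
            return False
        if z < floor or not (10.0 <= y <= 70.0):
            return True  # below or beyond the slots: solid material
        width = bottom_width if z < floor + head_height else top_width
        return all(abs(x - c) > width / 2 for c in centres)

    return inside


def vol_iou(a: Solid, b: Solid, size: tuple[int, int, int] = (80, 80, 14)) -> float:
    """Estimate |A ∩ B| / |A ∪ B| by sampling the centre of each 1 mm cell."""
    inter = union = 0
    for i in range(size[0]):
        for j in range(size[1]):
            for k in range(size[2]):
                p = (i + 0.5, j + 0.5, k + 0.5)
                in_a, in_b = a(*p), b(*p)
                inter += in_a and in_b
                union += in_a or in_b
    return inter / union if union else 1.0


reference = make_plate()
straight = make_plate(bottom_width=12.0)  # candidate that omits the undercut
print(round(vol_iou(reference, straight), 3))  # → 0.958
```

In this toy setup a part that drops the undercut entirely still overlaps the reference by about 0.958, which suggests why a volume metric alone can be a weak signal for missing T-slot features.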

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP
vol IoU · BREP · manifold
| Agent | Provider / stack | vol IoU | BREP |
|---|---|---|---|
| REFERENCE | canonical · ground truth | 1.000 | 100 |
| Human Baseline (Mech-E) | n=4 senior engineers | 0.690 | 7 |
| OpenAI o4 (reasoning) → CadQuery | OpenAI + CadQuery 2.4 | 0.650 | 10 |
| Claude Sonnet 4.6 → CadQuery | Anthropic + CadQuery 2.4 | 0.634 | 8 |
| Zoo Text-to-CAD | Zoo (KittyCAD) | 0.590 | 9 |
| Claude Opus 4.7 → CadQuery | Anthropic + CadQuery 2.4 | 0.565 | 11 |
| Adam (CADcrush) | CADcrush | 0.503 | 13 |
| Gemini 2.5 Flash → CadQuery | Google + CadQuery 2.4 | 0.488 | 13 |
| GPT-5 → CadQuery | OpenAI + CadQuery 2.4 | 0.463 | 12 |
| Claude Haiku 4.5 → CadQuery | Anthropic + CadQuery 2.4 | 0.425 | 12 |
| Claude Opus 4.7 → OpenSCAD | Anthropic + OpenSCAD 2024.06 | 0.411 | 0 |
| Qwen3 Coder → CadQuery | Alibaba + CadQuery 2.4 | 0.395 | 15 |
| DeepSeek R1 (reasoning) → CadQuery | DeepSeek + CadQuery 2.4 | 0.369 | 15 |
| Gemini 2.5 Pro → OpenSCAD | Google + OpenSCAD 2024.06 | 0.353 | 0 |
| CAD-Coder R1 | CAD-Coder Labs (research) | 0.350 | 13 |
| Llama 3.3 70B → OpenSCAD | Meta + OpenSCAD 2024.06 | 0.331 | 17 |
| GPT-5 Mini → OpenSCAD | OpenAI + OpenSCAD 2024.06 | 0.272 | 19 |
| DeepCAD | Wu et al. 2021 (research) | 0.248 | 21 |
| Trellis 3D | Microsoft Research | 0.052 | 0 |
| Hunyuan3D-2 | Tencent | 0.014 | 103 |
| Spline AI | Spline.design | 0.000 | 0 |

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task
| Agent | Watertight | Manifold | FeatRec | CAM Reachability | P@1 | p50 latency | cost |
|---|---|---|---|---|---|---|---|
| Human Baseline (Mech-E) |  | 0.961 | 0.851 | 0.839 | 0.000 | 778.3s | $6.309 |
| OpenAI o4 (reasoning) → CadQuery |  | 0.946 | 0.749 | 0.642 | 0.000 | 119.2s | $1.185 |
| Claude Sonnet 4.6 → CadQuery |  | 0.941 | 0.720 | 0.609 | 0.000 | 23.1s | $0.062 |
| Zoo Text-to-CAD |  | 0.933 | 0.735 | 0.685 | 0.000 | 6.1s | $0.173 |
| Claude Opus 4.7 → CadQuery |  | 0.937 | 0.707 | 0.668 | 0.000 | 32.3s | $0.401 |
| Adam (CADcrush) |  | 0.931 | 0.721 | 0.656 | 0.000 | 9.2s | $0.283 |
| Gemini 2.5 Flash → CadQuery |  | 0.923 | 0.609 | 0.518 | 0.000 | 14.2s | $0.021 |
| GPT-5 → CadQuery |  | 0.923 | 0.735 | 0.680 | 0.000 | 44.4s | $0.191 |
| Claude Haiku 4.5 → CadQuery | × | 0.910 | 0.498 | 0.420 | 0.000 | 9.2s | $0.018 |
| Claude Opus 4.7 → OpenSCAD | × | 0.911 | 0.548 | 0.529 | 0.000 | 23.9s | $0.260 |
| Qwen3 Coder → CadQuery | × | 0.907 | 0.637 | 0.543 | 0.000 | 16.5s | $0.033 |
| DeepSeek R1 (reasoning) → CadQuery | × | 0.903 | 0.610 | 0.585 | 0.000 | 89.4s | $0.043 |
| Gemini 2.5 Pro → OpenSCAD | × | 0.907 | 0.531 | 0.483 | 0.000 | 34.7s | $0.079 |
| CAD-Coder R1 | × | 0.904 | 0.702 | 0.517 | 0.000 | 7.3s | $0.006 |
| Llama 3.3 70B → OpenSCAD | × | 0.897 | 0.442 | 0.351 | 0.000 | 18.1s | $0.022 |
| GPT-5 Mini → OpenSCAD | × | 0.890 | 0.385 | 0.376 | 0.000 | 16.0s | $0.011 |
| DeepCAD | × | 0.890 | 0.476 | 0.355 | 0.000 | 3.4s | $0.022 |
| Trellis 3D | × | 0.857 | 0.194 | 0.088 | 0.000 | 13.7s | $0.050 |
| Hunyuan3D-2 | × | 0.852 | 0.217 | 0.095 | 0.000 | 27.5s | $0.073 |
| Spline AI | × | 0.850 | 0.093 | 0.049 | 0.000 | 10.1s | $0.034 |