CAD-Bench
SEAL-004 · Sealing-Groove Design · difficulty 4/5

AS568-218 piston-type radial groove

sha256:ae0bf01dc20a91ee

§1 Prompt (verbatim)

Piston-side radial groove for an AS568-218 O-ring (Ø 1.484 in × 0.139 in cross-section). Sealed pressure 21 MPa hydraulic, dynamic. Apply 12-17 % squeeze, 60-85 % groove fill. Output the piston with the groove cut into its OD.
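
As a rough illustration of how the squeeze and fill targets in the prompt constrain the groove, the sketch below derives a depth range and a width range from the 0.139 in cross-section. It assumes the common simplified definitions (squeeze = (CS − depth) / CS, fill = O-ring cross-sectional area / rectangular groove area) and ignores radial clearance, stretch, tolerances, and corner radii. The function names are hypothetical and the numbers are not the benchmark's reference geometry.

```python
import math

CS_IN = 0.139  # O-ring cross-section from the prompt, inches


def groove_depth(cs, squeeze):
    """Radial gland depth that compresses the cross-section by `squeeze` (a fraction)."""
    return cs * (1.0 - squeeze)


def groove_width(cs, depth, fill):
    """Axial width of a rectangular groove so the O-ring occupies `fill` of its area."""
    ring_area = math.pi * cs ** 2 / 4.0
    return ring_area / (fill * depth)


# Squeeze bounds from the prompt: 12 % gives the deepest groove, 17 % the shallowest.
depth_max = groove_depth(CS_IN, 0.12)
depth_min = groove_depth(CS_IN, 0.17)

# Assumed mid-range squeeze (14.5 %), then bound the width by the 60-85 % fill window.
depth_mid = groove_depth(CS_IN, 0.145)
width_max = groove_width(CS_IN, depth_mid, 0.60)  # loosest fill, widest groove
width_min = groove_width(CS_IN, depth_mid, 0.85)  # tightest fill, narrowest groove

print(f"depth: {depth_min:.4f} to {depth_max:.4f} in")
print(f"width at depth {depth_mid:.4f} in: {width_min:.4f} to {width_max:.4f} in")
```

Any depth and width pair inside both windows satisfies the stated percentages under these simplified definitions; a real gland design would also account for the factors ignored here.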

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.05 mm
features: groove_AS568-218_piston
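
One common way to test flags like watertight and manifold on a tessellated solid is to count how many triangles use each edge: in a closed 2-manifold mesh every undirected edge is shared by exactly two faces. The sketch below applies that necessary condition to a tetrahedron. It is an assumption about how such a check could look, not the benchmark's kernel-level BREP validation, and the function names are hypothetical.

```python
from collections import Counter


def edge_use_counts(faces):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_closed_manifold(faces):
    """True if the mesh is non-empty and every edge is shared by exactly two triangles."""
    counts = edge_use_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())


# Tetrahedron: 4 vertices, 4 triangular faces, closed surface.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
# Same shape with one face removed: its three boundary edges are used only once.
open_tetra = tetra[:3]

print(is_closed_manifold(tetra))       # True
print(is_closed_manifold(open_tetra))  # False
```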

§3 Reference render

canonical reference

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP
vol IoU · BREP · manifold
REFERENCE · canonical · ground truth · vol IoU 1.000 · BREP 100
Human Baseline (Mech-E) · n=4 senior engineers · vol IoU 0.810 · BREP 8
Zoo Text-to-CAD · Zoo (KittyCAD) · vol IoU 0.698 · BREP 7
Claude Sonnet 4.6 → CadQuery · Anthropic + CadQuery 2.4 · vol IoU 0.593 · BREP 9
OpenAI o4 (reasoning) → CadQuery · OpenAI + CadQuery 2.4 · vol IoU 0.590 · BREP 8
GPT-5 → CadQuery · OpenAI + CadQuery 2.4 · vol IoU 0.560 · BREP 10
DeepSeek R1 (reasoning) → CadQuery · DeepSeek + CadQuery 2.4 · vol IoU 0.540 · BREP 10
Claude Opus 4.7 → CadQuery · Anthropic + CadQuery 2.4 · vol IoU 0.499 · BREP 13
Gemini 2.5 Flash → CadQuery · Google + CadQuery 2.4 · vol IoU 0.452 · BREP 10
CAD-Coder R1 · CAD-Coder Labs (research) · vol IoU 0.438 · BREP 13
Claude Opus 4.7 → OpenSCAD · Anthropic + OpenSCAD 2024.06 · vol IoU 0.410 · BREP 0
Qwen3 Coder → CadQuery · Alibaba + CadQuery 2.4 · vol IoU 0.393 · BREP 15
Llama 3.3 70B → OpenSCAD · Meta + OpenSCAD 2024.06 · vol IoU 0.355 · BREP 16
Gemini 2.5 Pro → OpenSCAD · Google + OpenSCAD 2024.06 · vol IoU 0.333 · BREP 0
GPT-5 Mini → OpenSCAD · OpenAI + OpenSCAD 2024.06 · vol IoU 0.272 · BREP 19
DeepCAD · Wu et al. 2021 (research) · vol IoU 0.023 · BREP 102
Adam (CADcrush) · CADcrush · no manifold solid produced · BREP 79
Trellis 3D · Microsoft Research · vol IoU 0.000 · BREP 0
Spline AI · Spline.design · no manifold solid produced · BREP 2
Claude Haiku 4.5 → CadQuery · Anthropic + CadQuery 2.4 · no manifold solid produced · BREP 50
Hunyuan3D-2 · Tencent · no manifold solid produced · BREP 5

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.
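
Volumetric IoU is conventionally the intersection volume divided by the union volume of two solids. This page does not state how the benchmark computes it, so the sketch below is only one plausible approximation: it samples a regular grid and tests point membership. The two analytic pistons (with and without a groove) stand in for a reference and an agent output; their dimensions are made up for illustration.

```python
def in_plain_piston(x, y, z, radius=20.0, height=30.0):
    """Point-membership test for a solid cylinder on the z axis (mm)."""
    return 0.0 <= z <= height and x * x + y * y <= radius * radius


def in_grooved_piston(x, y, z, radius=20.0, height=30.0,
                      groove_z=(13.0, 17.0), groove_depth=3.0):
    """Same cylinder with a rectangular groove cut into its OD."""
    if not in_plain_piston(x, y, z, radius, height):
        return False
    in_band = groove_z[0] <= z <= groove_z[1]
    beyond_groove_floor = x * x + y * y > (radius - groove_depth) ** 2
    return not (in_band and beyond_groove_floor)


def vol_iou(inside_a, inside_b, bounds, n=40):
    """Approximate |A ∩ B| / |A ∪ B| by sampling n^3 grid-cell centres."""
    (x0, x1), (y0, y1), (z0, z1) = bounds
    inter = union = 0
    for i in range(n):
        x = x0 + (i + 0.5) * (x1 - x0) / n
        for j in range(n):
            y = y0 + (j + 0.5) * (y1 - y0) / n
            for k in range(n):
                z = z0 + (k + 0.5) * (z1 - z0) / n
                a, b = inside_a(x, y, z), inside_b(x, y, z)
                inter += int(a and b)
                union += int(a or b)
    return inter / union if union else 0.0


BOUNDS = ((-20.0, 20.0), (-20.0, 20.0), (0.0, 30.0))
iou_self = vol_iou(in_grooved_piston, in_grooved_piston, BOUNDS)
iou_missing_groove = vol_iou(in_grooved_piston, in_plain_piston, BOUNDS)
print(iou_self, round(iou_missing_groove, 3))
```

In this toy setup a piston that omits the groove entirely still overlaps most of the reference volume, which suggests why a volume-overlap score is usually read alongside feature-level metrics rather than on its own.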

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task
Agent · Watertight · Manifold · Named-Dimension RMSE · Standards Compliance · P@1 · p50 latency · cost
Human Baseline (Mech-E) · 0.973 · 0.067 · 0.849 · 1.000 · 667.6s · $6.361
Zoo Text-to-CAD · 0.954 · 0.175 · 0.704 · 0.000 · 5.7s · $0.167
Claude Sonnet 4.6 → CadQuery · 0.947 · 0.219 · 0.628 · 0.000 · 16.0s · $0.059
OpenAI o4 (reasoning) → CadQuery · 0.946 · 0.241 · 0.678 · 0.000 · 81.1s · $1.166
GPT-5 → CadQuery · 0.939 · 0.203 · 0.653 · 0.000 · 35.4s · $0.249
DeepSeek R1 (reasoning) → CadQuery · 0.928 · 0.277 · 0.610 · 0.000 · 79.4s · $0.041
Claude Opus 4.7 → CadQuery · 0.929 · 0.263 · 0.664 · 0.000 · 34.7s · $0.284
Gemini 2.5 Flash → CadQuery · 0.923 · 0.280 · 0.589 · 0.000 · 14.3s · $0.024
CAD-Coder R1 · × · 0.912 · 0.319 · 0.525 · 0.000 · 6.4s · $0.006
Claude Opus 4.7 → OpenSCAD · × · 0.909 · 0.247 · 0.558 · 0.000 · 37.6s · $0.341
Qwen3 Coder → CadQuery · × · 0.911 · 0.317 · 0.514 · 0.000 · 17.4s · $0.025
Llama 3.3 70B → OpenSCAD · × · 0.901 · 0.315 · 0.392 · 0.000 · 24.4s · $0.024
Gemini 2.5 Pro → OpenSCAD · × · 0.900 · 0.314 · 0.505 · 0.000 · 28.3s · $0.074
GPT-5 Mini → OpenSCAD · × · 0.889 · 0.290 · 0.344 · 0.000 · 13.6s · $0.009
DeepCAD · × · 0.854 · 0.349 · 0.270 · 0.000 · 5.5s · $0.018
Adam (CADcrush) · kernel error: BRepCheck_NotClosed · × · 0.000 · 0.000 · 10.8s · $0.231
Trellis 3D · × · 0.850 · 0.492 · 0.049 · 0.000 · 13.4s · $0.059
Spline AI · kernel error: BRepCheck_NotClosed · × · 0.000 · 0.000 · 10.0s · $0.042
Claude Haiku 4.5 → CadQuery · kernel error: BRepCheck_NotClosed · × · 0.000 · 0.000 · 8.6s · $0.018
Hunyuan3D-2 · kernel error: BRepCheck_NotClosed · × · 0.000 · 0.000 · 29.3s · $0.068
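
The page does not define Named-Dimension RMSE. A natural reading is the root-mean-square error over dimensions matched by name between the reference and an agent's model. The sketch below works under that assumption, with invented dimension names and values in mm, and also applies the ±0.05 mm acceptance ε from §2 per dimension. Both function names are hypothetical.

```python
import math

EPSILON_MM = 0.05  # acceptance tolerance from §2


def named_dimension_rmse(reference, candidate):
    """RMSE over dimensions present in both dicts (assumed definition)."""
    shared = sorted(set(reference) & set(candidate))
    if not shared:
        raise ValueError("no shared named dimensions")
    squared = [(candidate[k] - reference[k]) ** 2 for k in shared]
    return math.sqrt(sum(squared) / len(squared))


def out_of_tolerance(reference, candidate, eps=EPSILON_MM):
    """Names that are missing from the candidate or deviate by more than eps."""
    return [
        k for k in sorted(reference)
        if k not in candidate or abs(candidate[k] - reference[k]) > eps
    ]


# Invented values for illustration only; not taken from the held-out STEP file.
reference = {"groove_depth": 3.00, "groove_width": 4.50, "piston_od": 40.00}
candidate = {"groove_depth": 3.04, "groove_width": 4.80, "piston_od": 40.00}

rmse = named_dimension_rmse(reference, candidate)
failed = out_of_tolerance(reference, candidate)
print(f"RMSE = {rmse:.3f} mm, out of tolerance: {failed}")
```

A single badly sized feature dominates the RMSE here, while the per-dimension tolerance check names which feature is responsible.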