PARA-001 · Paraphrase Robustness · difficulty 3/5

5× paraphrased L-bracket

sha256:1f3a5e90c44b2210…

§1 Prompt verbatim

Right-angle L-bracket, leg lengths 60 mm and 40 mm, thickness 5 mm. Through-hole Ø 6.6 mm centred 30 mm from the bend on the long leg.
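
The prompt fixes every dimension except the bracket width, so a parametric description needs one extra assumption. A minimal sketch in plain Python follows. It assumes leg lengths are measured along the outer faces from the outside corner, a sharp (zero-radius) bend, and a hypothetical width of 20 mm that is not part of the prompt. The `LBracket` class is illustrative and is not the benchmark's canonical description.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LBracket:
    long_leg: float = 60.0       # mm, from the prompt
    short_leg: float = 40.0      # mm, from the prompt
    thickness: float = 5.0       # mm, from the prompt
    hole_diameter: float = 6.6   # mm, from the prompt
    hole_offset: float = 30.0    # mm from the bend along the long leg, from the prompt
    width: float = 20.0          # mm, ASSUMPTION: the prompt states no width

    def profile_area(self) -> float:
        """Area of the L-shaped cross-section (sharp outer corner assumed)."""
        # The two leg rectangles overlap in a thickness x thickness square.
        return (self.long_leg + self.short_leg - self.thickness) * self.thickness

    def hole_volume(self) -> float:
        """Volume removed by the through-hole, a cylinder through the thickness."""
        return math.pi * (self.hole_diameter / 2) ** 2 * self.thickness

    def volume(self) -> float:
        """Solid volume in cubic millimetres."""
        return self.profile_area() * self.width - self.hole_volume()


bracket = LBracket()
print(f"profile area: {bracket.profile_area():.1f} mm^2")
print(f"volume:       {bracket.volume():.2f} mm^3")
```

Changing `width` scales the extruded volume linearly while the hole volume stays fixed, which is one reason an unstated width can move a volumetric score.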

§2 Ground-truth spec

shells: 1

watertight: true

manifold: true

acceptance ε: ±0.1 mm
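
One way to read this spec is as a pass/fail gate over a candidate's topology and measured dimensions. The sketch below is an illustration, not the benchmark's scorer. The `Candidate` container, the `check` function, and the choice to apply ε to each dimension independently are all assumptions.

```python
from dataclasses import dataclass, field

EPSILON_MM = 0.1  # acceptance tolerance from the spec

# Nominal dimensions taken from the prompt, all in mm.
NOMINAL = {
    "long_leg": 60.0,
    "short_leg": 40.0,
    "thickness": 5.0,
    "hole_diameter": 6.6,
    "hole_offset": 30.0,
}


@dataclass
class Candidate:  # hypothetical container for a measured agent output
    shells: int
    watertight: bool
    manifold: bool
    dims: dict = field(default_factory=dict)


def check(c: Candidate) -> list:
    """Return a list of failure reasons; an empty list means accepted."""
    failures = []
    if c.shells != 1:
        failures.append(f"shells: expected 1, got {c.shells}")
    if not c.watertight:
        failures.append("not watertight")
    if not c.manifold:
        failures.append("not manifold")
    for name, nominal in NOMINAL.items():
        measured = c.dims.get(name)
        if measured is None:
            failures.append(f"{name}: missing")
        # Small slack so a value exactly on the tolerance edge is not
        # rejected by floating-point rounding.
        elif abs(measured - nominal) > EPSILON_MM + 1e-9:
            failures.append(f"{name}: {measured} outside {nominal} ± {EPSILON_MM}")
    return failures


good = Candidate(1, True, True, dict(NOMINAL))
bad = Candidate(2, True, True, {**NOMINAL, "thickness": 5.3})
print(check(good))
print(check(bad))
```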

§3 Reference render

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
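
For anyone who does hold a copy of the reference file, the fingerprint can be checked with the standard library. This is a sketch under assumptions: the helper names are hypothetical, and because the fingerprint is shown truncated on this page, only a prefix comparison is possible.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, published):
    """Compare a file against a possibly truncated 'sha256:<hex>…' string."""
    prefix = published.strip().lower()
    if prefix.startswith("sha256:"):
        prefix = prefix[len("sha256:"):]
    prefix = prefix.rstrip("…")  # drop the truncation marker, keep the hex
    return sha256_of(path).startswith(prefix)
```

A call would look like `matches_fingerprint("reference.step", "sha256:1f3a5e90c44b2210…")`, where `reference.step` is a placeholder path. A prefix match on 16 hex digits is strong evidence but not a full-digest verification.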

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP

vol IoU · BREP · manifold

REFERENCE

canonical · ground truth

vol IoU 1.000 · BREP 100 · manifold ✓

Human Baseline (Mech-E)

n=4 senior engineers

vol IoU 0.754 · BREP 7 · manifold ✓

Claude Opus 4.7 → CadQuery

Anthropic + CadQuery 2.4

Claude Sonnet 4.6 → CadQuery

Anthropic + CadQuery 2.4

vol IoU 0.716 · BREP 10 · manifold ✓

OpenAI o4 (reasoning) → CadQuery

OpenAI + CadQuery 2.4

vol IoU 0.696 · BREP 10 · manifold ✓

Qwen3 Coder → CadQuery

Alibaba + CadQuery 2.4

vol IoU 0.631 · BREP 10 · manifold ✗

Gemini 2.5 Flash → CadQuery

Google + CadQuery 2.4

vol IoU 0.630 · BREP 11 · manifold ✗

Gemini 2.5 Pro → OpenSCAD

Google + OpenSCAD 2024.06

vol IoU 0.626 · BREP 0 · manifold ✓

Claude Opus 4.7 → OpenSCAD

Anthropic + OpenSCAD 2024.06

OpenAI + CadQuery 2.4

vol IoU 0.556 · BREP 11 · manifold ✗

DeepSeek R1 (reasoning) → CadQuery

DeepSeek + CadQuery 2.4

vol IoU 0.553 · BREP 10 · manifold ✗

CAD-Coder R1

CAD-Coder Labs (research)

vol IoU 0.520 · BREP 13 · manifold ✗

Claude Haiku 4.5 → CadQuery

Anthropic + CadQuery 2.4

GPT-5 Mini → OpenSCAD

OpenAI + OpenSCAD 2024.06

vol IoU 0.460 · BREP 11 · manifold ✗

Llama 3.3 70B → OpenSCAD

Meta + OpenSCAD 2024.06

vol IoU 0.436 · BREP 10 · manifold ✗

DeepCAD

Wu et al. 2021 (research)

vol IoU 0.226 · BREP 23 · manifold ✗

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity: the geometry is not a manifold solid even when the render reads clean.
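
Why a clean-looking render can still fail the manifold check is easiest to see on a triangle mesh: the test is about how faces share edges, not about how the surface looks. The sketch below uses a common edge-counting criterion and is an assumption about what such a check involves, not the benchmark's implementation. It ignores vertex-level non-manifoldness and face orientation, and the function names are hypothetical.

```python
from collections import Counter


def edge_counts(triangles):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_watertight(triangles):
    """No boundary edges: no edge is used by only one triangle."""
    return all(n >= 2 for n in edge_counts(triangles).values())


def is_edge_manifold(triangles):
    """No edge is shared by more than two triangles."""
    return all(n <= 2 for n in edge_counts(triangles).values())


# A closed tetrahedron: every edge is shared by exactly two faces.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]

# Dropping a face opens a hole; adding a fin overloads edge (0, 1).
open_tetra = tetra[:3]
finned = tetra + [(0, 1, 4)]

for name, mesh in (("tetra", tetra), ("open", open_tetra), ("finned", finned)):
    print(name, is_watertight(mesh), is_edge_manifold(mesh))
```

Both defects are invisible from most camera angles, which matches the observation that a render can read clean while the solid fails.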

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task

Agent	Watertight	Manifold	Paraphrase IoU σ	Seed σ	P@1	p50	latency	cost
Human Baseline (Mech-E)	✓	0.970	0.005	0.005	1.000	—	878.3s	$7.003
Claude Opus 4.7 → CadQuery	✓	0.956	0.021	0.051	0.000	—	28.9s	$0.343
Zoo Text-to-CAD	✓	0.966	0.035	0.022	0.000	—	4.6s	$0.207
Claude Sonnet 4.6 → CadQuery	✓	0.963	0.022	0.030	0.000	—	13.6s	$0.063
OpenAI o4 (reasoning) → CadQuery	✓	0.963	0.031	0.037	0.000	—	117.7s	$1.157
Qwen3 Coder → CadQuery	✓	0.947	0.046	0.066	0.000	—	13.4s	$0.029
Gemini 2.5 Flash → CadQuery	✓	0.943	0.044	0.034	0.000	—	9.5s	$0.023
Gemini 2.5 Pro → OpenSCAD	✓	0.952	0.051	0.056	0.000	—	22.7s	$0.092
Claude Opus 4.7 → OpenSCAD	✓	0.947	0.025	0.050	0.000	—	31.4s	$0.364
Hunyuan3D-2	✓	0.937	0.047	0.101	0.000	—	30.3s	$0.061
Adam (CADcrush)	✓	0.940	0.035	0.023	0.000	—	10.6s	$0.225
GPT-5 → CadQuery	✓	0.941	0.027	0.050	0.000	—	35.8s	$0.222
DeepSeek R1 (reasoning) → CadQuery	✓	0.928	0.038	0.061	0.000	—	76.0s	$0.044
CAD-Coder R1	✓	0.925	0.040	0.024	0.000	—	6.7s	$0.005
Claude Haiku 4.5 → CadQuery	✓	0.926	0.068	0.066	0.000	—	6.2s	$0.017
Spline AI	✓	0.926	0.069	0.131	0.000	—	9.6s	$0.042
Trellis 3D	✓	0.924	0.051	0.085	0.000	—	9.4s	$0.053
GPT-5 Mini → OpenSCAD	✓	0.921	0.033	0.077	0.000	—	14.8s	$0.008
Llama 3.3 70B → OpenSCAD	✓	0.921	0.081	0.068	0.000	—	22.5s	$0.018
DeepCAD	×	0.886	0.092	0.045	0.000	—	4.7s	$0.019
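
The Paraphrase IoU σ column reads as the spread of volumetric IoU across the five paraphrases of the prompt. A minimal sketch of both quantities follows, assuming IoU is computed on voxel occupancy sets and σ is a population standard deviation. The page states neither choice, and the IoU values fed to `paraphrase_sigma` are made up for illustration.

```python
import statistics
from itertools import product


def box(x0, x1, y0, y1, z0, z1):
    """Set of integer voxel indices filling a half-open axis-aligned box."""
    return set(product(range(x0, x1), range(y0, y1), range(z0, z1)))


def vol_iou(a, b):
    """Intersection-over-union of two voxel sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def paraphrase_sigma(ious):
    """Population standard deviation of per-paraphrase IoU scores."""
    return statistics.pstdev(ious)


reference = box(0, 4, 0, 4, 0, 4)   # 64 voxels
candidate = box(0, 4, 0, 4, 0, 3)   # 48 voxels, one layer short
iou = vol_iou(reference, candidate)

# Made-up IoU values for five paraphrases, not benchmark data.
sigma = paraphrase_sigma([0.70, 0.72, 0.74, 0.72, 0.72])

print(f"IoU = {iou:.3f}, sigma = {sigma:.4f}")
```

A low σ alongside a modest IoU would indicate an agent that is consistently wrong in the same way across paraphrases, which is a different failure from one that is sensitive to wording.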