REVENG-002 · Reverse Engineering · difficulty 4/5

Three-view ortho → bracket

sha256:30d72b41ae9c0014…

§1 Prompt verbatim

From the supplied 1:1 front/top/side dimensioned drawing (PNG, 600 dpi) reproduce the part. All dimensions and tolerances on the drawing are authoritative.

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.1 mm
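The acceptance band above can be read as a per-dimension check. The following is a minimal sketch, assuming each named dimension is compared independently against its nominal value; the function name, the dimension names, and the sample values are hypothetical and are not taken from the drawing or from the benchmark's scorer.

```python
from typing import Dict

EPSILON_MM = 0.1  # acceptance band stated in the spec (±0.1 mm)


def within_tolerance(nominal: Dict[str, float],
                     measured: Dict[str, float],
                     eps: float = EPSILON_MM) -> Dict[str, bool]:
    """Return, per named dimension, whether |measured - nominal| <= eps."""
    result = {}
    for name, target in nominal.items():
        value = measured.get(name)
        # A dimension that is absent from the measurement counts as a failure.
        # The 1e-9 slack absorbs floating-point round-off at the band edge.
        result[name] = value is not None and abs(value - target) <= eps + 1e-9
    return result


# Hypothetical dimensions in mm, for illustration only.
nominal = {"base_length": 80.0, "hole_diameter": 6.5}
measured = {"base_length": 80.08, "hole_diameter": 6.35}
print(within_tolerance(nominal, measured))
```

Here `base_length` is off by 0.08 mm and passes, while `hole_diameter` is off by 0.15 mm and fails.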

§3 Reference render

canonical reference

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
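A fingerprint like the one published for the held-out STEP file can be checked with a standard SHA-256 digest. This is a minimal sketch, assuming the file bytes are already in memory; the helper names are hypothetical, and because the page shows only a truncated fingerprint, the comparison is against a prefix rather than the full digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_fingerprint(data: bytes, published_prefix: str) -> bool:
    """Compare a digest against a published, possibly truncated, hex prefix.

    A trailing ellipsis (as shown on the page) is stripped before comparing.
    A prefix match is weaker evidence than a full-digest match.
    """
    prefix = published_prefix.rstrip("…").lower()
    return sha256_hex(data).startswith(prefix)


# Illustration with placeholder bytes, not the real STEP file.
payload = b""
print(matches_fingerprint(payload, sha256_hex(payload)[:16] + "…"))
```

For a real check, the bytes would come from reading the STEP file in binary mode, and the full digest should be compared whenever it is available.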

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP

Tile	Stack	Vol IoU	BREP	Manifold
REFERENCE (canonical · ground truth)	—	1.000	100	✓
Human Baseline (Mech-E)	—	—	—	—
Claude Sonnet 4.6 → CadQuery	Anthropic + CadQuery 2.4	0.548	9	✗
GPT-5 → CadQuery	OpenAI + CadQuery 2.4	—	—	—
OpenAI o4 (reasoning) → CadQuery	OpenAI + CadQuery 2.4	0.516	10	✗
Gemini 2.5 Pro → OpenSCAD	Google + OpenSCAD 2024.06	0.469	0	✗
Gemini 2.5 Flash → CadQuery	Google + CadQuery 2.4	0.437	14	✗
Claude Opus 4.7 → CadQuery	Anthropic + CadQuery 2.4	0.416	13	✗
Claude Opus 4.7 → OpenSCAD	Anthropic + OpenSCAD 2024.06	—	—	—
DeepSeek R1 (reasoning) → CadQuery	DeepSeek + CadQuery 2.4	0.379	15	✗
CAD-Coder R1	CAD-Coder Labs (research)	0.359	17	✗
GPT-5 Mini → OpenSCAD	OpenAI + OpenSCAD 2024.06	0.343	15	✗
Claude Haiku 4.5 → CadQuery	Anthropic + CadQuery 2.4	0.334	18	✗
Qwen3 Coder → CadQuery	Alibaba + CadQuery 2.4	—	—	—
Llama 3.3 70B → OpenSCAD	Meta + OpenSCAD 2024.06	—	—	—
—	Wu et al. 2021 (research)	0.090	52	✗

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity: the geometry is not a manifold solid even when the render reads clean.
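The point that a render can read clean while the geometry is not a closed manifold can be illustrated with an edge-count test on a triangle mesh. This is a minimal sketch, assuming an indexed triangle list; the function names are hypothetical, and the check covers only the necessary condition that every edge is shared by exactly two faces. It does not test vertex manifoldness, face orientation, or self-intersection, and it is not the benchmark's BREP validation.

```python
from collections import Counter
from typing import Iterable, Tuple

Triangle = Tuple[int, int, int]  # indices into a vertex array


def edge_counts(triangles: Iterable[Triangle]) -> Counter:
    """Count how many faces use each undirected edge."""
    counts: Counter = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_closed_edge_manifold(triangles: Iterable[Triangle]) -> bool:
    """True when the mesh is non-empty and every edge has exactly two faces.

    An edge used once marks a hole (not watertight); an edge used three or
    more times marks a non-manifold junction.
    """
    counts = edge_counts(triangles)
    return bool(counts) and all(n == 2 for n in counts.values())


# A tetrahedron is closed; dropping one face leaves three boundary edges.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(is_closed_edge_manifold(tetra))
print(is_closed_edge_manifold(tetra[:3]))
```

An open mesh like `tetra[:3]` can still look solid from most camera angles, which is why a visual check alone does not establish that the output is a valid solid.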

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task

Agent	Vol IoU	Watertight	Manifold	Named-Dimension RMSE	FeatRec	p50	latency	cost
Human Baseline (Mech-E)	0.654	✓	0.945	0.145	0.956	—	806.4s	$5.561
Hunyuan3D-2	0.573	✓	0.932	0.551	0.212	—	25.4s	$0.078
Claude Sonnet 4.6 → CadQuery	0.548	✓	0.933	0.226	0.711	—	15.5s	$0.075
GPT-5 → CadQuery	0.525	✓	0.933	0.213	0.716	—	40.9s	$0.173
Trellis 3D	0.521	✓	0.933	0.557	0.204	—	15.6s	$0.047
OpenAI o4 (reasoning) → CadQuery	0.516	✓	0.933	0.181	0.783	—	98.7s	$1.293
Gemini 2.5 Pro → OpenSCAD	0.469	✓	0.923	0.221	0.524	—	30.1s	$0.080
Gemini 2.5 Flash → CadQuery	0.437	✓	0.919	0.270	0.577	—	13.3s	$0.023
Claude Opus 4.7 → CadQuery	0.416	✓	0.916	0.205	0.754	—	27.2s	$0.337
Claude Opus 4.7 → OpenSCAD	0.411	✓	0.917	0.253	0.533	—	33.6s	$0.266
Zoo Text-to-CAD	0.392	×	0.906	0.201	0.738	—	5.6s	$0.188
DeepSeek R1 (reasoning) → CadQuery	0.379	×	0.908	0.261	0.613	—	85.8s	$0.044
CAD-Coder R1	0.359	×	0.903	0.293	0.650	—	4.5s	$0.004
GPT-5 Mini → OpenSCAD	0.343	×	0.903	0.313	0.386	—	10.0s	$0.009
Claude Haiku 4.5 → CadQuery	0.334	×	0.900	0.340	0.536	—	5.8s	$0.016
Qwen3 Coder → CadQuery	0.330	×	0.902	0.303	0.629	—	16.9s	$0.024
Adam (CADcrush)	0.306	×	0.900	0.170	0.730	—	7.0s	$0.266
Llama 3.3 70B → OpenSCAD	0.235	×	0.886	0.322	0.425	—	18.6s	$0.022
Spline AI	0.224	×	0.886	0.556	0.100	—	7.1s	$0.048
DeepCAD	0.090	×	0.865	0.333	0.435	—	4.3s	$0.021
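Two of the columns above, Vol IoU and Named-Dimension RMSE, can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's scorer: the page does not say how solids are voxelized or aligned, or how the RMSE is normalized, so the voxel-set representation, the helper names, and all sample values here are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def box_voxels(x0: int, x1: int, y0: int, y1: int, z0: int, z1: int) -> Set[Voxel]:
    """Occupied unit voxels of an axis-aligned box (half-open ranges)."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


def volume_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection over union of two voxel occupancy sets."""
    union = len(a | b)
    # Two empty solids are treated as identical; this is a convention choice.
    return len(a & b) / union if union else 1.0


def dimension_rmse(nominal: Dict[str, float], measured: Dict[str, float]) -> float:
    """Root-mean-square error over the named dimensions in `nominal`."""
    errors = [measured[name] - nominal[name] for name in nominal]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical example: the candidate is the bottom half of the reference box.
reference = box_voxels(0, 10, 0, 10, 0, 10)
candidate = box_voxels(0, 10, 0, 10, 0, 5)
print(volume_iou(reference, candidate))

# Hypothetical dimensions in mm.
print(dimension_rmse({"a": 10.0, "b": 20.0}, {"a": 10.3, "b": 19.6}))
```

In this example the candidate fills half of the reference volume and nothing outside it, so the IoU is 0.5. The measure penalizes both missing material and extra material, which is one way an output can recognise most features yet still score a low Vol IoU.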