CAD-Bench
REVENG-002 · Reverse Engineering · difficulty 4/5

Three-view ortho → bracket

sha256:30d72b41ae9c0014

§1 Prompt verbatim

From the supplied 1:1 front/top/side dimensioned drawing (PNG, 600 dpi) reproduce the part. All dimensions and tolerances on the drawing are authoritative.

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.1 mm
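As a rough illustration of what the topological fields above could mean on a triangle mesh, the following Python sketch counts shells, tests edge sharing, and applies the ±0.1 mm acceptance band to a single dimension. The mesh representation (triangles as vertex-index triples) and all function names are assumptions made for this example; the benchmark's own checker is not described on this page and presumably works on the BREP rather than on a mesh.

```python
from collections import defaultdict


def edge_face_counts(triangles):
    """Count how many triangles use each undirected edge."""
    counts = defaultdict(int)
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_watertight(triangles):
    """Closed surface: every edge is shared by exactly two triangles."""
    counts = edge_face_counts(triangles)
    return bool(counts) and all(n == 2 for n in counts.values())


def is_edge_manifold(triangles):
    """No edge is shared by more than two triangles."""
    return all(n <= 2 for n in edge_face_counts(triangles).values())


def count_shells(triangles):
    """Connected components, linking triangles that share a vertex."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, c in triangles:
        parent[find(a)] = find(b)
        parent[find(b)] = find(c)
    return len({find(v) for v in parent})


def within_tolerance(measured_mm, nominal_mm, eps_mm=0.1):
    """Acceptance band from the spec: |measured - nominal| <= eps."""
    return abs(measured_mm - nominal_mm) <= eps_mm


# Hypothetical test mesh: a tetrahedron (4 vertices, 4 faces).
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(count_shells(tetra), is_watertight(tetra), is_edge_manifold(tetra))
```

Removing any one face from `tetra` leaves boundary edges used by a single triangle, so `is_watertight` fails while `is_edge_manifold` still passes; that is the distinction the two spec fields draw.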

§3 Reference render

[Interactive 3D view: canonical reference]

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
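A fingerprint of this kind can be checked locally with a few lines of Python. This is a minimal sketch that assumes the 16-character value shown on the page is a prefix of the full hex-encoded SHA-256 digest of the STEP file's bytes; the page does not state how the fingerprint is truncated, and the function name is hypothetical.

```python
import hashlib


def matches_fingerprint(data: bytes, published: str) -> bool:
    """Compare a (possibly truncated) sha256 fingerprint against file bytes.

    Assumption: `published` looks like "sha256:<hex prefix>" and the hex
    prefix is the leading part of the full digest.
    """
    prefix = published.removeprefix("sha256:").lower()
    return hashlib.sha256(data).hexdigest().startswith(prefix)


# Usage with in-memory bytes; for a real file, read it in binary mode first.
print(matches_fingerprint(b"", "sha256:e3b0c44298fc1c14"))
```

Because the reference file is held out, this check is only useful to someone who already has the file; it does not help reconstruct it.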

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP
vol IoU · BREP · manifold
| Agent | Details | Vol IoU | BREP |
|---|---|---|---|
| REFERENCE | canonical · ground truth | 1.000 | 100 |
| Human Baseline (Mech-E) | n=4 senior engineers | 0.654 | 8 |
| Hunyuan3D-2 | Tencent | 0.573 | 11 |
| Claude Sonnet 4.6 → CadQuery | Anthropic + CadQuery 2.4 | 0.548 | 9 |
| GPT-5 → CadQuery | OpenAI + CadQuery 2.4 | 0.525 | 10 |
| Trellis 3D | Microsoft Research | 0.521 | 0 |
| OpenAI o4 (reasoning) → CadQuery | OpenAI + CadQuery 2.4 | 0.516 | 10 |
| Gemini 2.5 Pro → OpenSCAD | Google + OpenSCAD 2024.06 | 0.469 | 0 |
| Gemini 2.5 Flash → CadQuery | Google + CadQuery 2.4 | 0.437 | 14 |
| Claude Opus 4.7 → CadQuery | Anthropic + CadQuery 2.4 | 0.416 | 13 |
| Claude Opus 4.7 → OpenSCAD | Anthropic + OpenSCAD 2024.06 | 0.411 | 0 |
| Zoo Text-to-CAD | Zoo (KittyCAD) | 0.392 | 15 |
| DeepSeek R1 (reasoning) → CadQuery | DeepSeek + CadQuery 2.4 | 0.379 | 15 |
| CAD-Coder R1 | CAD-Coder Labs (research) | 0.359 | 17 |
| GPT-5 Mini → OpenSCAD | OpenAI + OpenSCAD 2024.06 | 0.343 | 15 |
| Claude Haiku 4.5 → CadQuery | Anthropic + CadQuery 2.4 | 0.334 | 18 |
| Qwen3 Coder → CadQuery | Alibaba + CadQuery 2.4 | 0.330 | 18 |
| Adam (CADcrush) | CADcrush | 0.306 | 19 |
| Llama 3.3 70B → OpenSCAD | Meta + OpenSCAD 2024.06 | 0.235 | 20 |
| Spline AI | Spline.design | 0.224 | 0 |
| DeepCAD | Wu et al. 2021 (research) | 0.090 | 52 |

Each tile is rebuilt from the canonical parametric description and degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity — the geometry is not a manifold solid even when the render reads clean.
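To make the volumetric score concrete, here is a minimal Python sketch of intersection-over-union estimated on a voxel grid. It is an illustration only: the page does not say how Vol IoU is computed, and the point-membership functions, the `box` helper, and the grid resolution are all assumptions chosen to keep the example self-contained.

```python
from itertools import product


def voxel_iou(inside_a, inside_b, bounds, step):
    """Estimate volume IoU by sampling voxel centres inside `bounds`.

    inside_a / inside_b: callables taking an (x, y, z) tuple, returning bool.
    bounds: ((x0, y0, z0), (x1, y1, z1)) enclosing both solids.
    """
    (x0, y0, z0), (x1, y1, z1) = bounds

    def centres(lo, hi):
        n = int(round((hi - lo) / step))
        return [lo + (i + 0.5) * step for i in range(n)]

    inter = union = 0
    for p in product(centres(x0, x1), centres(y0, y1), centres(z0, z1)):
        a, b = inside_a(p), inside_b(p)
        inter += a and b
        union += a or b
    return inter / union if union else 0.0


def box(lo, hi):
    """Hypothetical solid: an axis-aligned box as a membership test."""
    return lambda p: all(l <= c <= h for l, c, h in zip(lo, p, hi))


reference = box((0, 0, 0), (10, 10, 10))
half_height = box((0, 0, 0), (10, 10, 5))  # candidate missing the top half
bounds = ((0.0, 0.0, 0.0), (10.0, 10.0, 10.0))
print(voxel_iou(reference, half_height, bounds, step=1.0))  # 0.5
```

A candidate that is merely scaled down, for example `box((0, 0, 0), (8, 8, 8))`, scores 0.512 on the same grid, which shows how quickly small dimensional errors compound in a volume-based metric.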

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task
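| Agent | Vol IoU | Watertight | Manifold | Named-Dimension RMSE | FeatRec | P@1 | p50 latency | cost |
|---|---|---|---|---|---|---|---|---|
| Human Baseline (Mech-E) | 0.654 | | 0.945 | 0.145 | 0.956 | 0.000 | 806.4s | $5.561 |
| Hunyuan3D-2 | 0.573 | | 0.932 | 0.551 | 0.212 | 0.000 | 25.4s | $0.078 |
| Claude Sonnet 4.6 → CadQuery | 0.548 | | 0.933 | 0.226 | 0.711 | 0.000 | 15.5s | $0.075 |
| GPT-5 → CadQuery | 0.525 | | 0.933 | 0.213 | 0.716 | 0.000 | 40.9s | $0.173 |
| Trellis 3D | 0.521 | | 0.933 | 0.557 | 0.204 | 0.000 | 15.6s | $0.047 |
| OpenAI o4 (reasoning) → CadQuery | 0.516 | | 0.933 | 0.181 | 0.783 | 0.000 | 98.7s | $1.293 |
| Gemini 2.5 Pro → OpenSCAD | 0.469 | | 0.923 | 0.221 | 0.524 | 0.000 | 30.1s | $0.080 |
| Gemini 2.5 Flash → CadQuery | 0.437 | | 0.919 | 0.270 | 0.577 | 0.000 | 13.3s | $0.023 |
| Claude Opus 4.7 → CadQuery | 0.416 | | 0.916 | 0.205 | 0.754 | 0.000 | 27.2s | $0.337 |
| Claude Opus 4.7 → OpenSCAD | 0.411 | | 0.917 | 0.253 | 0.533 | 0.000 | 33.6s | $0.266 |
| Zoo Text-to-CAD | 0.392 | × | 0.906 | 0.201 | 0.738 | 0.000 | 5.6s | $0.188 |
| DeepSeek R1 (reasoning) → CadQuery | 0.379 | × | 0.908 | 0.261 | 0.613 | 0.000 | 85.8s | $0.044 |
| CAD-Coder R1 | 0.359 | × | 0.903 | 0.293 | 0.650 | 0.000 | 4.5s | $0.004 |
| GPT-5 Mini → OpenSCAD | 0.343 | × | 0.903 | 0.313 | 0.386 | 0.000 | 10.0s | $0.009 |
| Claude Haiku 4.5 → CadQuery | 0.334 | × | 0.900 | 0.340 | 0.536 | 0.000 | 5.8s | $0.016 |
| Qwen3 Coder → CadQuery | 0.330 | × | 0.902 | 0.303 | 0.629 | 0.000 | 16.9s | $0.024 |
| Adam (CADcrush) | 0.306 | × | 0.900 | 0.170 | 0.730 | 0.000 | 7.0s | $0.266 |
| Llama 3.3 70B → OpenSCAD | 0.235 | × | 0.886 | 0.322 | 0.425 | 0.000 | 18.6s | $0.022 |
| Spline AI | 0.224 | × | 0.886 | 0.556 | 0.100 | 0.000 | 7.1s | $0.048 |
| DeepCAD | 0.090 | × | 0.865 | 0.333 | 0.435 | 0.000 | 4.3s | $0.021 |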
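The page does not define Named-Dimension RMSE. One plausible reading, sketched below in Python, is the root-mean-square error between a candidate's dimension values and the reference values, matched by dimension name. The dimension names, the example values, and the choice to skip names the candidate omits are all assumptions for illustration, not the benchmark's documented behaviour.

```python
import math


def named_dimension_rmse(reference, candidate):
    """RMSE over dimensions present in both dicts (name -> value).

    Assumption: dimensions missing from the candidate are skipped here;
    a real scorer might penalise them instead.
    """
    errors = [candidate[name] - reference[name]
              for name in reference if name in candidate]
    if not errors:
        raise ValueError("no shared dimension names to compare")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical named dimensions for a bracket.
reference_dims = {"width": 40.0, "height": 30.0}
candidate_dims = {"width": 43.0, "height": 26.0}
print(named_dimension_rmse(reference_dims, candidate_dims))
```

With errors of +3 and -4 the result is the square root of 12.5, about 3.54, in whatever unit the dimensions are expressed in.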