Preprint · cad-bench/v0.5 · sweep 2026-04-12 · open · MIT
CAD·Bench v0.5
AGENTS

10 entries under test: 9 CAD-generation systems plus a human baseline

Human Baseline (Mech-E) Tier S
n=4 senior engineers · Onshape-2026-04
86.6θ 100
Human·BREP·open·p5 82

Four senior mechanical engineers (median 9 years of CAD experience) modelled the same prompts in Onshape. Wall-clock time and tool cost ($/seat·hr) were recorded, and scores are averaged across raters.
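The two bookkeeping rules in the human-baseline card (tool cost from wall-clock time at a $/seat·hr rate, and an inter-rater mean score) can be sketched as below. This is a minimal illustration, not the benchmark's harness: the `HumanRun` record, its field names, the seat rate and the example scores are all hypothetical, and a plain arithmetic mean is assumed for "inter-rater averaged".

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class HumanRun:
    """One engineer's attempt at one prompt (hypothetical record layout)."""
    prompt_id: str
    wall_clock_hr: float       # wall-clock modelling time, in hours
    seat_rate_usd_hr: float    # seat cost in $/seat·hr (illustrative value)
    rater_scores: list[float]  # independent rater scores, 0-100

    @property
    def tool_cost_usd(self) -> float:
        # Tool cost = hours on the seat × hourly seat rate.
        return self.wall_clock_hr * self.seat_rate_usd_hr

    @property
    def score(self) -> float:
        # Inter-rater average, assumed here to be a plain arithmetic mean.
        return mean(self.rater_scores)


def summarize(runs: list[HumanRun]) -> tuple[float, float]:
    """Mean of per-run scores and total tool cost across all runs."""
    return mean(r.score for r in runs), sum(r.tool_cost_usd for r in runs)


# Illustrative numbers only; not taken from the benchmark.
runs = [
    HumanRun("bracket-01", 0.5, 3.0, [88.0, 84.0, 86.0]),
    HumanRun("bracket-01", 0.75, 3.0, [90.0, 86.0, 88.0]),
]
```

With these made-up inputs, `summarize(runs)` returns `(87.0, 3.75)`: per-run scores of 86.0 and 88.0, and costs of $1.50 and $2.25.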

Claude Opus 4.7 → CadQuery Tier A
Anthropic + CadQuery 2.4 · opus-4.7 + cadquery 2.4
72.5θ 98
LLM+CadQuery·CadQuery·proprietary·p5 58

Few-shot scaffold (8 exemplars from the OCC tutorial set) with a self-repair loop that feeds OCC kernel errors back to the model for up to 3 rounds. Each call executes in a Vercel Sandbox.
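A scaffold of this shape (few-shot prompt, execute, feed the kernel error back, retry within a fixed budget) might look like the sketch below. The benchmark's real prompt format and interfaces are not published, so `build_prompt`, `generate_with_repair` and the `generate` / `execute` callables are hypothetical stand-ins: `generate` would wrap the LLM call and `execute` would run the CadQuery script in the sandbox, returning the OCC error text or `None` on success.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical interfaces (assumptions, not the harness's real API).
Generate = Callable[[str], str]           # prompt text -> CadQuery script
Execute = Callable[[str], Optional[str]]  # script -> OCC error text, None if OK


def build_prompt(exemplars: list[tuple[str, str]], task: str,
                 last_script: Optional[str] = None,
                 last_error: Optional[str] = None) -> str:
    """Few-shot prompt: (request, script) exemplars, the task, then any failure."""
    parts = [f"# Request: {req}\n{script}" for req, script in exemplars]
    parts.append(f"# Request: {task}")
    if last_error is not None:
        parts.append(f"# Previous attempt:\n{last_script}\n"
                     f"# It failed with: {last_error}\n# Fix the script.")
    return "\n\n".join(parts)


def generate_with_repair(task: str, exemplars: list[tuple[str, str]],
                         generate: Generate, execute: Execute,
                         max_repairs: int = 3) -> tuple[str, bool, int]:
    """Return (final_script, succeeded, repairs_used).

    At most `max_repairs` error feedbacks, i.e. at most max_repairs + 1 executions.
    """
    script = generate(build_prompt(exemplars, task))
    error = execute(script)
    repairs = 0
    while error is not None and repairs < max_repairs:
        # Feed the failing script and its kernel error back for a corrected version.
        script = generate(build_prompt(exemplars, task, script, error))
        error = execute(script)
        repairs += 1
    return script, error is None, repairs
```

In use, `exemplars` would hold the 8 tutorial pairs, for example `("a 10 x 10 x 1 plate", 'result = cq.Workplane("XY").box(10, 10, 1)')`. Keeping the budget as an explicit parameter makes it easy to hold it at 3 for every pipeline that shares the scaffold.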

Zoo Text-to-CAD Tier A
Zoo (KittyCAD) · 2.4
71.6θ 90
API·BREP·proprietary·p5 39

Native BREP generator. Outputs valid AP242 STEP. Trained on the Zoo internal corpus + filtered GrabCAD. Endpoint: text-to-cad.zoo.dev/api.
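A cheap first check on an "AP242 STEP" claim is to read the schema that a STEP Part 21 file declares in its `HEADER` section. The sketch below does only that: it confirms the `ISO-10303-21;` magic line and the `FILE_SCHEMA` entry. It does not establish geometric validity, which would require loading the file into a kernel such as OCC. The function names are hypothetical, and matching on the schema-name prefix (rather than a specific edition's object identifier) is a deliberate simplification.

```python
from __future__ import annotations

import re

# Schema-name prefix used by AP242 (ISO 10303-242) Part 21 files.
AP242_PREFIX = "AP242_MANAGED_MODEL_BASED_3D_ENGINEERING"


def step_schemas(step_text: str) -> list[str]:
    """Schema names declared in the FILE_SCHEMA entity of a Part 21 header."""
    if not step_text.lstrip().startswith("ISO-10303-21;"):
        raise ValueError("not an ISO 10303-21 (Part 21) file")
    m = re.search(r"FILE_SCHEMA\s*\(\s*\((.*?)\)\s*\)\s*;", step_text, re.DOTALL)
    if m is None:
        return []
    # Each schema is a quoted string, optionally followed by an object
    # identifier in braces, e.g. '... { 1 0 10303 442 1 1 4 }'.
    return [s.split("{")[0].strip() for s in re.findall(r"'([^']*)'", m.group(1))]


def declares_ap242(step_text: str) -> bool:
    """True if any declared schema is an AP242 schema (header check only)."""
    return any(s.startswith(AP242_PREFIX) for s in step_schemas(step_text))
```

Comparing the declared schema against AP214's `AUTOMOTIVE_DESIGN` or AP203 separates "exports STEP" from "exports AP242 STEP" before any costlier kernel-level validation is run.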

Adam (CADcrush) Tier A
CADcrush · 1.1
69.2θ 98
API·BREP·proprietary·p5 56

Closed-beta natural-language modeller; emits parametric models as Onshape FeatureScript. Tested through a partner API key (rate-limited to 60 req/h).
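Running a sweep against a key capped at 60 req/h needs client-side throttling. A minimal sketch, assuming a rolling one-hour window (how the partner API actually enforces its cap is not stated): because no 3600 s window ever holds more than 60 calls, the client also cannot exceed 60 calls in any fixed clock hour. `SlidingWindowLimiter` is a hypothetical helper. It is single-threaded, and the clock is injectable so it can be tested without sleeping.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """At most `limit` calls in any rolling `window_s`-second window (not thread-safe)."""

    def __init__(self, limit: int = 60, window_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._stamps: deque[float] = deque()  # admission times, oldest first

    def _evict(self, now: float) -> None:
        # Drop admission times that have left the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()

    def try_acquire(self) -> bool:
        """Admit one call if the window has room; never blocks."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next call would be admitted (0.0 if now)."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.limit:
            return 0.0
        return self._stamps[0] + self.window_s - now

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a call is admitted."""
        while not self.try_acquire():
            sleep(self.wait_time())
```

Calling `limiter.acquire()` before each request spaces a long sweep automatically, and `wait_time()` lets the harness log how long a run will stall.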

GPT-5 → CadQuery Tier A
OpenAI + CadQuery 2.4 · gpt-5 + cadquery 2.4
67.1θ 77
LLM+CadQuery·CadQuery·proprietary·p5 53

Uses the same scaffold as the Claude → CadQuery pipeline for a fair comparison. The self-repair budget is likewise capped at 3 attempts.

Gemini 2.5 Pro → OpenSCAD Tier B
Google + OpenSCAD 2024.06 · 2.5-pro + openscad 2024.06
54.6θ 8
LLM+OpenSCAD·OpenSCAD·proprietary·p5 33

Mesh-only output (OpenSCAD does not produce BREP); STEP round-trip therefore disabled. CSG kernel: CGAL.
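The mesh-only limitation follows from OpenSCAD's export formats: the CLI renders a `.scad` file with `openscad -o <output> <input>`, and the output extension selects a mesh format such as STL, OFF, AMF or 3MF. There is no STEP target. The wrapper below is a hypothetical sketch of how a harness might invoke it. It rejects non-mesh targets up front, and the 300 s timeout is an illustrative assumption.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Mesh formats OpenSCAD can export; STEP is not among its export targets.
MESH_EXTS = {".stl", ".off", ".amf", ".3mf"}


def openscad_cmd(scad_path: Path, out_path: Path,
                 binary: str = "openscad") -> list[str]:
    """Build the CLI call; the output extension picks the export format."""
    if out_path.suffix.lower() not in MESH_EXTS:
        raise ValueError(f"OpenSCAD cannot export {out_path.suffix!r}; "
                         "choose a mesh format")
    return [binary, "-o", str(out_path), str(scad_path)]


def render(scad_path: Path, out_path: Path, timeout_s: float = 300.0) -> Path:
    """Render a .scad file to a mesh; raises on a non-zero exit status."""
    proc = subprocess.run(openscad_cmd(scad_path, out_path),
                          capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        # OpenSCAD reports parse and CSG errors on stderr.
        raise RuntimeError(proc.stderr)
    return out_path
```

Rejecting a `.step` target in `openscad_cmd` makes the mesh-only constraint explicit in the harness, so the STEP round-trip task is never attempted for this pipeline.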

Claude Opus 4.7 → OpenSCAD Tier B
Anthropic + OpenSCAD 2024.06 · opus-4.7 + openscad 2024.06
51.6θ 8
LLM+OpenSCAD·OpenSCAD·proprietary·p5 0

Same prompt template as the Gemini pipeline. Output is mesh-only.

DeepCAD Tier C
Wu et al. 2021 (research) · official checkpoint, retrained 2024-11
42.1θ 14
Diffusion-3D·BREP·research·p5 0

Transformer over sketch-and-extrude CAD command sequences (line, arc and circle sketch primitives plus extrude). Its prompt vocabulary is limited, so we wrap it with a Claude-3.5-mini paraphraser that converts natural-language prompts into the in-distribution token grammar.
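The paraphraser's output has to be valid in the model's command grammar before it is worth sampling. The sketch below parses a hypothetical text form (`OP a b ...; OP ...`) into command tuples, rejects out-of-vocabulary commands and wrong arities, and quantizes continuous parameters to 256 levels. Treat it as an assumption-laden illustration: the vocabulary and arities are a simplified, made-up subset of sketch and extrude commands, not DeepCAD's exact parameterization, and the [-1, 1] normalization range is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass

LEVELS = 256  # assumed 8-bit quantization of continuous parameters

# Hypothetical simplified vocabulary: op -> number of parameters.
ARITY = {
    "SOL": 0,  # start of sketch loop
    "L": 2,    # line: end x, end y
    "A": 3,    # arc: end x, end y, sweep (normalized)
    "R": 3,    # circle: centre x, centre y, radius
    "E": 1,    # extrude: extent (simplified to one parameter)
    "EOS": 0,  # end of sequence
}


@dataclass(frozen=True)
class Command:
    op: str
    args: tuple[int, ...]


def quantize(value: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map a value in [lo, hi] to an integer level in [0, LEVELS - 1]."""
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * (LEVELS - 1))


def parse(program: str) -> list[Command]:
    """Parse paraphraser output into commands; ValueError if out of grammar."""
    cmds: list[Command] = []
    for chunk in program.split(";"):
        tokens = chunk.split()
        if not tokens:
            continue  # tolerate empty chunks such as a trailing ';'
        op, raw = tokens[0].upper(), tokens[1:]
        if op not in ARITY:
            raise ValueError(f"out-of-vocabulary command {op!r}")
        if len(raw) != ARITY[op]:
            raise ValueError(f"{op} expects {ARITY[op]} args, got {len(raw)}")
        cmds.append(Command(op, tuple(quantize(float(x)) for x in raw)))
    return cmds
```

For example, `parse("SOL; R 0 0 0.5; E 0.2; EOS")` yields four commands, and the circle's arguments quantize to `(128, 128, 191)`. Failing loudly on unsupported operations lets the harness record an unrepresentable prompt as a failure rather than feeding the model malformed tokens.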

Trellis 3D Tier C
Microsoft Research · 1.0 (image-to-3D)
25.4θ 0
Diffusion-3D·Mesh·open·p5 0

Diffusion model over structured latents. Outputs a mesh only; STEP round-trip and BREP-fidelity tasks score 0 by definition.
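"Score 0 by definition" amounts to a gate applied before any task-specific metric runs. A minimal sketch follows, with the caveats stated up front: the task names, the `OutputKind` enum and the simple-mean aggregation are hypothetical, because the benchmark's task list and weighting are not given here.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class OutputKind(Enum):
    BREP = "brep"
    MESH = "mesh"


# Hypothetical names for the tasks that require BREP topology.
BREP_ONLY_TASKS = {"step_round_trip", "brep_fidelity"}


def task_score(task: str, kind: OutputKind, raw_score: Optional[float]) -> float:
    """Apply the by-definition zero before using the task's own metric."""
    if task in BREP_ONLY_TASKS and kind is OutputKind.MESH:
        return 0.0  # a mesh carries no BREP topology to round-trip or compare
    if raw_score is None:
        return 0.0  # generation or evaluation failed
    return raw_score


def overall(results: list[tuple[str, OutputKind, Optional[float]]]) -> float:
    """Unweighted mean over tasks (the real aggregation is an assumption)."""
    scores = [task_score(t, k, s) for t, k, s in results]
    return sum(scores) / len(scores)
```

Under this gate, a mesh system scoring 60 on a visual task and 90 (ignored) on the STEP round-trip averages 30. That is the mechanism that caps mesh-only entries however good their geometry looks.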

Spline AI Tier D
Spline.design · 2.7
17.0θ 0
Diffusion-3D·Mesh·proprietary·p5 0

Aimed at game/UX assets, not engineering CAD. Included as a non-CAD baseline to quantify the gap between general-purpose 3D asset generators and engineering CAD systems.