20 CAD-generation systems under test
Four mechanical engineers (median 9 years of CAD experience) modelled the same prompts in Onshape. Wall-clock time and tool cost ($/seat·hr) are recorded. Scores are averaged across raters.
Native BREP generator. Outputs valid AP242 STEP. Trained on the Zoo internal corpus + filtered GrabCAD. Endpoint: text-to-cad.zoo.dev/api.
Reasoning model with private chain-of-thought. Self-repair is often unnecessary: single-shot pass rate is ~14 pp higher than GPT-5 on PARAM-* tasks, at the cost of 3-4× wall-clock latency.
Closed-beta natural-language modeller; emits parametric Onshape FeatureScript export. Tested through partner key (rate-limited 60 req/h).
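A 60 req/h limit can be respected by spacing calls evenly, one per minute. The following is a minimal sketch under that assumption; the `Pacer` class is hypothetical and not part of any partner SDK, and even spacing is stricter than a limit enforced over a rolling hourly window.

```python
import time


class Pacer:
    """Spaces calls so that at most `max_per_hour` are issued per hour."""

    def __init__(self, max_per_hour=60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour  # seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `pacer.wait()` before each request keeps a long sweep under the limit without handling rejected requests after the fact.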
Same scaffold and self-repair budget as the Opus 4.7 pipeline. About 5× cheaper, but loses ~6 IoU points on hard parametric tasks. Best Pareto candidate for high-throughput sweeps.
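Whether a system is a Pareto candidate can be checked mechanically once cost and score are tabulated. A minimal sketch, assuming each system is summarized by one cost figure (lower is better) and one score (higher is better); the `Entry` type, the `pareto_front` function, and the numbers are hypothetical placeholders, not benchmark results.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str
    cost: float   # lower is better
    score: float  # higher is better


def pareto_front(entries: List[Entry]) -> List[Entry]:
    """Return the entries that no other entry dominates, sorted by cost."""
    front = []
    for e in entries:
        dominated = any(
            o.cost <= e.cost and o.score >= e.score
            and (o.cost < e.cost or o.score > e.score)
            for o in entries
        )
        if not dominated:
            front.append(e)
    return sorted(front, key=lambda e: e.cost)


# Placeholder values for illustration only.
entries = [
    Entry("A", cost=5.0, score=80.0),
    Entry("B", cost=1.0, score=74.0),
    Entry("C", cost=2.0, score=70.0),  # costs more and scores less than B
]
front_names = [e.name for e in pareto_front(entries)]
```

Here `C` is dominated by `B`, while `A` and `B` both stay on the front because neither is better on both axes.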
Same scaffold as the Claude pipeline for fair comparison. Self-repair budget capped at 3 attempts.
Few-shot scaffold (8 exemplars from the OCC tutorial set) with a self-repair loop that feeds back up to 3 OCC errors. Executes in a Vercel Sandbox per call.
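The self-repair loop can be sketched as follows. This is an illustration rather than the benchmark's actual harness: `generate` and `execute` are hypothetical callables standing in for the model call and the sandboxed run, and the sketch assumes the budget means one initial attempt plus up to 3 repairs.

```python
from typing import Callable, Optional, Tuple


def generate_with_self_repair(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Optional[str]],
    max_repairs: int = 3,
) -> Tuple[str, bool, int]:
    """Return (script, succeeded, repairs_used).

    generate(prompt, error) returns a script; error is None on the first try.
    execute(script) returns None on success, or the kernel error message.
    """
    script = generate(prompt, None)
    for repairs_used in range(max_repairs + 1):
        error = execute(script)
        if error is None:
            return script, True, repairs_used
        if repairs_used == max_repairs:
            break  # budget exhausted, keep the last failing script
        # Feed the error message back to the model and try again.
        script = generate(prompt, error)
    return script, False, max_repairs
```

Returning the number of repairs used makes it possible to report single-shot and post-repair pass rates separately.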
Open-weight reasoning baseline. Self-hosted on a single H100; results below assume bf16 with vLLM. Best public open-weight model on PARAM tasks; lags closed-source models on GD&T.
Specialty model fine-tuned on a 1.2 M synthetic CadQuery corpus from the ABC dataset. Punches well above its weight on primitives and brep_fidelity at 7B params; falls off on functional_intent (no FEA training signal).
Code-specialized open-weight baseline. Strong at translating prompts into syntactically clean CadQuery, weaker at engineering judgement (e.g. picking sensible drafts).
Cheaper alternative to Gemini Pro on the CadQuery scaffold. Within 4 pp of the Pro variant on geometry but visibly worse on multi-feature parts, where attention budget matters.
Same prompt template as the Gemini pipeline. Output is mesh-only.
Mesh-only output (OpenSCAD does not produce BREP); STEP round-trip therefore disabled. CSG kernel: CGAL.
Targets the hobbyist envelope: cheap, mesh-only, OpenSCAD CSG. Surprisingly competent on primitives; collapses on standards-compliance and reverse-engineering.
Cost-floor entry. Still passes most L1 primitive tasks but degrades sharply on GD&T and standards. Useful as a 'can a small model do this at all?' canary.
Transformer over CAD command sequences (extrude, revolve, sketch). Limited prompt vocabulary; we wrap with a Claude-3.5-mini paraphraser to convert natural prompts into the in-distribution token grammar.
Open-weight non-reasoning baseline. Treated as a sanity floor: an agent that scores below this is a regression for the field.
Diffusion model over structured latents. Outputs a mesh only; STEP round-trip and BREP-fidelity tasks score 0 by definition.
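The zero-by-definition rule for mesh-only systems can be expressed as a small scoring guard. A sketch only: the task identifiers and the `output_kind` labels are assumed names for illustration, not the benchmark's actual schema.

```python
# Hypothetical identifiers for tasks that require a BREP/STEP output.
BREP_ONLY_TASKS = {"step_round_trip", "brep_fidelity"}


def task_score(task: str, output_kind: str, raw_score: float) -> float:
    """Return the score to record for one task.

    output_kind is "mesh" or "brep" (assumed labels). A mesh-only output
    cannot be evaluated on BREP-only tasks, so it scores 0 there.
    """
    if output_kind == "mesh" and task in BREP_ONLY_TASKS:
        return 0.0
    return raw_score
```

Applying the rule at scoring time, rather than skipping the tasks, keeps every system's average over the same task set.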
Diffusion 3D model, image- or text-conditioned. Output is a high-poly mesh; we run an automatic remesh + STL export. Excellent on freeform and aesthetic surfaces, near-zero on GD&T and standards.
Aimed at game/UX assets, not engineering CAD. Included as a non-CAD baseline to quantify the gap.