Preprint · cad-bench/v0.5 · sweep 2026-04-12 · open · MIT
CAL-003 · Confidence Calibration · difficulty 5/5

Confidence-calibrated planetary carrier

sha256:2b97cc4d1ef0aa55

§1 Prompt verbatim

Build the planetary carrier of MECH-027 and report a self-assessed pre-generation confidence ∈ [0,1] in your output's correctness against the spec.

§2 Ground-truth spec

shells: 1
watertight: true
manifold: true
acceptance ε: ±0.05 mm
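The four spec fields can be illustrated on a triangulated export. The sketch below is an assumption-laden stand-in, not the benchmark's scorer: it treats the output as a list of index triples, checks only edge-manifoldness (every edge shared by exactly two triangles), counts shells as connected components, and compares one measured dimension against a nominal value. The function names and the mesh representation are hypothetical; a real check would run on the BREP in a CAD kernel.

```python
from collections import defaultdict

def is_closed_edge_manifold(faces):
    """True if every undirected edge is used by exactly two triangles.

    Simplified proxy for 'watertight' + 'manifold': it ignores
    vertex-manifoldness and face orientation.
    """
    counts = defaultdict(int)
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return bool(counts) and all(n == 2 for n in counts.values())

def count_shells(faces):
    """Number of connected components, using union-find over vertex ids."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, c in faces:
        root = find(a)
        parent[find(b)] = root
        parent[find(c)] = root
    return len({find(v) for v in list(parent)})

def within_tolerance(measured_mm, nominal_mm, eps_mm=0.05):
    """Acceptance test for a single dimension, eps taken from the spec."""
    return abs(measured_mm - nominal_mm) <= eps_mm

# A tetrahedron is the smallest closed, single-shell triangle mesh.
TETRA = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
closed = is_closed_edge_manifold(TETRA)
shells = count_shells(TETRA)
```

Dropping any one face from `TETRA` leaves three boundary edges, so the first check fails; that is the same class of defect as an unclosed shell.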

§3 Reference render

canonical reference

Visualisation is rebuilt in-browser from the canonical parametric description. Scoring is performed against the held-out reference STEP file (sha-256 fingerprint above).
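A fingerprint of this kind can be checked with the standard library. The sketch assumes, without confirmation from the page, that the 16 hex characters shown are a prefix of the full SHA-256 hex digest of the STEP file's bytes; the helper names are hypothetical.

```python
import hashlib

def sha256_fingerprint(data: bytes, length: int = 16) -> str:
    """First `length` hex characters of the SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()[:length]

def matches_fingerprint(data: bytes, expected: str) -> bool:
    """Compare data against a 'sha256:<hex prefix>' or bare hex-prefix string."""
    prefix = expected.split(":", 1)[-1].strip().lower()
    return bool(prefix) and hashlib.sha256(data).hexdigest().startswith(prefix)

# Usage with a local copy of a STEP file (path is illustrative only):
# with open("reference.step", "rb") as fh:
#     ok = matches_fingerprint(fh.read(), "sha256:2b97cc4d1ef0aa55")
```

A 16-character prefix is 64 bits, which is enough to spot an accidental file mix-up but is weaker than comparing the full digest.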

§4 Per-agent renders

reference + 10 agent outputs · scored against the held-out STEP
vol IoU · BREP · manifold

Each tile is rebuilt from the canonical parametric description and then degraded to match the agent's scored profile (tessellation, non-manifold face removal, dimension scale jitter, missing features). Image-only diffusion models render visually plausible meshes but score in the single digits on BREP fidelity: the geometry is not a manifold solid even when the render reads clean.
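Volume IoU, one of the metrics named for these tiles, is the shared volume of two solids divided by their combined volume. The page does not say how the benchmark computes it; the sketch below assumes both solids have already been voxelised onto the same integer grid and represents each as a set of occupied cells. The `box` helper and all names are hypothetical.

```python
def volume_iou(reference: set, candidate: set) -> float:
    """Intersection-over-union of two voxel occupancy sets."""
    union = reference | candidate
    if not union:
        return 0.0  # convention chosen here: nothing to compare scores zero
    return len(reference & candidate) / len(union)

def box(nx: int, ny: int, nz: int) -> set:
    """Occupied cells of an axis-aligned box anchored at the origin."""
    return {(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)}

# A candidate that is 10% short along one axis loses 10% of the volume.
reference = box(10, 10, 10)   # 1000 cells
candidate = box(10, 10, 9)    # 900 cells, fully inside the reference
iou = volume_iou(reference, candidate)
```

Because the short box lies entirely inside the reference, the intersection is 900 cells and the union is 1000, so `iou` comes out at 0.9. A small dimensional error can therefore leave volume IoU high while still failing a ±0.05 mm acceptance check.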

§5 Per-agent metrics

ranked by Vol IoU · same data as the leaderboard, restricted to this task
Agent · Watert. · Manif. · Calibration (Brier) · P@1 · p50 latency · cost
Human Baseline (Mech-E) · 0.949 · 0.054 · 0.000 · 677.8s · $5.677
Claude Opus 4.7 → CadQuery · × · 0.913 · 0.105 · 0.000 · 32.4s · $0.317
Adam (CADcrush) · × · 0.906 · 0.129 · 0.000 · 8.3s · $0.275
Claude Opus 4.7 → OpenSCAD · × · 0.904 · 0.105 · 0.000 · 24.2s · $0.278
Zoo Text-to-CAD · × · 0.896 · 0.106 · 0.000 · 5.8s · $0.158
GPT-5 → CadQuery · × · 0.898 · 0.153 · 0.000 · 51.1s · $0.184
Gemini 2.5 Pro → OpenSCAD · × · 0.898 · 0.151 · 0.000 · 19.8s · $0.077
DeepCAD · × · 0.866 · 0.224 · 0.000 · 3.5s · $0.024
Trellis 3D (kernel error: BRepCheck_NotClosed) · × · 0.000 · 0.000 · 8.6s · $0.050
Spline AI · × · 0.850 · 0.281 · 0.000 · 6.1s · $0.044
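The calibration column is labelled as a Brier score, which is the mean squared difference between a stated confidence and the binary outcome it predicted; lower is better. The sketch below shows the standard definition only. How the benchmark binarises "correct against the spec" is not stated on this page, so the outcome values here are illustrative and do not reproduce any number in the table.

```python
def brier_score(confidences, outcomes):
    """Mean of (confidence - outcome)^2 over paired predictions.

    confidences: floats in [0, 1], reported before generation.
    outcomes: 1 if the output was judged correct, else 0 (assumed binarisation).
    """
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence out of range: {p}")
    total = sum((p - o) ** 2 for p, o in zip(confidences, outcomes))
    return total / len(confidences)

# Overconfidence on a failed build is penalised heavily ...
overconfident = brier_score([0.9], [0])   # (0.9 - 0)^2
# ... while the same confidence on a passing build costs little.
well_placed = brier_score([0.9], [1])     # (0.9 - 1)^2
# Always answering 0.5 scores 0.25 regardless of outcome.
hedged = brier_score([0.5, 0.5], [0, 1])
```

A score of 0.25 is thus the level an agent reaches by reporting 0.5 every time, which makes it a useful reference point when reading the column.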