Preprint · cad-bench/v0.5 · sweep 2026-04-12 · open · MIT
CAD·Bench v0.5

Confidence Calibration

For agents that report a pre-generation confidence ∈ [0, 1], we compute the Brier loss of that confidence against the realized Pass@1 outcome. Agents that don't expose a confidence channel are assigned a constant prior equal to their global Pass@1 rate, which serves as their effective baseline.
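A minimal sketch of the scoring described above, assuming each task yields a binary first-attempt outcome (1 = pass, 0 = fail) and the Brier loss is the plain mean over tasks. The function names are hypothetical, and the sketch computes only the raw loss (lower is better); how the leaderboard score is derived from it is not shown here. The constant-prior case shows why that baseline is meaningful: predicting the global pass rate p on every task yields a Brier loss of exactly p(1 − p), so a reported confidence only helps if it beats that value.

```python
from typing import Sequence


def brier_loss(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between stated confidence and the realized 0/1 Pass@1 outcome."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(outcomes)


def constant_prior_brier(outcomes: Sequence[int]) -> float:
    """Brier loss for an agent without a confidence channel.

    Hypothetical helper: the agent is assigned its global Pass@1 rate p as the
    confidence on every task, which works out to p * (1 - p).
    """
    p = sum(outcomes) / len(outcomes)
    return brier_loss([p] * len(outcomes), outcomes)


if __name__ == "__main__":
    outcomes = [1, 1, 0, 0]
    print(brier_loss([0.9, 0.8, 0.2, 0.1], outcomes))  # ≈ 0.025
    print(constant_prior_brier(outcomes))              # 0.25 = 0.5 * (1 - 0.5)
```

In this toy example the confident, correct agent (≈ 0.025) clearly beats its own constant-prior baseline (0.25); an agent whose confidences are uninformative would land at or above that baseline.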


RANKED AGENTS · 95% CI

#   Agent                        Score   95% CI         n
1   Human Baseline (Mech-E)      44.6    [44.6, 44.6]   1
2   Claude Opus 4.7 → CadQuery   39.5    [39.5, 39.5]   1
3   Claude Opus 4.7 → OpenSCAD   39.5    [39.5, 39.5]   1
4   Zoo Text-to-CAD              39.4    [39.4, 39.4]   1
5   Adam (CADcrush)              37.1    [37.1, 37.1]   1
6   Gemini 2.5 Pro → OpenSCAD    34.9    [34.9, 34.9]   1
7   GPT-5 → CadQuery             34.7    [34.7, 34.7]   1
8   DeepCAD                      27.6    [27.6, 27.6]   1
9   Spline AI                    21.9    [21.9, 21.9]   1
10  Trellis 3D                   0.0     [0.0, 0.0]     1

TASKS IN THIS CATEGORY