20 categories across 4 layers

Volumetric IoUBidirectional ChamferWatertightnessEdge-Manifoldness

Closed-form parametric primitives (boxes, cylinders, cones, tori, regular prisms) at exactly specified dimensions. Establishes a noise floor: agents that fail here cannot be trusted on harder tasks.

Boolean Robustness

w=0.20 · n=18

Volumetric IoUEdge-ManifoldnessEuler-Poincaré ComplianceWatertightness

Edge-case CSG operations: tangent fillets, coplanar faces, near-degenerate intersections, high-genus subtractions. Stresses kernel ε-tolerance handling. Patterned after the OpenCascade and ACIS robustness suites.

BREP Fidelity

w=0.40 · n=14

STEP Round-trip ChamferEdge-ManifoldnessEuler-Poincaré ComplianceFeature Recall

Tests whether the agent emits a clean boundary representation (named faces, coherent edge graph, exact NURBS surfaces) versus a tessellated approximation. Round-trips through AP242 STEP.

Free-form Surfaces

w=0.30 · n=12

Bidirectional ChamferHausdorff p95Normal Consistency

Class-A surfaces (G2 continuity, lofted, swept) such as turbine blades and ergonomic handles. Scored against high-density (200 k vertex) ground-truth meshes.

L2 · Engineering

w_layer = 0.35 · 6 categories

Named-dimension matching, GD&T parsers, partner-part assembly simulation. Requires CAD-kernel reasoning about engineering intent.

Parametric Mechanical Parts

w=0.25 · n=30

Named-Dimension RMSEGD&T ComplianceFeature Recall

Industry-grade brackets, fasteners, housings, and shafts specified with full GD&T (ISO 1101) including position, parallelism, and concentricity callouts. Volumes range 0.5 cm³ – 1.2 dm³.

Assembly & Mating

w=0.22 · n=16

Mating ClearanceFit-Class ComplianceFeature Recall

Two- and three-body assemblies with pin-in-hole, dovetail, and threaded mate constraints. Scoring requires the candidate to mate the held-out reference partner within the prescribed clearance band.

Standards Compliance

w=0.15 · n=18

Standards ComplianceGD&T Compliance

Prompts cite a specific standard (ISO 4762 socket-head cap screw, DIN 471 retaining ring, AS568 O-ring, ANSI B5.50 dovetail). Score = fraction of standard-derived feature parameters matched within the standard's tolerance band.

Sheet-Metal Bodies

w=0.13 · n=14

Wall-Thickness UniformityNamed-Dimension RMSEFeature Recall

Uniform-thickness bodies with bend specifications, k-factors, relief cuts, and unfoldable flat patterns. Tested by attempting to unfold the result and measuring the unfold error vs the spec'd flat pattern.

Sealing-Groove Design

w=0.10 · n=10

Standards ComplianceNamed-Dimension RMSE

O-ring grooves (AS568 / ISO 3601), face seals, and lip-seal cavities. Score depends on cross-section area, groove width, and squeeze ratio matching the relevant standard.

Kinematic Mechanisms

w=0.15 · n=12

Mating ClearanceFeature RecallParametric Edit Accuracy

Four-bar linkages, cams (radial/face), gear meshes (involute, ISO 53). Scored by simulating one full kinematic cycle and measuring (a) feasibility — no body interpenetration — and (b) prescribed motion error.

L3 · Manufacturing

w_layer = 0.25 · 4 categories

Process-specific analyzers — DFM rules, CAM postprocessor, draft / wall / overhang checks. Different per process.

DFM · 3-Axis CNC

w=0.30 · n=18

CAM ReachabilityMin-Wall ComplianceDFM Composite

Manufacturable on a 3-axis VMC with a Ø6 → Ø3 → Ø1 tool stack. Score gates on tool reachability (no closed pockets, no internal corners <tool radius), fixturable orientation, and workholding access.

DFM · Injection Mould

w=0.30 · n=20

Draft-Angle ComplianceWall-Thickness UniformityMin-Wall Compliance

Two-plate mould tooling: parting plane, ≥1° draft on every vertical face, no closed voids, uniform wall thickness ±10 %, gate/runner accessibility. Tested by an automatic draft + thickness analyzer plus parting-line extraction.

DFM · FDM 3D Print

w=0.20 · n=14

Support-Volume RatioMin-Wall ComplianceDFM Composite

FDM-printable on a 0.4 mm nozzle: overhangs ≤45° from build plate, ≥0.8 mm walls, no enclosed cavities, optimal orientation auto-selected by the analyzer.

CAM Toolpath Validity

w=0.20 · n=12

CAM ReachabilityFeature Recall

Stricter cousin of DFM-CNC: an actual 3-axis CAM postprocessor (FreeCAD-Path) must generate a collision-free G-code program at a ≤0.05 mm finish stepover. Score = fraction of part surface produced.

L4 · Cognition

w_layer = 0.20 · 6 categories

Robustness and intent. Paraphrase variance, parametric range survival, FEA-pass at spec'd loads, calibration of self-reported confidence.

Constraint Solving & Editability

w=0.18 · n=18

judging: variance-analysis

Probes whether the agent exposes a working parametric graph: after the part is built we issue downstream parameter edits (length+30 %, hole diameter→M8) and re-evaluate without topological breakage.

Parametric Edit AccuracyParametric Range IntegrityConstraint Solve Rate

Reverse Engineering

w=0.20 · n=22

Volumetric IoUFeature RecallNamed-Dimension RMSE

Multi-view orthographic drawings (front/top/side at 1:1, fully dimensioned) and product photos. The agent must reproduce the part. Adapted from the ABC dataset and a held-out subset of GrabCAD test parts.

2-D Sketch Constraints

w=0.10 · n=20