// METHODOLOGY · PILLAR 01
Calibrated rubric design.
The difference between a rubric and a calibrated rubric is the same as the difference between a ruler with the right markings and a ruler that has actually been compared to a standard meter.
A calibrated rubric is a scoring instrument whose scale itself, not just its wording, has been calibrated, so that each score maps to a defined, defensible performance state. It is versioned, owned, and anchored to published task standards: the instrument behind every defensible assessment.
// 01 — WHAT MAKES A RUBRIC CALIBRATED
Three properties.
Anchored. Every score on the scale corresponds to a defined performance state, tied to the operator's published task standard. Not 'good / better / best' — 'meets task standard with no margin / meets task standard with margin / exceeds task standard in measurable ways.'
Versioned. Every edit to the rubric produces a new version. Historical scores remain attached to the version that produced them. A rubric that loses version history loses calibration the moment it changes.
Owned. A rubric without a named owner drifts. Ownership means a specific person is accountable for the calibration of the instrument. Without that, the rubric is a Google doc.
Anchored. Versioned. Owned. Three properties. Without all three, calibration is a claim, not a state.
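The three properties can be pictured as a small data model. The following is a minimal sketch, not OCTAAR's actual schema: the class names, field names, the rubric name "hoist-ops", the owner "J. Rivera", and the extra "does not meet task standard" anchor are all hypothetical, chosen only to show scores staying attached to the version that produced them.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Rubric:
    """Hypothetical rubric record: anchored, versioned, owned."""
    name: str
    version: int
    owner: str               # a named, accountable person
    anchors: Dict[int, str]  # score -> defined performance state

    def revise(self, anchors: Dict[int, str]) -> "Rubric":
        # Every edit yields a new version; the old one is left untouched.
        return replace(self, version=self.version + 1, anchors=dict(anchors))


@dataclass(frozen=True)
class Score:
    """A score that carries the rubric version that produced it."""
    subject: str
    value: int
    rubric_name: str
    rubric_version: int


def record_score(rubric: Rubric, subject: str, value: int) -> Score:
    if not rubric.owner.strip():
        raise ValueError("rubric has no named owner")
    if value not in rubric.anchors:
        raise ValueError(f"score {value} has no anchored performance state")
    return Score(subject, value, rubric.name, rubric.version)


# Illustrative data only.
v1 = Rubric("hoist-ops", 1, "J. Rivera", {
    1: "meets task standard with no margin",
    2: "meets task standard with margin",
    3: "exceeds task standard in measurable ways",
})
old = record_score(v1, "crew-a", 2)

# Revising the scale creates version 2; the earlier score still points at version 1.
v2 = v1.revise({**v1.anchors, 0: "does not meet task standard"})
new = record_score(v2, "crew-b", 0)
```

Making both records immutable is a design choice in this sketch: a revision cannot silently change what a historical score meant.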
// 02 — RUBRIC DESIGN UNDER OCTAAR
The substrate around the rubric.
OCTAAR does not impose a rubric. The operator's published task standards become the calibrated scale. What OCTAAR provides is the substrate: rubric authorship, version control, ownership assignment, calibration sessions, and the linkage from each score back to the rubric definition that produced it.
The substrate is what makes the rubric durable. The rubric without the substrate is a Word file that will drift inside three quarters.
// 03 — WHAT NOT TO DO
Three failure modes in rubric design.
Failure mode one: the rubric scale describes effort instead of outcome. 'Demonstrated significant effort to meet the standard' is not a calibrated score. The standard either was met or was not.
Failure mode two: the rubric anchors to ambiguous language. 'Adequate,' 'satisfactory,' 'effective' — each invites observer interpretation. Calibrated language anchors to observable behavior or measurable result.
Failure mode three: the rubric is too long. A 40-item rubric administered under field conditions degrades into observer summary judgment. The calibrated short rubric beats the comprehensive long one.
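The three failure modes above lend themselves to a simple automated check on draft rubric text. The sketch below is illustrative only: the function name, the effort-word list, and the default item limit of 12 are assumptions, not OCTAAR rules. The ambiguous terms are the three named in this section.

```python
import re
from typing import Dict, List

# Terms named in this section as inviting observer interpretation.
AMBIGUOUS_TERMS = ("adequate", "satisfactory", "effective")
# Hypothetical list of words that describe effort rather than outcome.
EFFORT_TERMS = ("effort", "tried", "attempted")
# Hypothetical length limit; the text only says 40 items is too long.
MAX_ITEMS = 12


def lint_rubric(items: Dict[str, List[str]], max_items: int = MAX_ITEMS) -> List[str]:
    """Return findings for a draft rubric given as item -> anchor texts."""
    findings = []
    if len(items) > max_items:
        findings.append(f"rubric has {len(items)} items (limit {max_items})")
    for item, anchors in items.items():
        for text in anchors:
            # Whole-word matching, case-insensitive.
            words = set(re.findall(r"[a-z]+", text.lower()))
            for term in AMBIGUOUS_TERMS:
                if term in words:
                    findings.append(f"{item}: ambiguous term '{term}'")
            for term in EFFORT_TERMS:
                if term in words:
                    findings.append(f"{item}: effort language '{term}'")
    return findings


# Illustrative draft rubric.
draft = {
    "radio check": [
        "Demonstrated significant effort to meet the standard",
        "Completes radio check within 30 seconds",
    ],
    "load securing": ["Adequate tie-down"],
}
findings = lint_rubric(draft)
```

A word list catches only the obvious cases; it is a prompt for the rubric owner to review the wording, not a substitute for a calibration session.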
// 04 — RUBRIC CALIBRATION SESSIONS
How the instrument gets calibrated.
Pre-cycle calibration session: evaluators review recorded or live performances, score them, compare scores, discuss divergence, and agree on an anchored interpretation. The output is a shared understanding of what the rubric means for this cycle.
Mid-cycle calibration check: a small subset of in-cycle observations is second-scored. Variance between the two scores is surfaced, and out-of-tolerance scoring is flagged.
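A mid-cycle check of this kind could be sketched as follows. The record layout, the function names, and the tolerance of one scale point are assumptions made for illustration; the tolerance an operator actually uses would come from its own calibration policy.

```python
from statistics import mean
from typing import List, NamedTuple


class SecondScore(NamedTuple):
    """Hypothetical record of one observation scored twice."""
    observation_id: str
    evaluator: str   # evaluator who gave the primary score
    primary: int
    second: int


def mean_abs_difference(pairs: List[SecondScore]) -> float:
    """Average gap between primary and second scores."""
    return mean(abs(p.primary - p.second) for p in pairs)


def out_of_tolerance(pairs: List[SecondScore], tolerance: int = 1) -> List[SecondScore]:
    """Return the pairs whose scores differ by more than `tolerance` points."""
    return [p for p in pairs if abs(p.primary - p.second) > tolerance]


# Illustrative sample of second-scored observations.
sample = [
    SecondScore("obs-01", "eval-a", 3, 3),
    SecondScore("obs-02", "eval-b", 2, 3),
    SecondScore("obs-03", "eval-c", 1, 3),
]
flagged = out_of_tolerance(sample)
spread = mean_abs_difference(sample)
```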
Post-cycle calibration review: aggregate inter-rater reliability (IRR) is examined. Drift events at the evaluator-pool level become findings. The rubric itself can become a finding: if scoring divergence concentrates on a specific item, the item may need recalibration or revision.
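The text does not say which agreement statistic is used. As one possibility, the sketch below computes Cohen's kappa for two evaluators, a common chance-corrected agreement measure, alongside a per-item agreement rate that shows where divergence concentrates. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def cohens_kappa(a: List[int], b: List[int]) -> float:
    """Chance-corrected agreement between two evaluators' scores."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each evaluator's score frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both evaluators gave one identical score throughout
    return (observed - expected) / (1 - expected)


def agreement_by_item(records: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Share of exact matches per rubric item, from (item, score_a, score_b)."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for item, score_a, score_b in records:
        totals[item] += 1
        hits[item] += int(score_a == score_b)
    return {item: hits[item] / totals[item] for item in totals}


# Illustrative double-scored records.
records = [
    ("item-1", 1, 1),
    ("item-1", 2, 2),
    ("item-2", 3, 3),
    ("item-2", 3, 2),
]
kappa = cohens_kappa([r[1] for r in records], [r[2] for r in records])
per_item = agreement_by_item(records)
```

An item whose agreement rate sits well below the others is a candidate for the recalibration or revision described above.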
// Last updated · · OCTAAR Methodology Team
// FAQ
Direct answers.
How many scoring levels should a rubric have?
Can we use a rubric we already wrote?
Who owns a rubric inside OCTAAR?
What happens to old scores when the rubric is revised?