A performance review calibration meeting brings managers together to compare and align their performance ratings before those ratings are finalised and communicated to employees. The purpose is to ensure consistency, fairness, and a shared understanding of what each rating level means in practice. Without calibration, one manager's "exceeds expectations" might be another manager's "meets expectations," creating inequity across teams. The same principle of structured evaluation applies in hiring panel debriefs, where interviewers must align on candidate assessment standards.
These sessions typically run for 90 to 120 minutes and involve managers from the same department or business unit, facilitated by a senior leader or HR business partner. Each manager presents their proposed ratings, the group discusses borderline cases, and the facilitator guides the conversation to check for bias and ensure ratings reflect a consistent standard.
Calibration is not about forcing a distribution curve or taking away a manager's autonomy. It is about giving managers the benefit of peer perspective. A manager who has only seen their own team's performance has a narrow frame of reference. By comparing across teams, the group develops a richer, more accurate picture of what strong performance looks like at each level within the organisation. This protects employees from inconsistent treatment and gives the organisation confidence that its talent decisions are grounded in evidence rather than individual interpretation.
Calibration sessions should be scheduled as a standard step in every performance review cycle. They are especially important when several managers rate employees against the same scale, when proposed ratings differ noticeably between teams, or when the results will inform promotion and development decisions.

| Role | Responsibility |
|---|---|
| Senior Leader / VP | Chair the session, set expectations for the conversation, make final calls on disputed ratings, and ensure the overall distribution is credible and defensible. |
| HR Business Partner | Facilitate the discussion, flag potential biases, ensure the process follows organisational guidelines, and document agreed changes. |
| People Managers | Present their proposed ratings with supporting evidence, listen to peer feedback, and be open to adjusting ratings based on calibration discussion. |
| Skip-Level Manager (optional) | Provide additional context on individuals they have observed working across teams, particularly for cross-functional contributors. |

| Duration | Activity | Notes |
|---|---|---|
| 10 min | Ground rules and rating definitions | Facilitator reviews the rating scale definitions, reminds the group of common biases to watch for, and sets expectations for confidentiality and constructive challenge |
| 15 min | Overview of proposed ratings | Display the aggregated view of all proposed ratings across teams. Identify patterns: are ratings clustered at one end? Are some teams significantly higher or lower than others? |
| 40 min | Individual case discussions | Focus on edge cases and disputed ratings. Managers present evidence for their proposed rating; peers ask questions and offer perspective. Start with the highest and lowest ratings |
| 15 min | Bias check and pattern review | Review the calibrated distribution for signs of bias: gender, tenure, team size, recency. Adjust if needed |
| 10 min | High-performer and development planning | Identify top talent for stretch assignments, promotion readiness, or retention risk. Flag individuals who need development support or a performance improvement plan |
| 10 min | Finalisation and next steps | Confirm final ratings, document any changes from proposed ratings, and agree on the timeline for communicating results to employees |
A product organisation of 60 people is conducting its mid-year performance review. The VP of Product convenes a calibration session with the four team leads, each managing 12 to 18 people, and the HR business partner. The session is scheduled for two hours on a Thursday afternoon, two weeks before review conversations are due to take place with employees.
Each manager has submitted their proposed ratings in advance. The HR partner compiles the data and immediately spots a discrepancy: Team A has 45% of its members rated "exceeds expectations," while Team C has only 10% at that level. When the session begins, the VP asks both managers to walk through their top-rated individuals with specific examples of impact. It becomes clear that Team A's manager has been applying a broader interpretation of "exceeds," using it to recognise effort and attitude rather than measurable outcomes above the role's expectations. Team C's manager, by contrast, has been unusually stringent, only awarding "exceeds" to individuals who delivered quantifiable results well beyond their scope.
Through discussion, the group aligns on a shared standard: "exceeds expectations" requires demonstrable impact above and beyond the core role requirements, supported by specific examples. Team A's manager agrees to adjust three ratings from "exceeds" to "meets expectations" after recognising that, while those individuals are valued contributors, their work is strong rather than exceptional. Team C's manager is encouraged to revisit one rating upward after peers highlight a cross-functional project contribution that the manager had underweighted. The session also flags two individuals as ready for promotion and one who needs a structured development plan. All changes are documented, and the VP confirms that managers can begin their review conversations the following Monday, well within the two-week window before reviews are due.