A performance review calibration meeting brings managers together to compare and align their performance ratings before those ratings are finalised and communicated to employees. The purpose is to ensure consistency, fairness, and a shared understanding of what each rating level means in practice. Without calibration, one manager's "exceeds expectations" might be another manager's "meets expectations," creating inequity across teams. The same principle of structured evaluation applies in hiring panel debriefs, where interviewers must align on candidate assessment standards.
These sessions typically run for 90 to 120 minutes and involve managers from the same department or business unit, facilitated by a senior leader or HR business partner. Each manager presents their proposed ratings, the group discusses borderline cases, and the facilitator guides the conversation to check for bias and ensure ratings reflect a consistent standard.
Calibration is not about forcing a distribution curve or taking away a manager's autonomy. It is about giving managers the benefit of peer perspective. A manager who has only seen their own team's performance has a narrow frame of reference. By comparing across teams, the group develops a richer, more accurate picture of what strong performance looks like at each level within the organisation. This protects employees from inconsistent treatment and gives the organisation confidence that its talent decisions are grounded in evidence rather than individual interpretation.
Calibration sessions should be scheduled as a standard step in every performance review cycle. They are especially important when several managers rate employees against the same scale, when proposed ratings differ noticeably between teams, or when the results will inform promotion and development decisions.

| Role | Responsibility |
|---|---|
| Senior Leader / VP | Chair the session, set expectations for the conversation, make final calls on disputed ratings, and ensure the overall distribution is credible and defensible. |
| HR Business Partner | Facilitate the discussion, flag potential biases, ensure the process follows organisational guidelines, and document agreed changes. |
| People Managers | Present their proposed ratings with supporting evidence, listen to peer feedback, and be open to adjusting ratings based on calibration discussion. |
| Skip-Level Manager (optional) | Provide additional context on individuals they have observed working across teams, particularly for cross-functional contributors. |

| Duration | Activity | Notes |
|---|---|---|
| 10 min | Ground rules and rating definitions | Facilitator reviews the rating scale definitions, reminds the group of common biases to watch for, and sets expectations for confidentiality and constructive challenge |
| 15 min | Overview of proposed ratings | Display the aggregated view of all proposed ratings across teams. Identify patterns: are ratings clustered at one end? Are some teams significantly higher or lower than others? |
| 40 min | Individual case discussions | Focus on edge cases and disputed ratings. Managers present evidence for their proposed rating; peers ask questions and offer perspective. Start with the highest and lowest ratings |
| 15 min | Bias check and pattern review | Review the calibrated distribution for signs of bias: gender, tenure, team size, recency. Adjust if needed |
| 10 min | High-performer and development planning | Identify top talent for stretch assignments, promotion readiness, or retention risk. Flag individuals who need development support or a performance improvement plan |
| 10 min | Finalisation and next steps | Confirm final ratings, document any changes from proposed ratings, and agree on the timeline for communicating results to employees |
A product organisation of 60 people is conducting its mid-year performance review. The VP of Product convenes a calibration session with the four team leads, each managing 12 to 18 people, and the HR business partner. The session is scheduled for two hours on a Thursday afternoon, two weeks before review conversations are due to take place with employees.
Each manager has submitted their proposed ratings in advance. The HR partner compiles the data and immediately spots a discrepancy: Team A has 45% of its members rated "exceeds expectations," while Team C has only 10% at that level. When the session begins, the VP asks both managers to walk through their top-rated individuals with specific examples of impact. It becomes clear that Team A's manager has been applying a broader interpretation of "exceeds," using it to recognise effort and attitude rather than measurable outcomes above the role's expectations. Team C's manager, by contrast, has been unusually stringent, only awarding "exceeds" to individuals who delivered quantifiable results well beyond their scope.
Through discussion, the group aligns on a shared standard: "exceeds expectations" requires demonstrable impact above and beyond the core role requirements, supported by specific examples. Team A's manager agrees to adjust three ratings from "exceeds" to "meets expectations" after recognising that, while those individuals are valued contributors, their work is strong rather than exceptional. Team C's manager is encouraged to revisit one rating upward after peers highlight a cross-functional project contribution that the manager had underweighted. The session also flags two individuals as ready for promotion and one who needs a structured development plan. All changes are documented, and the VP confirms that managers can begin their review conversations the following Monday, well within the two-week window before reviews are due.