Updated Apr 09, 2026
When two lecturers are separated by 0.2 points on a module survey, universities still have to decide whether that gap means anything. Yu Xiao, Bohua Wang, Ye Deng and Jun Wu's Studies in Educational Evaluation paper, "From ratings to rankings: A complementary approach for student evaluations of teaching in higher education", asks whether students should rank teachers directly instead of assigning near-identical scores on long evaluation forms.
For UK universities using student evaluations of teaching, module surveys and free-text comments to review teaching quality, that makes the paper immediately useful. It shows how a small change in evaluation design might produce a clearer signal where standard scores bunch together.
Traditional student evaluations of teaching, often shortened to SETs, usually ask students to rate teaching against a list of indicators and then average those scores. The authors argue that this familiar model has three structural weaknesses: lengthy forms create fatigue, students use scales inconsistently, and mean differences between teachers are often so small that they are hard to use credibly in decision-making. That concern overlaps with our recent summary of what gets students to fill in teaching evaluations, because burden and survey design shape whether institutions get usable evidence at all.
That problem matters in UK higher education because internal module evaluations often face the same tension. Student Experience teams want survey data they can use, academic staff want something fair, and senior leaders want evidence that can support improvement without pretending tiny score gaps are more precise than they really are.
The paper therefore asks a practical question: could a ranking-based system work better as a complementary form of teaching evaluation? Instead of asking students to score multiple attributes, the proposed model asks them to rank teachers by overall performance. Those partial rankings are then aggregated through a network science-based framework to produce a wider institutional ranking. If that approach holds up, institutions may get a stronger comparative signal without asking students to complete another dense grid.
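The paper does not spell out its aggregation algorithm in a form we can reproduce here, so the sketch below is a generic illustration of the idea rather than the authors' method: it assumes a PageRank-style score over a directed "endorsement" network built from partial rankings, with invented teachers and rankings throughout.

```python
from collections import defaultdict

# Each student ranks only the teachers they know, best first (partial rankings).
# Names and orderings are invented for illustration.
partial_rankings = [
    ["Dr A", "Dr B", "Dr C"],
    ["Dr B", "Dr A"],
    ["Dr C", "Dr B", "Dr D"],
    ["Dr A", "Dr D"],
]

# Every time a student places X above Y, add one unit of "endorsement"
# flowing from Y to X in a directed network.
weights = defaultdict(float)      # (lower-ranked, higher-ranked) -> count
teachers = set()
for ranking in partial_rankings:
    teachers.update(ranking)
    for i, winner in enumerate(ranking):
        for loser in ranking[i + 1:]:
            weights[(loser, winner)] += 1.0

teachers = sorted(teachers)
n = len(teachers)
out_total = defaultdict(float)    # total endorsement each teacher sends out
for (src, _), w in weights.items():
    out_total[src] += w

# PageRank-style power iteration: a teacher scores highly when endorsed
# by teachers who are themselves highly endorsed. Endorsement from
# teachers who never lose a comparison is spread uniformly (dangling mass).
score = {t: 1.0 / n for t in teachers}
damping = 0.85
for _ in range(100):
    dangling = sum(score[t] for t in teachers if out_total[t] == 0)
    new = {t: (1 - damping) / n + damping * dangling / n for t in teachers}
    for (src, dst), w in weights.items():
        new[dst] += damping * score[src] * w / out_total[src]
    score = new

for t in sorted(teachers, key=score.get, reverse=True):
    print(f"{t}: {score[t]:.3f}")
```

The design point that matters is that no student needs to rank everyone: each partial ranking contributes pairwise evidence, and the network stitches those fragments into one institution-wide ordering.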
The paper's starting point is that score-based SETs often create weak signals for high-stakes interpretation. The authors point to rating-scale heterogeneity, student fatigue and minimal mean differences as reasons why conventional forms can struggle to distinguish performance clearly. For UK institutions, that echoes a familiar problem: one lecturer's 4.3 and another's 4.1 can look meaningful in a dashboard while still being hard to interpret in practice.
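A back-of-envelope interval check shows why. The sample sizes and spread below are invented for illustration, though a standard deviation near 0.9 is plausible for five-point module-evaluation items:

```python
# How much does a 4.3 vs 4.1 gap mean at typical module sizes?
# All figures here are assumed, not taken from the paper.
import math

mean_a, sd_a, n_a = 4.3, 0.9, 40   # lecturer A's module survey
mean_b, sd_b, n_b = 4.1, 0.9, 35   # lecturer B's module survey

# Standard error of the difference in means, then an approximate 95% CI.
se_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
diff = mean_a - mean_b
low, high = diff - 1.96 * se_diff, diff + 1.96 * se_diff

print(f"gap = {diff:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
# gap = 0.20, 95% CI = (-0.21, 0.61): the interval spans zero, so at these
# sample sizes the dashboard gap is indistinguishable from sampling noise.
```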
The proposed intervention is comparative judgement rather than absolute scoring. Students are asked to rank teachers they know on overall performance, rather than complete a complex grid of items. The rationale is simple: comparison may be cognitively easier for students and less vulnerable to the problem of one student being a harsh scorer while another is lenient. If true, that gives institutions a cleaner signal to investigate.
"By aggregating these rankings, a comprehensive, university-wide ranking of teacher performance can be obtained."
The authors argue that rankings can reduce reporting-style inconsistency and lighten the evaluation burden. Because the approach focuses on relative judgement, it is designed to side-step at least some of the variation caused by students using numeric scales differently. The article also frames this as a way to reduce the cognitive burden created by multi-indicator forms, which may improve the quality of the evidence students provide.
The empirical case is presented as promising rather than universal. The ranking-based approach was implemented in undergraduate teaching evaluations at a Chinese university over several years, using routine institutional data that was dual-anonymised for analysis. The authors report that the method produced reliable rankings and supported administrative evaluation. That makes the study practical, but it also means UK readers should treat it as an applied institutional test rather than a settled answer for every context.
The most important takeaway is that ranking may sharpen comparison, but it does not explain the reasons behind the comparison. A rank can tell you that students place one teacher above another. It cannot tell you whether the difference is about organisation, clarity, assessment guidance, pace, approachability or the wider course structure. In that sense, the paper strengthens the case for combining leaner quantitative instruments with stronger qualitative evidence, not for replacing student voice with a league table, echoing the case for free-text comments in module evaluation.
For UK universities, the value of this paper is not that it settles the debate; it is that it shows where a modest change in evaluation design might make teaching data easier to interpret, provided institutions keep qualitative context and governance in view.
Pilot ranking where scores are tightly clustered. A ranking question could sit alongside current module evaluations in a trial period, allowing teams to compare whether relative judgements surface clearer patterns than standard rating items alone. That gives institutions a low-risk way to test whether ranking adds signal before redesigning teaching evaluation surveys at scale, with staff and student input.
Use comments to explain the signal. A ranking-based SET may help identify where students perceive meaningful differences in teaching quality, but it still cannot show why those differences exist. That is where open-text comments remain essential. If a teacher or module drops in comparative standing, universities need the student comments that explain whether the issue is feedback clarity, poor signposting, weak organisation or something outside the classroom altogether.
Set rules for fair use before the data becomes consequential. This paper is explicitly interested in administrative utility, but the move from ratings to rankings could make competitive comparison feel even more consequential for staff. UK institutions would therefore need clear rules about how ranking data is used, which decisions it can inform, and where it must be supplemented by peer review, local context and qualitative evidence. Student Voice Analytics fits naturally here: it can categorise and benchmark the free-text comments behind those rankings, helping teams move from comparative signals to specific actions.
Q: How could a UK university test a ranking-based approach without replacing its current module evaluation form?
A: The most sensible route would be a pilot. Add one comparative ranking question to a small number of modules or programmes alongside the existing evaluation instrument, then compare how much useful variation each method produces. If ranking surfaces clearer differences, institutions can test whether those differences line up with open-text themes, peer review or other teaching evidence before making wider changes, while checking for non-response bias in student evaluations if pilot modules produce thin samples.
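As one concrete way to quantify "useful variation" on the ranking side of such a pilot, the sketch below computes Kendall's coefficient of concordance (W), a standard agreement statistic rather than anything taken from the paper. It assumes, for simplicity, that every pilot student ranks the same small set of teachers; the rankings are invented.

```python
# Kendall's W: do pilot students agree on the ordering of teachers?
# W near 1 = strong agreement (a consistent comparative signal);
# W near 0 = the ranking question mostly adds noise.
rankings = [            # each row: one student's ranks for teachers T1..T4
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
    [1, 2, 4, 3],
]

m, n = len(rankings), len(rankings[0])    # m students, n teachers
rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
mean_sum = sum(rank_sums) / n
s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
w = 12 * s / (m ** 2 * (n ** 3 - n))

print(f"Kendall's W = {w:.2f}")           # ~0.73 here: fairly strong agreement
```

High agreement would suggest the ranking question is surfacing a consistent comparative signal worth cross-checking against open-text themes and peer review; low agreement would suggest it adds little beyond the existing rating items.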
Q: What are the main methodological limits of this study?
A: The paper reports implementation at one Chinese university, so transferability should not be assumed automatically. Local evaluation culture, course structures, and institutional incentives all matter. The study is best read as a practical demonstration that ranking-based SETs are feasible and potentially useful, not as proof that every university should replace rating scales with rankings tomorrow.
Q: What does this mean for student voice beyond SET scores?
A: It reinforces a broader point: universities need feedback systems that separate signal from noise without stripping away meaning. Ranking may help where score averages bunch together, but student voice becomes actionable only when institutions can connect those comparative results to the themes students raise in comments. The most useful system is therefore likely to combine concise survey design with rigorous analysis of free-text responses.
[Paper Source]: Yu Xiao, Bohua Wang, Ye Deng and Jun Wu, "From ratings to rankings: A complementary approach for student evaluations of teaching in higher education", Studies in Educational Evaluation. DOI: 10.1016/j.stueduc.2025.101518