Who Actually Fills In Student Evaluations? New Evidence on Non-Response Bias

Updated Apr 12, 2026

Who fills in student evaluations matters as much as how many do. A healthy response rate can still hide a serious blind spot if the students institutions most need to hear from are missing from the sample.

When lower-income students, later-year students or some ethnic groups are less likely to respond, decisions about teaching quality can rest on a distorted version of the student voice. That makes non-response bias a quality and governance issue, not just a survey administration problem.

A new paper in Studies in Higher Education by Erica De Bruin, Ann L. Owen and Stephen Wu examines that risk through a randomised experiment. The study shows that small changes to evaluation prompts and timing can change not just how many students respond, but who gets represented in the data. For institutions that rely on student evaluations, the practical message is immediate: audit who is missing, test prompt design and treat headline response rates as only half the story.

The problem: who is not responding?

Student evaluations of teaching (SETs) are a near-universal feature of higher education. They feed into promotion cases, module reviews and quality assurance processes. Since many institutions moved them online, response rates have often fallen, typically to between 30 and 75 per cent. That echoes wider challenges of engaging students in student voice.

But the key question is not just how many students respond. It is which students do. If certain demographic groups are systematically less likely to complete evaluations, the resulting data can misrepresent the student experience even when overall participation looks acceptable. A respectable response rate is not the same as a representative sample. That distinction matters whenever institutions use evaluations to judge teaching quality, promotion or quality assurance.

De Bruin, Owen and Wu's study is one of the first to use a randomised controlled experiment to test whether practical changes to the evaluation process can reduce non-response bias. Conducted at a selective liberal arts college in the United States, the experiment assigned students to one of three conditions: a traditional end-of-semester evaluation, an alternative prompt asking students to articulate their own criteria for effective teaching, or a delayed solicitation sent at the start of the following semester. That design matters because it focuses on two levers most universities can actually change: wording and timing. The paper does not just diagnose the problem. It tests realistic ways to improve representativeness before the data feeds into institutional decisions.

Key findings

Certain student groups are consistently under-represented. Across all study conditions, Pell Grant recipients (used in the paper as a proxy for lower-income students), students with low GPAs and those later in their college careers were significantly less likely to complete teaching evaluations. They were also less likely to write more than three words in response to qualitative questions. This matters because students facing the greatest barriers may also be the easiest to miss in evaluation data. Teams that read response rates alongside participation gaps are in a stronger position to judge how representative the evidence really is.

Racial disparities are most pronounced in traditional evaluations. Black, Hispanic and multi-racial students were less likely than white students to complete evaluations solicited at the standard end-of-semester point. For institutions using SET data to assess teaching quality, that creates a clear risk: teams may draw conclusions from a partial and skewed sample, especially if they treat the dataset as representative by default. Spotting that skew early gives institutions a chance to qualify the evidence, adjust follow-up and avoid overclaiming from an incomplete sample.

"Black, Hispanic, and multi-racial students are less likely than white students to complete traditional evaluations solicited at the end-of-semester."

An alternative prompt increases qualitative engagement. When students were asked to define their own criteria for effective teaching, rather than respond to the standard institutional questionnaire, they were more likely to write more than three words and produced longer responses overall. This suggests that question design matters. Giving students more agency over the evaluation criteria can encourage richer, more substantive feedback, reinforcing the case for free-text comments in module evaluation. For teams redesigning evaluation forms, prompt wording offers a low-cost way to surface fuller feedback and more decision-grade evidence without rebuilding the whole process.
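
For teams that want to monitor this kind of engagement in their own data, here is a minimal sketch in Python, assuming comments are exported alongside the prompt condition each student received. The file and column names are hypothetical, and the three-word threshold simply mirrors the paper's measure of a substantive response.

    import pandas as pd

    # Hypothetical export: one row per student solicited, with the prompt
    # condition they received and any free-text comment they left.
    df = pd.read_csv("evaluation_responses.csv")  # columns: condition, comment

    df["word_count"] = df["comment"].fillna("").str.split().str.len()
    df["substantive"] = df["word_count"] > 3  # mirrors the paper's three-word threshold

    summary = df.groupby("condition").agg(
        responses=("comment", "size"),
        share_substantive=("substantive", "mean"),
        median_words=("word_count", "median"),
    )
    print(summary)

Comparing these figures across prompt conditions over a few cycles would show whether a reworded question is actually drawing out fuller feedback.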

Delayed solicitation creates a more racially representative sample, but at a cost. Sending evaluations at the start of the following semester reduced overall response rates, particularly among graduating students. However, the decrease was smaller for Black, Hispanic and multi-racial students than for white students, resulting in a more racially balanced sample. The authors describe this as a potential trade-off between overall response rates and the representation of minority student voices. The practical lesson is clear: maximising volume and maximising representativeness are not always the same goal. Institutions need to decide which risk matters most in a given context and design their process accordingly.

Practical implications for UK higher education

While this study was conducted at a US institution, the underlying dynamics are highly relevant to the UK context. The NSS, PTES, PRES, UKES and institutional module evaluations all face the same challenge: headline response rates can mask which students are missing from the sample. The immediate takeaway for UK teams is operational: treat representativeness as a design question, not just a reporting problem. That fits wider evidence on student motivations and perceptions in teaching evaluations, especially where students may need clearer reasons to participate and clearer routes to be heard. In practice, that means reviewing prompts, monitoring participation gaps and being explicit about the limits of the evidence before leaders act on it.

Rethink evaluation prompts. The finding that an alternative, student-centred prompt increases qualitative engagement is directly actionable. Institutions designing module evaluation forms or free-text questions for the NSS could test prompts that invite students to define what matters to them, rather than only respond to pre-set criteria. Even a relatively small wording change could produce richer commentary, especially from groups that traditionally under-engage. The payoff is stronger qualitative evidence for action planning, without redesigning the whole survey.

Consider who is missing. Student Experience teams and Pro-Vice-Chancellors for Education should routinely audit evaluation response data by demographic group, not just by module or department. If certain cohorts are under-represented, teams may be steering decisions with evidence that misses part of the student experience. Regular audits give teams a firmer basis for interpreting scores and comments before they feed into staff review, quality enhancement or targeted follow-up.
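
What such an audit might look like in practice: the sketch below computes response-rate gaps by demographic group from a hypothetical student-record extract. The file name and fields (ethnicity, year_of_study, socioeconomic_flag) are placeholders for whatever the institution actually holds.

    import pandas as pd

    # Hypothetical extract: one row per enrolled student, with a flag for
    # whether they completed the evaluation. Demographic fields are placeholders.
    students = pd.read_csv("evaluation_cohort.csv")

    overall = students["responded"].mean()
    for field in ["ethnicity", "year_of_study", "socioeconomic_flag"]:
        by_group = students.groupby(field)["responded"].mean()
        gap = by_group - overall  # negative values flag under-represented groups
        print(f"\n{field} (overall rate {overall:.1%})")
        print(gap.sort_values().to_string(float_format="{:+.1%}".format))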

Weigh the response-rate versus representativeness trade-off. The finding that delayed solicitation reduces response rates but improves demographic balance presents a genuine policy tension. Institutions may need to decide whether a slightly lower overall response rate is an acceptable price for a more inclusive dataset, or explore hybrid approaches that combine end-of-semester collection with targeted follow-up for under-represented groups. Making that choice explicitly is better than assuming the standard timetable is neutral. In many cases, the default process already shapes whose views get counted and whose do not.
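
One way to make that trade-off concrete, as a rough illustration rather than a metric from the paper, is to score each solicitation design on both volume and balance. The sketch below uses half the total absolute difference between respondent shares and cohort shares (zero means perfectly representative); all counts are invented.

    import pandas as pd

    def representativeness_gap(cohort: pd.Series, respondents: pd.Series) -> float:
        """Half the total absolute difference between the demographic mix of
        the cohort and of the respondents; 0 means perfectly representative."""
        cohort_share = cohort.value_counts(normalize=True)
        resp_share = respondents.value_counts(normalize=True)
        return cohort_share.sub(resp_share, fill_value=0).abs().sum() / 2

    # Invented data: design A maximises volume; design B (delayed) yields
    # fewer responses but a mix that matches the cohort more closely.
    cohort = pd.Series(["white"] * 70 + ["black"] * 15 + ["hispanic"] * 15)
    design_a = pd.Series(["white"] * 45 + ["black"] * 4 + ["hispanic"] * 5)
    design_b = pd.Series(["white"] * 28 + ["black"] * 6 + ["hispanic"] * 6)

    for name, resp in [("end-of-semester", design_a), ("delayed", design_b)]:
        rate = len(resp) / len(cohort)
        print(f"{name}: response rate {rate:.0%}, gap {representativeness_gap(cohort, resp):.3f}")

On these invented numbers, the end-of-semester design wins on volume (54 per cent versus 40 per cent) while the delayed design wins on balance, which is exactly the tension a policy decision has to resolve.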

The broader lesson is straightforward: evaluation design shapes whose voices count, and that affects how confidently institutions can act on the findings. Universities that only monitor overall response rates may overlook systematic bias in the evidence they use for teaching review, quality enhancement and EDI work. Teams that monitor representativeness as closely as participation are better placed to act with confidence, and less likely to mistake response volume for evidence quality.

FAQ

Q: How can universities apply these findings when they cannot easily replicate a randomised experiment?

A: Institutions do not need to run a formal experiment to act on these findings. A practical first step is to analyse existing evaluation data by student demographics, such as ethnicity, socioeconomic background and year of study, to identify which groups are under-represented. Many student records systems already hold this information. Once gaps are identified, universities can pilot alternative prompts, changes to reminder timing or targeted follow-up for low-responding groups and compare the results across cycles. Even small faculty-level pilots can generate useful local evidence before wider changes are made.
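
As a simple illustration of that first step, a standard chi-squared goodness-of-fit test can check whether respondents' demographic mix differs from the enrolled population. The sketch below uses SciPy with invented counts; a real analysis would use the institution's own categories.

    from scipy.stats import chisquare

    # Invented counts: respondents per ethnic group, and the mix we would
    # expect if responding were independent of ethnicity.
    observed = [220, 18, 22, 15]               # respondents by group
    cohort_shares = [0.72, 0.10, 0.10, 0.08]   # enrolled population mix
    expected = [sum(observed) * s for s in cohort_shares]

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")  # small p: response differs by group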

Q: Does this research suggest that student evaluations are fundamentally unreliable?

A: Not unreliable, but incomplete unless institutions understand who is missing. The paper does not argue against using student evaluations; it argues for using them with more methodological rigour. The core message is that institutions should check for non-response bias, consider whose voices are absent, and triangulate with other data sources such as free-text comment analysis. When qualitative data is analysed at scale, it can reveal themes and concerns that Likert-scale averages alone may miss, which makes sample quality even more important for credible decision-making.

Q: How does this connect to broader efforts around equality, diversity and inclusion in UK higher education?

A: If evaluation systems systematically under-represent the views of students from minority ethnic backgrounds or lower socioeconomic groups, then decisions made on the basis of that data risk perpetuating inequities. The OfS and institutions themselves have placed increasing emphasis on reducing awarding gaps and improving outcomes for under-represented students. Ensuring that student voice mechanisms genuinely capture diverse perspectives is a necessary part of that effort. It is not just an administrative detail. It is a matter of institutional equity.

References

De Bruin, E., Owen, A. L. and Wu, S. (2025) "Can student evaluations be made more representative? Testing alternative strategies", Studies in Higher Education. DOI: 10.1080/03075079.2025.2467424

