Who Actually Fills In Student Evaluations? New Evidence on Non-Response Bias

By Student Voice AI

Updated Apr 23, 2026

A healthy response rate can hide a serious blind spot: the students you are not hearing from. If lower-income students, later-year students or some ethnic groups respond at lower rates, institutions can end up making decisions about teaching quality on a distorted version of the student voice. Non-response bias then becomes a quality and governance issue, not just a survey administration problem.

A new paper in Studies in Higher Education by Erica De Bruin, Ann L. Owen and Stephen Wu tests that risk through a randomised experiment. It shows that small changes to prompt wording and timing can shift not only how many students respond, but which students are represented in the data. For institutions that rely on student evaluations, the practical message is immediate: audit who is missing, test prompt design and treat headline response rates as only half the story.

The problem: who is not responding?

Student evaluations of teaching (SETs) are a near-universal feature of higher education. They feed into promotion cases, module reviews and quality assurance processes. Since many institutions moved them online, response rates have often fallen, typically to between 30 and 75 per cent. That mirrors the wider challenge of engaging students in student voice activity.

But the key question is not just how many students respond. It is which students do. If certain demographic groups are systematically less likely to complete evaluations, the resulting data can misrepresent the student experience even when overall participation looks acceptable. A respectable response rate is not the same as a representative sample, and that distinction matters whenever institutions use evaluations to judge teaching quality, inform promotion or support quality assurance.

De Bruin, Owen and Wu's study is one of the first to use a randomised controlled experiment to test whether practical changes to the evaluation process can reduce non-response bias. Conducted at a selective liberal arts college in the United States, the experiment assigned students to one of three conditions: a traditional end-of-semester evaluation, an alternative prompt asking students to articulate their own criteria for effective teaching, or a delayed solicitation sent at the start of the following semester. That design matters because it focuses on two levers most universities can actually change: wording and timing. The paper therefore does more than diagnose the problem; it tests realistic ways to improve representativeness before the data is used in institutional decisions.

Key findings

Certain student groups are consistently under-represented. Across all study conditions, three groups were significantly less likely to complete teaching evaluations: Pell Grant recipients (the paper's proxy for lower-income students), students with lower GPAs and students later in their college careers. They were also less likely to write more than three words in response to qualitative questions. In practice, the students facing the greatest barriers may be the easiest to miss unless teams check participation gaps alongside headline response rates.

Racial disparities are most pronounced in traditional evaluations. Black, Hispanic and multi-racial students were less likely than white students to complete evaluations solicited at the standard end-of-semester point. For institutions using SET data to assess teaching quality, that creates a clear risk: teams may draw conclusions from a partial sample if they treat the dataset as representative by default. Spotting that skew early helps institutions qualify the evidence, target follow-up and avoid overclaiming from incomplete data.

"Black, Hispanic, and multi-racial students are less likely than white students to complete traditional evaluations solicited at the end-of-semester."

An alternative prompt increases qualitative engagement. When students were asked to define their own criteria for effective teaching, rather than respond to the standard institutional questionnaire, they were more likely to write more than three words and produced longer responses overall. That points to a practical opportunity: question design matters. Giving students more agency over the evaluation criteria can encourage richer, more substantive feedback, reinforcing the case for free-text comments in module evaluation. For teams redesigning evaluation forms, prompt wording is a low-cost lever for fuller feedback and more decision-grade evidence.
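Teams piloting a new prompt need a consistent way to compare free-text engagement between versions. The sketch below is a minimal illustration, assuming each submitted evaluation can be exported as a (condition, comment) pair; the function name `comment_engagement` and the condition labels are hypothetical. For each prompt condition it reports the share of comments longer than three words (mirroring the paper's threshold) and the mean word count, with blank boxes counted as zero words.

```python
from collections import defaultdict
from statistics import mean


def comment_engagement(responses, min_words=3):
    """Summarise free-text engagement per prompt condition.

    responses: iterable of (condition, comment) pairs, one per submitted
    evaluation; comment may be None or "" if the box was left blank.
    Returns {condition: (share_over_min_words, mean_word_count)}.
    """
    word_counts = defaultdict(list)
    for condition, comment in responses:
        # Crude whitespace word count; blank comments count as zero words.
        word_counts[condition].append(len(comment.split()) if comment else 0)
    return {
        condition: (sum(n > min_words for n in counts) / len(counts), mean(counts))
        for condition, counts in word_counts.items()
    }


# Hypothetical comments for illustration only.
sample = [
    ("standard", "Good module"),
    ("standard", ""),
    ("own_criteria", "Clear feedback on drafts helped me improve my essays"),
    ("own_criteria", "Fine"),
]
print(comment_engagement(sample))
```

Splitting on whitespace is a deliberately rough word count. It is enough for comparing conditions within one institution, but punctuation-only or pasted boilerplate comments would need extra cleaning before the figures inform a decision.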

Delayed solicitation creates a more racially representative sample, but at a cost. Sending evaluations at the start of the following semester reduced overall response rates, particularly among graduating students. However, the decrease was smaller for Black, Hispanic and multi-racial students than for white students, resulting in a more racially balanced sample. The authors describe this as a potential trade-off between overall response rates and the representation of minority student voices. The practical lesson is clear: maximising volume and maximising representativeness are not always the same goal. Institutions need to decide which risk matters most in a given context, then design the process accordingly.
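One way to make this trade-off explicit is to report two numbers side by side for each collection window: the overall response rate and a measure of how far the respondents' group composition drifts from the enrolled population. The sketch below uses a simple dissimilarity index (half the sum of absolute differences in group shares) as that measure. This is an illustrative choice, not the metric used in the paper, and the function name, group labels and headcounts are hypothetical.

```python
def response_profile(enrolled, responded):
    """Return (response_rate, dissimilarity_index) for one collection window.

    enrolled, responded: dicts mapping a group label to a headcount.
    The index is 0.5 * sum over groups of |respondent share - enrolled share|:
    0 means respondents mirror the enrolled population; higher means more skew.
    """
    total_enrolled = sum(enrolled.values())
    total_responded = sum(responded.get(group, 0) for group in enrolled)
    if total_enrolled == 0 or total_responded == 0:
        raise ValueError("need at least one enrolled student and one response")
    rate = total_responded / total_enrolled
    dissimilarity = 0.5 * sum(
        abs(responded.get(group, 0) / total_responded - n / total_enrolled)
        for group, n in enrolled.items()
    )
    return rate, dissimilarity


# Hypothetical headcounts for two windows; not figures from the paper.
enrolled = {"group_a": 600, "group_b": 400}
windows = {
    "end_of_semester": {"group_a": 360, "group_b": 160},
    "next_semester": {"group_a": 240, "group_b": 140},
}
for name, responded in windows.items():
    rate, index = response_profile(enrolled, responded)
    print(f"{name}: response rate {rate:.2f}, dissimilarity {index:.3f}")
```

With these made-up counts, the later window has both the lower response rate and the lower index, which is the shape of trade-off the study describes. Real comparisons would also need some measure of uncertainty, since small groups make both numbers noisy.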

Practical implications for UK higher education

While this study was conducted at a US institution, the underlying dynamics are highly relevant to the UK context. The NSS, PTES, PRES, UKES and institutional module evaluations all face the same challenge: headline response rates can hide which students are missing from the sample. For UK teams, the takeaway is practical: treat representativeness as a design question, not just a reporting problem. That fits wider evidence on student motivations and perceptions in teaching evaluations, especially where students may need clearer reasons to participate and clearer routes to be heard. In practice, that means reviewing prompts, monitoring participation gaps and being explicit about the limits of the evidence before leaders act on the findings.

Rethink evaluation prompts. The finding that an alternative, student-centred prompt increases qualitative engagement is directly actionable. Institutions designing module evaluation forms or free-text questions for the NSS could test prompts that invite students to define what matters to them, rather than only respond to pre-set criteria. Even a relatively small wording change could produce richer commentary, especially from groups that traditionally under-engage. The payoff is stronger qualitative evidence for action planning, without redesigning the whole survey or waiting for a major system change.

Consider who is missing. Student Experience teams and Pro Vice-Chancellors for Education should routinely audit evaluation response data by demographic group, not just by module or department. If certain cohorts are under-represented, teams may be steering decisions with evidence that misses part of the student experience. Regular audits give teams a firmer basis for interpreting scores and comments before they feed into staff review, quality enhancement or targeted follow-up.
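A routine audit can start from two counts per group: students enrolled on the surveyed modules and students who responded. The sketch below assumes those counts have already been extracted from the student records system. It computes each group's response rate and a representation ratio (the group's response rate divided by the overall rate, so 1.0 means proportional representation). The function name, group labels, counts and the 0.8 flag threshold are all hypothetical and would need local agreement.

```python
def audit_representation(enrolled, responded, flag_below=0.8):
    """Per-group response rates and representation ratios for one survey.

    enrolled, responded: dicts mapping a group label to a headcount.
    A group's representation ratio is its response rate divided by the
    overall response rate; 1.0 means the group's share of respondents
    matches its share of enrolled students.
    """
    total_enrolled = sum(enrolled.values())
    total_responded = sum(responded.get(group, 0) for group in enrolled)
    if total_enrolled == 0 or total_responded == 0:
        raise ValueError("need at least one enrolled student and one response")
    overall_rate = total_responded / total_enrolled

    report = {}
    for group, n_enrolled in enrolled.items():
        if n_enrolled == 0:
            continue  # No enrolled students, so no rate to report.
        rate = responded.get(group, 0) / n_enrolled
        ratio = rate / overall_rate
        report[group] = {
            "response_rate": rate,
            "ratio": ratio,
            # Flag groups responding well below the overall rate for follow-up.
            "flag": ratio < flag_below,
        }
    return report


# Hypothetical counts for illustration only.
enrolled = {"first_year": 500, "final_year": 500}
responded = {"first_year": 300, "final_year": 150}
for group, row in audit_representation(enrolled, responded).items():
    print(group, round(row["response_rate"], 2), round(row["ratio"], 2), row["flag"])
```

The same function can be run per module, department or survey cycle. Comparing ratios before and after a prompt or timing change is a lightweight, non-randomised check rather than a substitute for the controlled comparison the paper makes.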

Weigh the response-rate versus representativeness trade-off. The finding that delayed solicitation reduces response rates but improves demographic balance presents a genuine policy tension. Institutions may need to decide whether a lower overall response rate is an acceptable price for a more inclusive dataset, or explore hybrid approaches that combine end-of-semester collection with targeted follow-up for under-represented groups. Making that choice explicitly is better than assuming the standard timetable is neutral. In many cases, the default process already shapes whose views get counted and whose do not.

The broader lesson is straightforward: evaluation design shapes whose voices count, and that affects how confidently institutions can act on the findings. Universities that only monitor overall response rates may overlook systematic bias in the evidence they use for teaching review, quality enhancement and EDI work. Teams that monitor representativeness as closely as participation are better placed to act with confidence and less likely to mistake response volume for evidence quality. Before the next evaluation cycle, review response patterns by cohort, decide whether your current process is optimised for volume, representativeness or both, and adjust the prompt or timetable accordingly.

FAQ

Q: How can universities apply these findings when they cannot easily replicate a randomised experiment?

A: Institutions do not need to run a formal experiment to act on these findings. A practical first step is to analyse existing evaluation data by student demographics, such as ethnicity, socioeconomic background and year of study, to identify which groups are under-represented. Many student records systems already hold this information. Once gaps are identified, universities can pilot alternative prompts, different reminder timings or targeted follow-up for low-responding groups and compare the results across cycles. Even small faculty-level pilots can generate useful local evidence before wider changes are made.

Q: Does this research suggest that student evaluations are fundamentally unreliable?

A: Not unreliable, but incomplete unless institutions understand who is missing. The paper does not argue against using student evaluations; it argues for using them with more methodological rigour. The core message is that institutions should check for non-response bias, consider whose voices are absent, and triangulate with other data sources such as free-text comment analysis. When that qualitative data is analysed at scale, it can reveal themes and concerns that Likert-scale averages alone may miss, which makes sample quality even more important for credible decision-making.

Q: How does this connect to broader efforts around equality, diversity and inclusion in UK higher education?

A: If evaluation systems systematically under-represent the views of students from minority ethnic backgrounds or lower socioeconomic groups, decisions made on the basis of that data risk reinforcing inequities. The OfS and institutions themselves have placed increasing emphasis on reducing awarding gaps and improving outcomes for under-represented students. Ensuring that student voice mechanisms genuinely capture diverse perspectives is a necessary part of that effort. It is not just an administrative detail; it is a matter of institutional equity.

References

[Paper Source]: Erica De Bruin, Ann L. Owen and Stephen Wu, "Can student evaluations be made more representative? Testing alternative strategies", Studies in Higher Education. DOI: 10.1080/03075079.2025.2467424
