Teaching evaluation surveys work better when students and staff help design them

Updated Apr 10, 2026

When a teaching evaluation survey asks vague or biased questions, the problem starts before a single student responds. A paper by Janet L. Behling, Kirk D. Hagen, Taylor M. Eckerle and Patricia A. Connor-Greene in Assessment & Evaluation in Higher Education, "Redesigning student evaluations of teaching: integrating faculty and student perspectives", shows why teaching evaluations become more credible and more useful when students and staff design them together. For UK universities reviewing module evaluations, internal teaching surveys, and free-text feedback prompts, the practical lesson is this: student voice works better when the instrument is designed with students, not just for them.

Context and research question

Student evaluations of teaching still sit at the centre of many higher education quality processes, but institutions are asking more of them. Teams need feedback that is fair, actionable, and comparable over time. At the same time, evidence keeps building that legacy evaluation forms can embed bias through subjective wording, ask questions too vague to act on, and generate scores that are hard to interpret with confidence.

Behling and colleagues address a foundational question: what should a teaching evaluation actually measure if it is meant to reflect effective teaching rather than a loose impression of satisfaction? Their answer is to start with a shared definition of effective teaching built from research and stakeholder input, then design the survey around it. For UK higher education teams, that is highly relevant. Many universities still rely on legacy module evaluation forms whose wording has changed little even as expectations around fairness, inclusion, and evidence use have shifted.

Key findings

The paper argues that redesign should start with definition, not administration. The authors note that research on student evaluations has repeatedly raised concerns about inequity and bias. In that context, it is not enough to tweak a few questions or move to a new platform. Institutions need to decide what effective teaching means in a way that is clear enough to support survey design and credible enough to support decisions.

The redesign process was deliberately participatory and staged. The paper describes a six-year, six-stage project that included surveying faculty about the existing instrument, reviewing the literature, gathering input from students, faculty, and administrators, selecting candidate items, pilot testing the revised survey, and drafting policy recommendations. That matters because it reframes evaluation design as institutional methodology, not routine administration. The benefit is a survey that people are more likely to trust, because they helped shape its purpose and wording, a principle that also underpins staff-student partnerships in assessment.

Students and faculty agreed on more core dimensions of effective teaching than universities often assume. The strongest shared items centred on communication, commitment, respect, course preparation and organisation, and passion for teaching. Those are not abstract labels. They are practical dimensions that students can recognise and academic teams can act on.

"faculty and students agreed on the most important items signalling effective teaching"

The findings also suggest that a robust core survey can coexist with local variation. If students and staff broadly converge on a set of central teaching qualities, universities can build a defensible institutional core for teaching evaluations while still leaving space for discipline-specific or module-specific questions. That is especially useful in UK settings where consistency matters for benchmarking, but local context still shapes how teaching is delivered.

The paper's most useful implication is that question design and policy design need to move together. Pilot testing and policy recommendations were part of the project because survey wording alone does not solve trust problems. Universities also need clarity about how results will be used, how often instruments are reviewed, and how qualitative comments complement scaled items. Without that wider governance, even a better questionnaire can still produce weak practice.

Practical implications

For UK universities, the first implication is to audit teaching evaluation forms against a simple standard: do the items reflect a shared, evidence-based view of effective teaching, or are they a legacy bundle of generic questions? A short institutional core built around communication, respect, organisation, preparation, and visible commitment will often be more useful than a longer form full of loosely defined traits.
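
As a concrete starting point, that audit can be as simple as mapping every existing item to the nearest core dimension and flagging anything that maps to nothing. The sketch below is a minimal illustration in Python: the dimension names echo the shared qualities reported in the paper, but the keyword lists and sample items are invented for this example, and a real audit would rely on human judgement rather than string matching.

# Minimal audit sketch: group survey items under core teaching
# dimensions and flag items that match none of them. Dimension names
# follow the paper's shared qualities; keywords and items are invented.
CORE_DIMENSIONS = {
    "communication": ["communicat", "explain", "clear"],
    "respect": ["respect", "fair", "inclusive"],
    "organisation": ["organis", "structur", "timetable"],
    "preparation": ["prepar", "material", "resource"],
    "commitment": ["commit", "available", "support", "passion"],
}

def audit_items(items):
    """Group items by the first dimension whose keywords appear in them;
    anything left over lands in 'unmapped' for human review."""
    grouped = {d: [] for d in CORE_DIMENSIONS}
    grouped["unmapped"] = []
    for item in items:
        text = item.lower()
        for dimension, keywords in CORE_DIMENSIONS.items():
            if any(k in text for k in keywords):
                grouped[dimension].append(item)
                break
        else:
            grouped["unmapped"].append(item)
    return grouped

legacy_form = [
    "The lecturer explained concepts clearly.",
    "The module was well organised.",
    "Teaching materials were prepared in advance.",
    "Overall, I was satisfied with this module.",  # generic legacy item
]
for dimension, matched in audit_items(legacy_form).items():
    print(f"{dimension}: {matched}")

Items that land in "unmapped", like the generic satisfaction question above, are exactly the legacy bundle the audit is meant to surface.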

Second, co-design improves both response quality and trust in the results. Student Experience teams, quality leads, students' unions, and academic staff should all have a role in reviewing evaluation items. This does not mean every question has to be negotiated from scratch. It means the instrument should be tested with the people who complete it and the people expected to act on the results, so the survey feels credible on both sides.

Third, a redesigned survey should make qualitative data easier to use, not easier to ignore. If a form asks students about communication, organisation, or respect, its open-text prompts should give them room to explain what is working or failing in those areas. This is where Student Voice Analytics connects naturally to the paper: when free-text comments are grouped consistently at scale, institutions can see whether the themes emerging in comments align with the dimensions their evaluation design is trying to measure, and whether changes to the survey are producing clearer signals over time.
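
A deliberately simplified sketch of that alignment check follows. Production systems such as Student Voice Analytics use trained classifiers over full comment sets; here, hypothetical keyword tagging stands in so the comparison itself stays visible: count how often each theme appears in open-text comments, then flag themes the scaled items never ask about. All theme names, keywords, and comments are illustrative.

from collections import Counter

# Simplified sketch of the alignment check: tag comments with themes
# via keywords, then flag themes the scaled survey never asks about.
THEME_KEYWORDS = {
    "communication": ["explain", "unclear", "feedback"],
    "organisation": ["timetable", "structure", "late"],
    "respect": ["respect", "dismissive", "inclusive"],
    "workload": ["deadline", "workload", "too much"],
}

SURVEY_DIMENSIONS = {"communication", "organisation", "respect",
                     "preparation", "commitment"}

def theme_counts(comments):
    """Count each theme once per comment in which its keywords appear."""
    counts = Counter()
    for comment in comments:
        text = comment.lower()
        for theme, keywords in THEME_KEYWORDS.items():
            if any(k in text for k in keywords):
                counts[theme] += 1
    return counts

comments = [
    "Lectures were hard to follow and slides went up late.",
    "Staff were respectful and feedback on drafts was quick.",
    "The timetable changed twice with no explanation.",
    "Three assessed deadlines in one week was too much.",
]
for theme, n in theme_counts(comments).most_common():
    gap = "" if theme in SURVEY_DIMENSIONS else "  <- not measured by scaled items"
    print(f"{theme}: {n}{gap}")

A theme that keeps appearing in comments but never in the scaled items, workload in this toy example, is a signal that the survey design and the student voice data have drifted apart.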

FAQ

Q: How can a university redesign its module evaluation survey without creating a long, disruptive project?

A: Start with a focused core review rather than a complete rewrite. Identify the five to seven teaching dimensions your institution most needs to measure, test whether current items capture them clearly, and run short consultation sessions with students, staff, and quality teams. Then pilot a revised version in a limited set of modules before rolling it out more widely.
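
To make the pilot step concrete, here is one hedged example of a check a quality team might run, sketched in Python with invented data: compare how often students sit on the scale midpoint for a vague legacy item versus its revised, dimension-specific replacement, since heavy midpoint use is one common sign that students cannot map the question onto their experience.

import statistics

# Invented pilot data: one vague legacy item versus a revised,
# dimension-specific item, both on a 1-5 scale. Heavy use of the
# midpoint is one common signal that an item is hard to answer.
legacy_scores = [3, 3, 4, 3, 2, 3, 3, 5, 3, 3]    # "Overall satisfaction"
revised_scores = [4, 5, 4, 2, 5, 4, 1, 5, 4, 4]   # "Explained ideas clearly"

def midpoint_rate(scores, midpoint=3):
    """Share of responses sitting on the scale midpoint."""
    return scores.count(midpoint) / len(scores)

for label, scores in [("legacy", legacy_scores), ("revised", revised_scores)]:
    print(f"{label}: mean={statistics.mean(scores):.1f}, "
          f"midpoint rate={midpoint_rate(scores):.0%}")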

Q: What should we be cautious about when applying findings from a single redesign project?

A: The paper provides a strong process model, but not a universal final template. Institutions should treat the shared dimensions as a useful starting point, then validate question wording locally. Pilot testing matters because students may interpret apparently simple items differently across subjects, levels of study, or delivery modes, and because student evaluation scores are not automatically comparable over time.

Q: What does this mean for student voice beyond scaled evaluation questions?

A: It reinforces that scaled questions and open-text comments should work together. A good survey core tells you what to ask consistently; qualitative feedback tells you why students answered that way and what should change. Universities get the strongest student voice evidence when both parts are designed intentionally and analysed together.

References

[Paper Source]: Behling, J. L., Hagen, K. D., Eckerle, T. M. and Connor-Greene, P. A. (2025) "Redesigning student evaluations of teaching: integrating faculty and student perspectives", Assessment & Evaluation in Higher Education. DOI: 10.1080/02602938.2025.2479117

Request a walkthrough

Book a free Student Voice Analytics demo

See all-comment coverage, sector benchmarks, and reporting designed for OfS quality and NSS requirements.

  • All-comment coverage with HE-tuned taxonomy and sentiment.
  • Versioned outputs with TEF-ready reporting.
  • Benchmarks and BI-ready exports for boards and Senate.

Prefer email? info@studentvoice.ai

UK-hosted · No public LLM APIs · Same-day turnaround
