Teaching evaluation surveys work better when students and staff help design them

Updated Mar 15, 2026

At Student Voice AI, we spend a great deal of time thinking about how survey design shapes the quality of the feedback universities receive. That is why Janet L. Behling, Kirk D. Hagen, Taylor M. Eckerle and Patricia A. Connor-Greene's paper in Assessment & Evaluation in Higher Education, "Redesigning student evaluations of teaching: integrating faculty and student perspectives", matters well beyond one institution. For UK universities reviewing module evaluations, internal teaching surveys, and free-text feedback prompts, it offers a practical reminder that student voice works better when the instrument itself is designed with students, not just for them.

Context and research question

Student evaluations of teaching remain a common part of higher education quality processes, but they are under pressure from several directions. Institutions want feedback that is fair, actionable, and comparable over time. At the same time, there is mounting evidence that inherited evaluation forms can embed bias, ask vague questions, and produce scores that are hard to interpret confidently.

Behling and colleagues address a foundational question: what should a teaching evaluation actually measure if it is meant to reflect effective teaching rather than a loose impression of satisfaction? Their answer is not to start with a new list of items, but to begin with a shared definition of effective teaching built from research and stakeholder input. For UK higher education teams, that is highly relevant. Many universities still rely on legacy module evaluation forms whose wording has changed little even as expectations around fairness, inclusion, and evidence use have shifted.

Key findings

The paper argues that redesign should start with definition, not administration. The authors note that research on student evaluations has repeatedly raised concerns about inequity and bias. In that context, it is not enough to tweak a few questions or change the platform. Institutions need to decide what effective teaching means in a way that is clear enough to support survey design and credible enough to support decisions.

The redesign process was deliberately participatory and staged. The paper describes a six-year, six-stage project that included surveying faculty about the existing instrument, reviewing the literature, gathering input from students, faculty, and administrators, selecting candidate items, pilot testing the revised survey, and drafting policy recommendations. That framing matters: it treats evaluation design as a serious piece of institutional method, not a routine administrative task.

Students and faculty agreed on more core dimensions of effective teaching than universities often assume. The strongest shared items centred on communication, commitment, respect, course preparation and organisation, and passion for teaching. Those are not abstract branding terms. They are practical dimensions that can be recognised by students and acted on by academic teams.

"faculty and students agreed on the most important items signalling effective teaching"

The findings also suggest that a robust core survey can coexist with local variation. If students and staff broadly converge on a set of central teaching qualities, universities can build a defensible institutional spine for teaching evaluations while still leaving space for discipline-specific or module-specific questions. That is especially useful in UK settings where consistency matters for benchmarking, but local context still shapes how teaching is delivered.

The paper's most useful implication is that question design and policy design need to move together. Pilot testing and policy recommendations were part of the project because survey wording alone does not solve trust problems. Universities also need clarity about how results will be used, how often instruments are reviewed, and how qualitative comments complement scaled items. Without that wider governance, even a better questionnaire can still produce weak practice.

Practical implications

For UK universities, the first implication is to audit teaching evaluation forms against a simple standard: do the items reflect a shared, evidence-based view of effective teaching, or are they a legacy bundle of generic questions? A short institutional core built around communication, respect, organisation, preparation, and visible commitment would often be more useful than a longer form full of loosely defined traits.

Second, co-design matters. Student Experience teams, quality leads, students' unions, and academic staff should all have a role in reviewing evaluation items. This does not mean every question has to be negotiated from scratch. It means the instrument should be tested with the people who complete it and the people expected to act on the results. That is how universities improve both response quality and trust in the data.

Third, a redesigned survey should make the qualitative data more useful, not less. If a form asks students about communication, organisation, or respect, its open-text prompts should give them room to explain what is working or failing in those areas. This is where Student Voice Analytics connects naturally to the paper: when free-text comments are grouped consistently at scale, institutions can see whether the themes emerging in comments align with the dimensions their evaluation design is trying to measure, and whether changes to the survey are producing clearer signals over time.

FAQ

Q: How can a university redesign its module evaluation survey without creating a long, disruptive project?

A: Start with a focused core review rather than a complete rewrite. Identify the five to seven teaching dimensions your institution most needs to measure, test whether current items capture them clearly, and run short consultation sessions with students, staff, and quality teams. Then pilot a revised version in a limited set of modules before rolling it out more widely.

Q: What should we be cautious about when applying findings from a single redesign project?

A: The paper provides a strong process model, but not a universal final template. Institutions should treat the shared dimensions as a useful starting point, then validate question wording locally. Pilot testing matters because students may interpret apparently simple items differently across subjects, levels of study, or delivery modes.

Q: What does this mean for student voice beyond scaled evaluation questions?

A: It reinforces that scaled questions and open-text comments should work together. A good survey core tells you what to ask consistently; qualitative feedback tells you why students answered that way and what should change. Universities get the strongest student voice evidence when both parts are designed intentionally and analysed together.

References

[Paper Source]: Janet L. Behling, Kirk D. Hagen, Taylor M. Eckerle and Patricia A. Connor-Greene, "Redesigning student evaluations of teaching: integrating faculty and student perspectives", Assessment & Evaluation in Higher Education. DOI: 10.1080/02602938.2025.2479117

Request a walkthrough

Book a free Student Voice Analytics demo

See all-comment coverage, sector benchmarks, and reporting designed for OfS quality and NSS requirements.

  • All-comment coverage with HE-tuned taxonomy and sentiment.
  • Versioned outputs with TEF-ready reporting.
  • Benchmarks and BI-ready exports for boards and Senate.

Prefer email? info@studentvoice.ai

UK-hosted · No public LLM APIs · Same-day turnaround

The Student Voice Weekly

Research, regulation, and insight on student voice. Every Friday.

© Student Voice Systems Limited, All rights reserved.