Updated Apr 02, 2026
When universities add Generative AI to feedback workflows, the harder question is not whether students will use it. It is whether they will trust it when grades, standards, and confidence are on the line. That is why this recent Assessment & Evaluation in Higher Education paper by Michael Henderson, Margaret Bearman, Jennifer Chung, Tim Fawns, Simon Buckingham Shum, Kelly E. Matthews and Jimena de Mello Heredia matters. For universities collecting module evaluation comments, AI pilot feedback, and broader student experience data, it offers a practical takeaway: students may welcome both sources, but they do not trust them equally.
Many universities are experimenting with AI-supported feedback because it is fast, always available, and relatively cheap to scale. But faster feedback only helps if students believe it is accurate, fair, and worth acting on. If students trust AI only for low-stakes tasks, institutions need to know that before they redesign assessment support around it.
Henderson and colleagues tackle that question through a large mixed-methods study across 18 higher education institutions. The project combined survey responses from 1,001 students with 8,642 open-text comments about experiences of GenAI and teacher feedback. For UK higher education teams, the value is practical: this is not just a popularity test. It shows where students trust AI, where they hesitate, and how that should shape feedback design.
Students found both sources useful, but they trusted teacher feedback far more. The paper reports that 49.7% of respondents had used Generative AI for feedback. Most students rated both sources as at least somewhat helpful, yet trust split much more clearly: 90.5% said teacher feedback was trustworthy, compared with 60.1% for GenAI.
That gap matters because students did not treat feedback as a single category. They matched the source to the task. GenAI was valued for speed, accessibility, and low-friction support, especially when students wanted to test an idea, check structure, or get an immediate response outside staff hours. Teacher feedback was valued for credibility, contextual judgement, and disciplinary understanding, especially when the work was complex or the consequences felt high.
"GenAI and teacher feedback appear to serve different needs, and therefore are complementary but not interchangeable."
The open comments explain why trust diverged. Students often described teacher feedback as more specific, better aligned to module expectations, and more reliable when they needed help interpreting standards, which matches wider evidence on what students say makes good feedback. By contrast, GenAI was frequently praised for convenience but questioned on accuracy, nuance, and whether it genuinely understood the task. For institutions, that distinction matters: feedback quality is not only about whether comments arrive quickly, but whether students feel confident acting on them.
The study also suggests that trust matters more as the academic stakes rise. Students were more willing to use GenAI for drafting, brainstorming, or early-stage checking than for final judgements about what would actually improve a marked piece of work. In other words, AI feedback may widen access to formative support, but it does not replace the reassurance students want from informed human judgement. That gives institutions a clearer design rule for AI feedback pilots.
For UK universities, the first implication is to stop framing AI feedback as a direct substitute for teacher feedback. A better model is a layered feedback system shaped by student voice in assessment and feedback: use GenAI for quick iteration, idea development, and low-stakes guidance, while protecting staff time for higher-value feedback that depends on subject expertise, standards, and contextual knowledge. That gives students faster support without blurring the role of teacher judgement.
Second, institutions should measure trust, not just satisfaction or usage. If you only ask whether an AI tool was helpful, you can miss the more important question of whether students would rely on it when it matters. Module evaluations, pulse surveys, and free-text prompts should distinguish between usefulness, trustworthiness, and the type of task students were trying to complete. That produces better evidence for deciding whether a pilot should expand, pause, or change course.
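As a rough illustration of what that separation looks like in analysis, the sketch below tallies usefulness and trust separately by task type. The field names and sample records are hypothetical, not the paper's survey instrument; the point is that trust broken down by task tells a different story than a single satisfaction score.

```python
from collections import defaultdict

# Hypothetical pulse-survey records: each response captures usefulness,
# trustworthiness, and the task the student was attempting, separately.
responses = [
    {"task_type": "brainstorming", "useful": True, "trusted": True},
    {"task_type": "brainstorming", "useful": True, "trusted": True},
    {"task_type": "final_marked_work", "useful": True, "trusted": False},
    {"task_type": "final_marked_work", "useful": True, "trusted": False},
    {"task_type": "final_marked_work", "useful": False, "trusted": False},
]

def rates_by_task(records):
    """Return (usefulness rate, trust rate) for each task type."""
    tallies = defaultdict(lambda: {"n": 0, "useful": 0, "trusted": 0})
    for r in records:
        t = tallies[r["task_type"]]
        t["n"] += 1
        t["useful"] += r["useful"]
        t["trusted"] += r["trusted"]
    return {
        task: (t["useful"] / t["n"], t["trusted"] / t["n"])
        for task, t in tallies.items()
    }

for task, (useful_rate, trust_rate) in rates_by_task(responses).items():
    print(f"{task}: useful {useful_rate:.0%}, trusted {trust_rate:.0%}")
```

A single "helpful" score over these five illustrative records reads 80%, which hides that trust collapses exactly where the stakes are highest.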
Third, this is exactly the kind of pattern that shows up in open comments before it becomes visible in a headline score. If students repeatedly say that AI feedback is fast but generic, or convenient but risky, universities need a consistent way to surface that theme across modules and cohorts. That is where analysing teaching evaluation comments at scale matters. Student Voice Analytics helps teams categorise recurring themes around feedback quality, trust, and usefulness, so AI pilots are judged by the student experience they create, not just by adoption figures.
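To make "surfacing a theme" concrete, here is a deliberately minimal sketch that tags comments against a small keyword lexicon and counts recurrences across a cohort. The theme names and patterns are illustrative assumptions, not how Student Voice Analytics classifies comments; production comment analysis needs far richer models, but the principle of turning anecdotes into countable themes is the same.

```python
import re

# Illustrative theme lexicon; a real system would use richer classification.
THEME_RULES = {
    "speed": r"\b(fast|quick|instant|immediately)\b",
    "generic_feedback": r"\b(generic|vague|surface|boilerplate)\b",
    "trust_concerns": r"\b(wrong|inaccurate|risky|didn'?t trust|unsure)\b",
}

def tag_themes(comment: str) -> set[str]:
    """Return the set of themes whose pattern matches the comment."""
    text = comment.lower()
    return {theme for theme, pattern in THEME_RULES.items()
            if re.search(pattern, text)}

comments = [
    "The AI reply was fast but felt generic and didn't address the brief.",
    "Quick to respond, though I was unsure whether it was inaccurate.",
    "My tutor's comments were slower but I could act on them.",
]

# Count how often each theme recurs across the cohort's comments.
counts = {theme: 0 for theme in THEME_RULES}
for c in comments:
    for theme in tag_themes(c):
        counts[theme] += 1
print(counts)
```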
If your institution is reviewing an AI feedback pilot, the next step is to look beyond the usage dashboard and read the comments at scale. Explore Student Voice Analytics to see how teams track trust and feedback quality across modules, or compare our approach with generic LLMs when governance and reproducibility matter.
Q: How should universities use Generative AI in feedback without weakening teacher feedback?
A: The paper points towards a complementary model, not an either-or choice. Use GenAI where speed and iteration matter most, such as drafting, idea development, or low-stakes formative tasks. Reserve teacher feedback for higher-stakes work that depends on disciplinary judgement, nuance, and credibility.
Q: What should institutions measure if they want to know whether AI feedback is working?
A: Measure more than uptake. Teams should ask whether students found the feedback useful, whether they trusted it, what kind of task they used it for, and whether they would act on it in a high-stakes context. Free-text responses are especially valuable here because they reveal why students trusted or distrusted a source, which gives institutions a firmer basis for expanding or reshaping a pilot.
Q: What does this mean for student voice work more broadly?
A: It reinforces that student voice is not just about collecting an opinion on a tool. Universities need to understand how students interpret support, risk, expertise, and fairness in practice. Comments about AI feedback can therefore become an early indicator of wider issues in assessment design, feedback literacy, and confidence in institutional change.
[Paper Source]: Michael Henderson, Margaret Bearman, Jennifer Chung, Tim Fawns, Simon Buckingham Shum, Kelly E. Matthews and Jimena de Mello Heredia, "Comparing Generative AI and teacher feedback: student perceptions of usefulness and trustworthiness", Assessment & Evaluation in Higher Education. DOI: 10.1080/02602938.2025.2502582