AI detectors catch many LLM-assisted essays, but privacy and false positives remain major risks

Updated Apr 02, 2026

At Student Voice AI, we pay close attention to what students say when universities introduce new assessment technologies. That is why Qinghao Guan and Yangxi Han's recent paper in Innovations in Education and Teaching International, "From AI to authorship: Exploring the use of LLM detection tools for calling on 'originality' of students in academic environments", matters for UK teams collecting module feedback, misconduct reflections, and broader student comments about Generative AI. The study combines survey evidence with detector testing to show that institutions cannot treat AI detection as a purely technical fix. Students may see value in these tools, but trust still turns on privacy, fairness, and the risk of false accusation.

Context and research question

Universities are under growing pressure to respond to AI-assisted writing without making assessment feel arbitrary or punitive. That pressure is easy to understand. Academic teams want to protect originality, quality teams want consistent processes, and students want clearer boundaries about what is allowed. But those goals can quickly collide if an institution deploys detection tools before understanding how students experience them.

Guan and Han address that tension through a mixed design. They surveyed 156 STEM students on their ethical awareness, experiences with AI tools, and views on originality, privacy, and fairness. They then tested ChatGPT-4o as a detector on 156 essays, split evenly between human-written submissions and essays produced with ChatGPT assistance. The setting is a second-language writing course rather than a typical UK module evaluation environment, but the core policy question transfers directly: how should universities use AI detection tools when students may see both benefits and risks in the same system?

Key findings

Students were not uniformly hostile to AI detection tools. The survey data suggests many respondents saw these tools as potentially useful, especially when framed as support for originality rather than only as a policing mechanism. Of the 156 respondents, 62.82% were willing to use AI tools in academic writing because they believed those tools could enhance creativity, and 58.97% felt AI tools boosted their confidence in writing. That matters because it shows student opinion is more ambivalent than the usual "for or against AI" debate suggests.

At the same time, privacy concerns were substantial and cannot be treated as a secondary implementation detail. 65.38% of respondents were worried about data privacy, even as many remained willing to engage with the technology. For UK higher education teams, that is a crucial warning. If students believe detector systems expose their data or can be misused, the resulting distrust will shape how they interpret assessment processes and whether they view integrity systems as legitimate.

The detector performance figures are the sharpest practical finding in the paper. ChatGPT-4o correctly identified 63 of 78 AI-assisted essays, an 80.8% hit rate, but it performed far worse on genuinely human-written work. Only 24 of 78 human-written essays were correctly recognised as human-authored, while 53 were misclassified as AI-generated. In other words, the model was much better at spotting its own outputs than at protecting innocent students from false suspicion.
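
To see why the false positive figure dominates the practical picture, it helps to restate the reported counts as standard classification rates. The sketch below uses only the numbers quoted above; "precision of a flag" is our derived figure, not one the paper reports, and the published human-written counts (24 + 53 = 77 of 78) appear to leave one essay unaccounted for.

```python
# Error rates implied by the study's reported counts:
# 78 AI-assisted and 78 human-written essays, ChatGPT-4o as detector.
true_positives = 63   # AI-assisted essays correctly flagged
false_positives = 53  # human-written essays wrongly flagged as AI
true_negatives = 24   # human-written essays correctly recognised
# Note: 24 + 53 = 77, so one human-written essay is unaccounted for
# in the reported figures; the rates below use the stated denominators.

recall = true_positives / 78                 # ~0.808, the "hit rate" above
false_positive_rate = false_positives / 78   # ~0.679
precision = true_positives / (true_positives + false_positives)  # ~0.543

print(f"recall = {recall:.3f}")
print(f"false positive rate = {false_positive_rate:.3f}")
print(f"precision of a flag = {precision:.3f}")
```

On these figures, a flag is correct only about 54% of the time even in a sample where half the essays really were AI-assisted; in a cohort where AI use is rarer, the share of flagged essays that are genuinely AI-assisted would be lower still.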

"Relying solely on AI detection tools for assessing student work may not be sufficient or fair."

That sentence captures the study's most important implication. Direct experience with AI tools appeared to matter more than formal ethical knowledge alone, and students' views of fairness were tied to their views of integrity and creativity. The paper therefore pushes institutions towards a more balanced position: detection can be part of academic integrity practice, but only if it sits within a wider framework of human judgement, transparent policy, and student communication. The authors go further and suggest that detection should often be treated as a teaching opportunity, for example by prompting discussion, revision, or resubmission, rather than as automatic proof of misconduct.

Practical implications

For UK universities, the first implication is operational. AI detectors should not be used as stand-alone adjudicators in misconduct or quality processes. A tool that misclassifies most human-written essays in a controlled study is not robust enough to justify automatic escalation. Where detectors are used, they should trigger review, not verdicts, and students should have a clear opportunity to explain process, drafting history, and acceptable AI use.
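
One way to make "review, not verdicts" concrete is to encode the escalation rule so that a detector flag can never produce an outcome on its own. The sketch below is a hypothetical policy encoding under our own assumptions, not any institution's actual process; the names (DetectorFlag, triage, the 0.5 threshold, and the outcome labels) are illustrative inventions.

```python
from dataclasses import dataclass

@dataclass
class DetectorFlag:
    essay_id: str
    detector_score: float  # whatever the vendor tool reports, scaled 0..1
    student_response: str | None = None  # drafting history, AI-use statement

def triage(flag: DetectorFlag) -> str:
    """A detector flag only ever opens a human review; it is never a verdict."""
    if flag.detector_score < 0.5:  # arbitrary illustrative threshold
        return "no action"
    # A high score invites the student to explain their process first
    if flag.student_response is None:
        return "request drafting history and AI-use statement"
    # Any escalation beyond this point is a human decision on all evidence
    return "refer to academic integrity panel for human review"
```

The key design choice is that no branch returns a penalty: the detector can start a conversation, but it cannot end one.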

The second implication is methodological. Institutions should collect student voice on AI policy in a more precise way than a single satisfaction question allows. Ask separately about usefulness, fairness, privacy, confidence, and willingness to use the tool. Those distinctions matter because students may welcome assistance or originality checks in principle while still distrusting the process that surrounds them.

The third implication connects directly to how free-text evidence is used. If universities deploy AI detectors across modules or programmes, they should monitor open comments for recurring themes such as false accusation risk, privacy concern, inconsistent staff messaging, and loss of trust in assessment. This is where Student Voice Analytics fits naturally. Analysing those comments at scale helps institutions see whether AI policy is being experienced as supportive, confusing, or punitive, and whether that pattern differs across subjects or student groups.
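
As a toy illustration of the kind of monitoring described above (emphatically not the Student Voice Analytics pipeline), a first pass over open comments can be as simple as counting keyword matches per theme; a production system would use a richer, HE-tuned taxonomy and sentiment model. The theme names and trigger phrases here are our own examples.

```python
from collections import Counter

# Illustrative themes and trigger phrases only; a real taxonomy would be
# far broader and tuned to higher-education language.
THEMES = {
    "false accusation": ["falsely accused", "wrongly flagged", "didn't use AI"],
    "privacy": ["my data", "privacy", "who can see"],
    "inconsistent messaging": ["different rules", "depends on the lecturer"],
    "loss of trust": ["don't trust", "unfair process"],
}

def tag_themes(comments: list[str]) -> Counter:
    """Count how many comments touch each theme (simple keyword matching)."""
    counts: Counter = Counter()
    for comment in comments:
        lowered = comment.lower()
        for theme, phrases in THEMES.items():
            if any(phrase in lowered for phrase in phrases):
                counts[theme] += 1
    return counts

print(tag_themes([
    "I was wrongly flagged even though I didn't use AI.",
    "Nobody explained who can see my data.",
]))
```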

FAQ

Q: How should a university introduce AI detection tools without damaging student trust?

A: Start with policy clarity and due process, not software settings. Students need to know what kinds of AI use are permitted, what the detector can and cannot do, who reviews flagged work, and how they can respond if their work is questioned. Institutions should also explain what data is processed and how privacy is protected. Without that transparency, even a technically capable tool will generate mistrust.

Q: What should institutions make of the accuracy figures in this study?

A: They should read them as a warning against overclaiming. The tool identified AI-assisted essays reasonably well in this specific writing-course context, but it was much weaker at recognising genuinely human work. That means detector output is not reliable enough to function as proof on its own. It should be treated as one weak signal within a larger evidential process, and the study's own limits also matter: the sample was 156 STEM students in a particular programme, with a male-dominated cohort and a second-language writing setting.
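
The base-rate point deserves a number. Treating the study's rates as rough estimates of detector sensitivity and false positive rate, Bayes' rule gives the probability that a flagged essay really was AI-assisted at different levels of AI use. The prevalence values below are assumptions chosen for illustration, not figures from the paper.

```python
sensitivity = 63 / 78          # P(flag | AI-assisted), from the study
false_positive_rate = 53 / 78  # P(flag | human-written), from the study

def prob_ai_given_flag(prevalence: float) -> float:
    """Bayes' rule: P(AI-assisted | flagged) at a given rate of AI use."""
    flagged = (sensitivity * prevalence
               + false_positive_rate * (1 - prevalence))
    return sensitivity * prevalence / flagged

# Assumed prevalences for illustration only
for prevalence in (0.5, 0.2, 0.05):
    print(f"AI use in {prevalence:.0%} of essays -> "
          f"{prob_ai_given_flag(prevalence):.0%} of flags are correct")
```

On these numbers, if only one essay in twenty were AI-assisted, roughly 94% of flags would point at students who wrote their own work, which is exactly why detector output should be treated as one weak signal rather than proof.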

Q: What does this change about student voice work in higher education?

A: It broadens it. Student voice on AI is not only about whether students "like" a tool. Universities need to understand whether students think the process is fair, whether they trust staff to interpret flags sensibly, and whether privacy concerns are undermining confidence in assessment. Open-text comments, pulse surveys, misconduct reflections, and module feedback can all surface those concerns earlier than a headline score will.

References

Guan, Q. and Han, Y. (2025) 'From AI to authorship: Exploring the use of LLM detection tools for calling on "originality" of students in academic environments', Innovations in Education and Teaching International. DOI: 10.1080/14703297.2025.2511062

Request a walkthrough

Book a free Student Voice Analytics demo

See all-comment coverage, sector benchmarks, and reporting designed for OfS quality and NSS requirements.

  • All-comment coverage with HE-tuned taxonomy and sentiment.
  • Versioned outputs with TEF-ready reporting.
  • Benchmarks and BI-ready exports for boards and Senate.
Prefer email? info@studentvoice.ai

UK-hosted · No public LLM APIs · Same-day turnaround
