
How we determine original student writing

 

This article outlines Norvalid’s approach to validating student writing and how the system enables educators to manage AI writing in assignments. For any assessment to be valid, the instructor must be confident that every submission follows the guidelines defined for the assessment, whether those guidelines allow students to co-write with AI or not. Norvalid deploys a multi-factor approach to assist with managing AI in assessments.

Validate the student instead of detecting cheating

Plagiarism detectors (sometimes called similarity checkers) look for similar texts across historical student submissions, relevant literature and the internet, which could indicate plagiarism. AI detectors attempt to identify text authored by artificial intelligence, specifically large language models (LLMs). We believe that this approach to assessment security and academic integrity fundamentally asks the wrong question. The question is not whether the content of a submission is plagiarised or authored by AI; it is whether the content is authored by the student.

By validating that the submission content originates from the student (the opposite of detecting cheating) or otherwise follows the guidelines set out in the assessment, we also validate the assessment itself.

In Figure 1, we outline four different categories of academic misconduct. (Note that not all of these necessarily apply to any given assessment; perhaps collusion is allowed or AI writing is encouraged!) Plagiarism detectors are great at validating the absence of collusion or uncited copying of content from others, and they provide hard evidence of such activities. AI detectors claim to detect AI writing, but research shows that they are unreliable (Weber-Wulff et al., 2023; Perkins et al., 2024). AI detectors also fail to provide any evidence or explanation, which makes them unsuitable for use in disciplinary cases. You can read more about why AI detection is a flawed strategy in a previous blog post. Norvalid’s authorship validation method tackles the problem of ghostwriting by validating the student’s writing. If you can validate the student’s writing, you can also validate the absence of AI writing and plagiarism.


Figure 1: Cheating methods and solutions

A multi-factor approach

A major reason plagiarism detectors are good at documenting copied text is that they point to the exact source as evidence when a similar text is found: here is your submission, and here is Wikipedia; it is a copy!

When changing the approach to validating the student's original writing, we cannot point to a definite piece of evidence the way text-matching software can. Without obvious verbatim copy-paste, we must build a case consisting of a series of clues pointing to a firm conclusion. The more tests we run, the more reliably we can validate original student writing and rule out cheating (plagiarism, ghostwriting, and unauthorised AI writing). Norvalid currently includes four text validation tests, each carefully designed to measure a different aspect of the writing in order to judge its authenticity.

With multiple approaches that measure different things, the final integrity report can be related to the policy and expectations set out in the assessment. Some assessments will allow students to co-write with AI or let AI touch up their language. Norvalid enables instructors to manage the level of original student writing they expect to see in each submission.
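
As a rough illustration of how several independent tests might feed one policy-aware report, here is a minimal sketch in Python. The `TestResult` structure, the test names, and the rule that perplexity becomes informational when AI co-writing is allowed are assumptions for illustration only, not Norvalid’s actual data model or policy logic.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    name: str       # e.g. "linguistic", "perplexity", "cloze", "question_session"
    passed: bool    # did this test support original student authorship?
    evidence: str   # human-readable explanation attached to the report


def integrity_report(results: list[TestResult], ai_writing_allowed: bool) -> dict:
    """Combine independent validation tests into one report.

    Hypothetical policy rule: if the assessment allows co-writing with AI,
    the perplexity test is informational only and does not count against
    the submission.
    """
    counted = [r for r in results
               if not (ai_writing_allowed and r.name == "perplexity")]
    flags = [r for r in counted if not r.passed]
    return {
        "validated": len(flags) == 0,
        "flags": [(r.name, r.evidence) for r in flags],
    }
```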

Linguistic validation (linguistic fingerprint)

We extract the student's unique linguistic fingerprint by analysing an authentic text sample from the student. The linguistic features of the submission are compared with those of the student's text sample, which allows us to statistically measure the likelihood that the student is the authentic author of the submitted text. The statistical method and evidence behind authorship validation stem from hundreds of peer-reviewed research papers on digital text forensics published over the last three decades.
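
The article does not disclose Norvalid’s feature set or statistical model, so the sketch below only illustrates the general stylometric idea: represent each text by the relative frequencies of common function words and compare the resulting vectors. The marker list and the cosine-similarity measure are illustrative assumptions, not the production method.

```python
from collections import Counter
import math

# Illustrative style markers: function-word frequencies are a classic
# stylometric feature because they are largely topic-independent.
MARKERS = ["the", "of", "and", "to", "in", "that", "is", "was", "it", "for"]


def style_vector(text: str) -> list[float]:
    """Relative frequency of each marker word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[m] / total for m in MARKERS]


def style_similarity(sample: str, submission: str) -> float:
    """Cosine similarity between two style vectors (1.0 = identical profile)."""
    a, b = style_vector(sample), style_vector(submission)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# A low similarity between the student's authentic writing sample and the
# submission is one clue, never proof on its own, that someone else wrote it.
```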

We work closely with every institution to carefully select the parameters of the linguistic analysis model so that it accurately measures the writing style of their specific cohort of students. This involves tuning the analysis on historical submissions from the institution so that its accuracy reflects the students' linguistic background. This helps eliminate bias against students at different grade levels, students with English as a foreign language, and so on.

Based on tens of thousands of real assignments in a test set, we pin the false positive rate of this analysis to below 1% while still accurately flagging submissions that cannot be verified as written by the student.

Prerequisites for the test: historical submissions from the institution to tune the analysis model, and an authentic writing sample from each submitting student.

Perplexity analysis (inverted AI detection)

LLMs are massive statistical models, and they can produce coherent text because they are incredibly good at estimating the next likely word in a sequence. In comparison, human writing is very unpredictable and “chaotic”. The predictability of a text (or lack of “chaos”) can be measured using a language model. This measure can be expressed as perplexity.

Most AI detectors are black-box classification models that determine whether text is written by AI without any intuitive explanation or evidence, and they are inherently biased by the training data used to produce them. Instead of classifying text as AI-written, we use the objective measure of perplexity to determine whether the text is unpredictable enough to be human-written. In other words, this creates a “human detector”.
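
As an illustration only (Norvalid’s language model and thresholds are not described in this article), perplexity can be computed with an open-source model such as GPT-2 via the Hugging Face `transformers` library: the model’s loss is the average negative log-likelihood per token, and its exponential is the perplexity. Higher values mean less predictable, more “human-like” text; the cut-off below is an arbitrary placeholder, not a calibrated value.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def perplexity(text: str) -> float:
    """Perplexity of the text under GPT-2 (higher = less predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    # out.loss is the mean negative log-likelihood per token.
    return float(torch.exp(out.loss))


def looks_human_written(text: str, threshold: float = 50.0) -> bool:
    # The threshold is purely illustrative; a real system would calibrate it
    # against known human and AI-written text for the relevant cohort.
    return perplexity(text) > threshold
```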

While this cannot determine which human wrote the text, it is useful when seen in conjunction with the linguistic validation step: when the student gets no linguistic match but the perplexity analysis finds evidence of a human author, that is a strong indication of contract cheating. The granular measurement of perplexity allows us to provide a word-by-word explanation of why the text is likely human-written.

Prerequisites for the test: None. It is important to stress that perplexity analysis should not be used in isolation.

Cloze test (fill-in-the-blanks)

The cloze test stems from research dating back to the 1950s and from an ongoing EU-funded research project conducted by three European universities (https://aver.pef.mendelu.cz/). A cloze question is an exercise where the system carefully removes a word or phrase from a sentence and prompts the student to fill in the missing word. The student is presented with sentences from their own submission and from an unknown text they have never seen before.

The blanked words and phrases are chosen so that the question does not test the student’s memory of the text, but rather their alignment with its linguistic style. Preliminary results from research across Europe show strong evidence for the test’s ability to accurately validate authorship by comparing students’ scores on their own submissions with their scores on unknown texts.

The cloze test is automatic and conducted at the moment of submission in a time-controlled question session that is unique to every student.
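
A toy sketch of how cloze questions could be generated from a submission, assuming a simple rule of blanking style-bearing function words. The word list, the sentence splitting, and the selection rule are placeholders for illustration; they are not the criteria used in the AVER research project or by Norvalid.

```python
import random
import re

# Placeholder set of style-bearing words; real selection criteria differ.
STYLE_WORDS = {"however", "therefore", "moreover", "although", "whereas",
               "furthermore", "nevertheless", "thus", "hence", "while"}


def make_cloze_questions(text: str, n: int = 3, seed: int = 0) -> list[dict]:
    """Blank one style-bearing word in each eligible sentence."""
    rng = random.Random(seed)
    sentences = re.split(r"(?<=[.!?])\s+", text)
    questions = []
    for sentence in sentences:
        candidates = [w for w in sentence.split()
                      if w.lower().strip(",.;:") in STYLE_WORDS]
        if candidates:
            target = rng.choice(candidates)
            questions.append({
                "prompt": sentence.replace(target, "_____", 1),
                "answer": target,
            })
    rng.shuffle(questions)
    return questions[:n]
```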

Prerequisites for the test: None.

Question session (open-ended answer)

Similar to the cloze test, the student is prompted with a set of open-ended free-text questions at the moment of submission. These questions pertain to the content of the submission the student has just handed in and are relatively easy to answer for anyone with a good understanding of that content. They are designed to verify that the student has a good working knowledge of the content they present, which further validates that the content is their own.

The question session is meant to mimic a viva voce, or an oral exam. However, the question session is completely automatic, unique to every student and happens at the moment of submission, directly in the learning management system (LMS).

Prerequisites for the test: None.

Summary

Norvalid shifts the focus from detecting plagiarism or AI-generated content to validating student authorship, ensuring submissions align with assessment guidelines. Instead of relying on unreliable AI detection, Norvalid employs a multi-factor approach, including linguistic validation, perplexity analysis, cloze tests, and question sessions. These methods collectively verify that the student is the genuine author of their work, providing educators with a reliable way to manage AI use in assessments.

By integrating multiple validation techniques, Norvalid allows institutions to uphold academic integrity while adapting to evolving writing practices. This approach ensures that assessments remain fair and meaningful, whether AI assistance is permitted or not, ultimately strengthening trust in student submissions.

References

Perkins, M., Roe, J., Vu, B.H., Postma, D., Hickerson, D., McGaughran, J., et al. (2024). GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education. arXiv preprint arXiv:2403.19148.

Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., et al. (2023). Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19(26). https://doi.org/10.1007/s40979-023-00146-z