GPT Detectors and Academic Rigor

Originally written in 2023.

“It takes away the drudge work,” he said. “It might take away more than that.”

-Geoffrey Hinton, Former Head of Google’s Toronto AI Lab (in reference to ChatGPT)

A recent Stanford University study has come to the worrying conclusion that GPT detectors are even less accurate than we might have realized. Most worrying is that GPT detectors often flag TOEFL (Test of English as a Foreign Language) essays as AI-generated: according to the study, around 55% of real TOEFL essays are misidentified as ChatGPT-generated. Less verbose US 8th grade essays are flagged as ChatGPT-written up to 99% of the time. Meanwhile, simply telling ChatGPT to use more complex word choices in the TOEFL essays reduces the misclassification rate to around 15%. This suggests that GPT detectors rely on very rudimentary metrics, such as word choice, to detect AI writing. Below is the referenced figure.

There are some flaws with this study, however. In particular, the portion referenced here relies on essays by 8th graders, a group that generally still needs to refine its writing skills. Another interesting finding is that carefully chosen ChatGPT prompts reduce AI detection rates by up to 50-60%, which means that careful use of ChatGPT makes AI writing almost impossible to detect as well.
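To make the percentages above concrete: a figure like "55% of real essays misidentified" is a false positive rate, meaning the share of human-written essays that a detector wrongly flags as AI-generated. Here is a minimal sketch in Python of how that number is calculated. The function name and the detector verdicts are made up for illustration and are not data or code from the study.

```python
def false_positive_rate(flagged_as_ai):
    """Share of human-written essays that a detector flagged as AI-generated.

    flagged_as_ai: list of booleans, one per human-written essay,
    True when the (hypothetical) detector labeled that essay as AI.
    """
    if not flagged_as_ai:
        raise ValueError("need at least one essay")
    return sum(flagged_as_ai) / len(flagged_as_ai)


# Made-up verdicts for 20 human-written essays (NOT data from the study):
# 11 wrongly flagged as AI, 9 correctly left alone.
verdicts = [True] * 11 + [False] * 9

rate = false_positive_rate(verdicts)
print(f"{rate:.0%} of human-written essays were flagged as AI")
```

The same calculation run on AI-written essays would instead give the detection rate, which is the number that drops when ChatGPT is prompted carefully.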

This is worrying because of how prevalent ChatGPT use is in academia today. Faculty will have to choose between not trying to prevent ChatGPT usage and risking flagging a legitimate essay as AI-written. Time will tell how these findings will affect the use of GPT detectors.

Further Reading: ‘The Godfather of AI’ Quits Google and Warns of Danger Ahead – The New York Times
