GPT Detectors and Academic Rigor – Whole Grain Erudition

Originally written in 2023.

“It takes away the drudge work,” he said. “It might take away more than that.”

-Geoffrey Hinton, Former Head of Google’s Toronto AI Lab (in reference to ChatGPT)

A recent Stanford University study reaches the worrying conclusion that GPT detectors are even less accurate than we might have realized. Most worrying is that GPT detectors often flag TOEFL (Test of English as a Foreign Language) essays as AI-generated: according to the study, around 55% of real TOEFL essays are misidentified as ChatGPT-generated. Less verbose US 8th-grade essays are detected as ChatGPT up to 99% of the time. Additionally, simply telling ChatGPT to use more complex word choices in the TOEFL essays reduces AI misclassification to around 15%. This suggests that GPT detectors rely on very rudimentary signals, such as word choice, to detect AI writing. Below is the referenced figure.

There are some flaws with this study, however. In particular, the referenced portion of the study relies on essays by 8th graders, a group whose writing skills generally still need refinement. Another interesting finding is that ChatGPT prompts reduce AI detection rates by up to 50-60%, which means that careful use of ChatGPT makes AI writing almost impossible to detect as well.

This is worrying given how prevalent ChatGPT use is in academia today. Faculty will have to choose between not trying to prevent ChatGPT usage at all and risking flagging a legitimate essay as AI-written. Time will tell how these findings will affect the use of GPT detectors.
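To put the faculty dilemma in rough numbers, here is a small Python sketch. It takes the approximate 55% and 15% misclassification rates quoted earlier in this post and applies them to a hypothetical class of 40 human-written essays. The class size and the function name are assumptions made purely for illustration; they do not come from the study.

```python
def expected_false_flags(num_essays: int, false_positive_rate: float) -> float:
    """Expected number of human-written essays wrongly flagged as AI-written."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return num_essays * false_positive_rate


# Hypothetical class size (an assumption, not a figure from the study).
CLASS_SIZE = 40

# Approximate misclassification rates as quoted in this post.
RATES = {
    "TOEFL essays as written": 0.55,
    "TOEFL essays after ChatGPT word-choice edits": 0.15,
}

for label, rate in RATES.items():
    flagged = expected_false_flags(CLASS_SIZE, rate)
    print(f"{label}: about {flagged:.0f} of {CLASS_SIZE} wrongly flagged")
```

Under these assumptions, a detector with a 55% misclassification rate would wrongly flag roughly 22 of 40 honest essays, which is the kind of cost faculty would have to weigh against not checking at all.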

Further Reading: ‘The Godfather of AI’ Quits Google and Warns of Danger Ahead – The New York Times
