Detecting plagiarism in AI-generated text

ChatGPT has taken the world by storm and excited the imaginations of its millions of users, who keep finding innovative applications for it, and from whose use ChatGPT in turn learns. Educators and librarians worry that text generated by ChatGPT and similar programs will encourage plagiarism and make that plagiarism more difficult to detect.


ChatGPT, introduced by OpenAI in late November 2022 as a free service, uses the large language model GPT-3.5 to create AI-generated text that closely mimics what a human would write. Its output can include short stories, poems, song lyrics, answers to questions, essays, research papers, and even computer code. It is trained on text from the internet and optimized for dialogue through Reinforcement Learning from Human Feedback. ChatGPT isn’t the only program of its kind, but it has garnered the most attention.

Although the results are often spectacularly good and could easily be attributed to a human author, they can also fail. The training set, enormous as it is, ends in 2021. ChatGPT could, for example, write a brief essay about soccer star Pelé but wouldn’t note his December 2022 death at the age of 82. As far as ChatGPT knows, Pelé is still alive. When asked if Pelé is dead, ChatGPT replied with both misinformation and good advice about fact checking: “Pelé is still alive and currently 81 years old. My knowledge cutoff is 2021, so I do not have information about events that have occurred after that time. It is important to fact-check information before sharing it, to ensure accuracy and avoid spreading misinformation.”

For librarians, educators, and researchers, it’s the possibility of student plagiarism and cheating that has caught their attention. Is an article created by ChatGPT legitimate research? Is it plagiarism? Is it cheating? Submitting a paper or an essay written by a computer program rather than by the student, without attribution, is, if not technically plagiarism, definitely deceptive and antithetical to the purpose of education. This raises the question of how to know whether a course assignment was written by ChatGPT rather than by the student. The GPT Detector is one possibility, but it is optimized for an earlier version, GPT-2.

Another comes from Packback, which has announced the upcoming release of what it calls a first-of-its-kind "AI-generated content detection" feature for its suite of writing-oriented curriculum products. For Spring 2023 courses, Packback will be able to detect and flag AI-generated content in its student-led discussion platform, Packback Questions, and in its essay writing and research coaching platform.

Packback is a learning platform that uses AI to coach students in writing, critical thinking, and motivation. Its instructional AI gives students line-by-line feedback on the efficacy and clarity of their writing, along with an explanation of why they received each piece of feedback, so that they learn how to improve their work. The detection of AI-generated content builds on Packback's existing automatic content moderation capabilities, which include plagiarism detection, profanity detection, source quality detection, and more. The training set includes the 87 million questions and responses from its core writing support program.
