The emergence of AI-written materials in scholarly articles erodes trust in science

Information gathered from ChatGPT and other generative AI chatbots has been infiltrating the scholarly literature, according to several accounts appearing in the popular press, which is worrisome not only for scholars and librarians but also for the wider world. Distrust of scientific research has enormous implications. How bad is the problem? Marydee Ojala investigates.

In August 2023, WIRED magazine published an article by Amanda Hoover, titled “Use of AI Is Seeping Into Academic Journals—and It’s Proving Difficult to Detect”.

Hoover writes, “peer-reviewed academic journals are grappling with submissions in which the authors may have used generative AI to write outlines, drafts, or even entire papers, but failed to make the AI use clear.” She characterizes publishers’ attempts to deal with the problem as a “patchwork approach” and points out there is no foolproof way to ascertain if a chatbot was involved in the writing process. In the intervening 7 months, however, scholarly publishers’ own accounts suggest that they are indeed taking steps to weed out AI-written information, albeit not always successfully.

But wait. More recently, on 19 March 2024, Popular Science published Mack DeGeurin’s article, “AI-generated nonsense is leaking into scientific journals: Text outputs from large language models are littering paper mills—and even some peer-reviewed publications”.

He cites an 18 March 2024 report from 404 Media by Emanuel Maiberg, “Scientific Journals Are Publishing Papers With AI-Generated Text”. Maiberg went to Google Scholar, searched for “As of my last knowledge update”, and was shocked to find 115 papers with the phrase. As with similar phrases, that one is a dead giveaway that ChatGPT is being quoted. Maiberg acknowledges that the majority of articles he found probably came from paper mills, not peer-reviewed, reputable scholarly publishers. Still, the phrase appears in several papers published by academic journals.

Replicating search results

To see if I could replicate Maiberg’s results, I did the same search on Google Scholar on 27 March 2024, but date-limited it to “Since 2023”, since ChatGPT launched only in late November 2022. I got 167 results. A close look at the first 50 makes it clear that some of the papers legitimately used the phrase, since the authors were writing about ChatGPT and, in some instances, reproducing the conversation they had with ChatGPT. Sample titles include “An Evaluation of ChatGPT and Bard in the Context of Biological Knowledge Retrieval”, “PACuna: Automated Fine-Tuning of Language Models for Particle Accelerators”, and “May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients’ questions: an evidence-controlled analysis”. Of the 50 papers I looked at, I put 15 in this category, which is 30% that we, as information professionals and researchers, need not worry about.

Most of the other 70% relied on ChatGPT either for factual information or as a writing assistant. The factual information part is deeply concerning. When authors believe that ChatGPT can accurately relay statistics on the percentage of India’s population by age group, the total population of Pakistan, the amount of global sunflower production, or the number of people suffering from diabetes, there’s a clear problem with authenticity. Nor should ChatGPT be considered a reliable source for naming notable soccer players, giving foreign exchange rates, correlating quantum entanglement and longitudinal scalar waves, or documenting diagnostic criteria.

Two citations that used the phrase were a bit ambiguous as to whether the authors had used ChatGPT or were referring to their own previous writings on the topic. When I read the full text, one of those almost certainly referred to updating the author’s own research, while the other remained ambiguous.

In one of the Scholar search results, a preprint published in 2023 by EasyChair, the paragraph beginning with “As of my last knowledge update in September 2021” was a footnote attributing the information to an IEEE international conference held in 2022. The conference is real, not hallucinated, but checking the proceedings in IEEE Xplore revealed the ChatGPT phrase was not there. I had expected that to be the case but was also relieved that IEEE was not fooled by an author’s use of ChatGPT.

Opportunities for information literacy education

On the one hand, the notion of AI-written scholarly literature and the inclusion of possibly inaccurate backup data is conducive to distrusting science and dismissing scholarly research as useless. On the other hand, it presents librarians and information professionals with some golden opportunities for information literacy education.

For starters, it lets us explain the difference between Google Scholar, which is not a curated database of vetted scholarly information, and IEEE Xplore, Web of Science, and the like, which do contain information with editorial oversight. It also allows us to put the issue of AI-written and sourced data into perspective. Although 167 articles since 2023 may seem like a large number, when compared to the total of scholarly articles published during the same time frame, it’s a drop in the bucket, sufficiently minuscule as to be statistically insignificant. Of course, 167 is not the sum total of potential AI-written and consulted articles, since that search used only one of the potential phrases used by generative AI chatbots. It still pales in comparison to the total number of articles.

For scholarly communication librarians, this type of search and the articles appearing in both scholarly literature and the more general press should provide rationales for warning researchers about their use of GenAI for research and writing. As object lessons about what not to do, they serve an important purpose.

Delving into the statistical, backup, and explanatory data provided by ChatGPT in the 167 items presents an information literacy instruction gold mine and a perfect opportunity to explain the SIFT method for evaluating information first devised by Michael Caulfield at the University of Washington. SIFT stands for Stop, Investigate, Find, and Trace. The important element in the case of ChatGPT-supplied data is to trace the information back to its original source or find another source to verify the information. Is it true that 65% of India’s population is under 35? Does Australia classify loot boxes as gambling or not? Was there a survey about airline pilots and fatigue?

Send students on an exploratory journey to determine the truth of these statements and you’re on your way to having information-literate graduates. With luck, they will also be well positioned to combat the distrust in science that many see as a growing, and potentially destructive, phenomenon.