Machine-generated content: Boon to Information Professionals or the end of the world as we know it?

Advances in machine-generated content are changing scholarly publishing, and Springer Nature is at the forefront of adopting it. What does this mean for researchers, scholarly publishing and librarians?

In December 2020, machine-generated summaries of three articles were published for the first time as part of a Nature Index supplement on AI. In its press release, Springer Nature recognises that:

"Escalating computing power, expanding data sets, and algorithms of unprecedented sophistication have led to a massive increase in the number of journal and conference papers referring to AI in recent years. The Nature Index AI supplement draws on Nature Index data and the larger Dimensions from Digital Science database to analyse this rapidly advancing and controversial topic. For the first time, the supplement also includes summaries of research articles created using AI, and it looks more broadly at how AI is being used in scholarly publishing."

The supplement investigates many emerging applications of AI, including detecting fakes, discovering drugs and recognising bias in facial recognition software. It also looks at how AI can be applied to scientific publishing. Possibilities include creating text from structured datasets, summarising papers, helping readers find relevant papers more easily and identifying alternative outlets for papers rejected by journals.

Machine-Generated Textbook

In April 2019, Springer Nature took an early step into the realm of machine-generated content by publishing a machine-generated book that gives an overview of the latest research on lithium-ion batteries. It was automatically compiled by an algorithm called Beta Writer, developed in collaboration with the Applied Computational Linguistics lab of Goethe University Frankfurt/Main (Germany). The idea behind the book was to use AI technology to help researchers cope with information overload. Beta Writer selects, consumes and processes relevant publications drawn from SpringerLink. It then creates article summaries with hyperlinked extracted quotes, and it automatically generates introductions, a table of contents and references.

Springer Nature believes that the future of scholarly publishing will include AI technologies that use algorithms, natural language processing and machine-generated content to create new types of scientific content, both journal articles and books. This is not to say that the role of human authors and researchers will diminish. Niels Peter Thomas, Managing Director Books at Springer Nature, said: “As a global publisher, it is our responsibility to take potential implications and limitations of machine-generated content into consideration, and to provide a reasonable framework for this new type of content for the future.”

Henning Schoenenberger, Director Product Data & Metadata Management at Springer Nature, added: “While research articles and books written by researchers and authors will continue to play a crucial role in scientific publishing, we foresee many different content types in academic publishing in the future: from yet entirely human-created content creation to a variety of blended man-machine text generation to entirely machine-generated text. This prototype is a first important milestone we reached, and it will hopefully also initiate a public debate on the opportunities, implications, challenges and potential risks of machine-generated content in scholarly publishing.”

Not Just Springer Nature

Springer Nature is not the only publisher experimenting with AI. In July 2020, Frontiers, an open access publisher, introduced AIRA (Artificial Intelligence Review Assistant), which is designed to enhance the peer review process by helping editors, reviewers and authors evaluate manuscript quality. It reads papers and makes recommendations in seconds, assessing parameters such as language quality, integrity of the figures, the possibility of plagiarism and potential conflicts of interest.

UNSILO, a Danish AI and NLP company founded in 2011, has partnered with many scholarly publishers over the past nine years and was acquired by Cactus Communications in March 2020. It has three product lines: Evaluate, which helps with manuscript preparation, screening and publication; Classify, which supports publishers' packaging and selling capabilities; and Recommend, which boosts the discovery of publishers' new and existing research while increasing website page views.

Challenges of AI Technologies Applied to Scholarly Publishing

The role of machine-generated content and other AI technologies remains in flux. There is obvious interest in using the technologies to save editors' time, enhance content discoverability, increase understanding of scientific research and bring research findings to publication faster. The assignment of metadata also benefits from AI technologies.

As in other areas, potential bias needs to be considered. This is particularly true this year, which has brought a belated acknowledgement of inequality in scholarly research and publishing, coupled with the recognition that the situation must change. The peer review process needs scrutiny, and questions about underlying bias in code that perpetuates gender disparities and systemic racism must be addressed. If the systems that produce machine-generated content can be trained to eliminate bias, the technology has a bright future. However, if that content merely relies on human-created content, replete with historical implicit bias, the promise the technology holds could be lost.