Ai2 introduces Ai2 ScholarQA

Ai2 ScholarQA is an experimental tool for researchers doing literature reviews who need to compare and summarize multiple papers and understand the complex relationships among them.


Literature reviews are time-consuming for researchers. ScholarQA helps them get more in-depth, contextual answers through comparison tables, expandable sections for subtopics, and citations backed by excerpts from the papers so that claims can be verified.

Ai2 ScholarQA uses a multi-step retrieval-augmented generation (RAG) workflow that prompts a state-of-the-art closed model, Claude 3.5 Sonnet. It draws on a corpus of open-access papers (e.g., from arXiv).

Corpus and the Search Index

For retrieval, ScholarQA uses a Vespa cluster whose index currently holds about 8 million academic papers from fields such as computer science, medicine, environmental science, and biology. The index is updated weekly and uses Open S2ORC as its inclusion criterion for open-access papers. When we search this index, we score snippets extracted from full-text papers using a combination of BM25 and dense embeddings.
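Vespa computes scores through its own rank expressions, and the exact way ScholarQA combines the lexical and dense signals is not described here. The following is a minimal sketch, assuming a simple linear blend: BM25 scores are max-normalized and mixed with the cosine similarity of embeddings using a hypothetical weight `alpha`. The toy tokenized snippets and two-dimensional vectors stand in for real snippet text and embedding-model output.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * c for a, c in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_scores(query_terms, query_vec, docs, doc_vecs, alpha=0.5):
    """Hypothetical blend: alpha * normalized BM25 + (1 - alpha) * cosine."""
    lex = bm25_scores(query_terms, docs)
    top = max(lex) or 1.0  # avoid dividing by zero when no term matches
    lex = [s / top for s in lex]
    dense = [cosine(query_vec, v) for v in doc_vecs]
    return [alpha * l + (1 - alpha) * d for l, d in zip(lex, dense)]

# Toy snippets and made-up embeddings, for illustration only.
snippets = [
    "retrieval augmented generation for scientific question answering".split(),
    "protein folding with deep learning".split(),
    "dense retrieval and bm25 hybrid search".split(),
]
vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
scores = hybrid_scores(["retrieval", "bm25"], [1.0, 0.0], snippets, vecs)
ranking = sorted(range(len(snippets)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]
```

A linear blend after normalization is only one common way to fuse the two signals; reciprocal rank fusion is another, and either could be expressed as a Vespa rank profile.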

Because of this corpus, ScholarQA is most useful for researchers in fields where most papers are available on arXiv.

If you have a Semantic Scholar API key, the full-text search feature is available to you now. Researchers with an academic email address in certain countries are eligible to request a key and access the feature.

Section Planning and Generation

Ai2 ScholarQA is designed for literature searches that require insights from multiple relevant documents, and it synthesizes those insights into a comprehensive report. After receiving a query, the system first retrieves the top k passages from the index. These passages are then re-ranked with a pretrained transformer model, and the top 50 candidates are kept for further processing. Answer generation is a three-step process driven by prompts to an LLM: quote extraction, answer outline and clustering, and report generation.
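The retrieve, re-rank, and generate flow can be sketched as follows. This is a minimal sketch, not ScholarQA's code: the retriever, re-ranker, and LLM are passed in as plain functions, the first-stage depth `k=200` is a hypothetical value (the source only says "top k"), and the prompt strings are illustrative placeholders. Only the cut-off of 50 candidates and the order of the three generation steps come from the description above.

```python
from typing import Callable

# Hypothetical stand-ins for the search index, the re-ranker, and the LLM call.
Retriever = Callable[[str, int], list[str]]
Reranker = Callable[[str, str], float]
LLM = Callable[[str], str]

def retrieve_and_rerank(query: str, retrieve: Retriever, rerank: Reranker,
                        k: int = 200, keep: int = 50) -> list[str]:
    """First-stage retrieval of k passages, then keep the `keep` best by re-ranker score."""
    candidates = retrieve(query, k)
    scored = sorted(candidates, key=lambda p: rerank(query, p), reverse=True)
    return scored[:keep]

def generate_report(query: str, passages: list[str], llm: LLM) -> str:
    """Three prompted steps: quote extraction, outline and clustering, report writing."""
    # Step 1: extract quotes relevant to the query from each passage.
    quotes = [llm(f"Extract quotes relevant to '{query}':\n{p}") for p in passages]
    quotes = [q for q in quotes if q.strip()]
    # Step 2: outline the answer's sections and cluster the quotes under them.
    outline = llm(f"Outline sections for '{query}' and assign each quote:\n" + "\n".join(quotes))
    # Step 3: write the report from the outline (which carries the grouped quotes).
    return llm(f"Write a report for '{query}' following this outline:\n{outline}")

# Toy components, for illustration only.
corpus = [f"passage {i} about retrieval" for i in range(300)]

def toy_retrieve(query: str, k: int) -> list[str]:
    return corpus[:k]

def toy_rerank(query: str, passage: str) -> float:
    # Toy score: prefer passages with a higher index number.
    return float(passage.split()[1])

calls: list[str] = []

def toy_llm(prompt: str) -> str:
    calls.append(prompt)
    return "stub output"

top = retrieve_and_rerank("retrieval", toy_retrieve, toy_rerank)
report = generate_report("retrieval", top, toy_llm)
print(len(top), top[0], len(calls))  # 50 passage 199 about retrieval 52
```

Because the quotes are extracted before the outline and report are written, the final text is built around evidence that already exists, which matches the evidence-first design described later in this post.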

For schema generation (choosing the columns of the comparison tables), Ai2 ScholarQA represents each input paper by its title and abstract. To capture user intent, it also includes the user's original query and the corresponding generated section in the context.
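The source does not give the prompt wording, so the sketch below only illustrates what goes into the schema-generation context: the query, the generated section, and each paper's title and abstract. The `Paper` class, the `build_schema_prompt` function, the layout, and the final instruction line are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    abstract: str

def build_schema_prompt(query: str, section: str, papers: list[Paper]) -> str:
    """Assemble the context for a hypothetical table-schema prompt."""
    # User intent: the original query plus the section the table belongs to.
    lines = [f"User query: {query}", f"Generated section: {section}", "Papers:"]
    for i, p in enumerate(papers, start=1):
        # Each paper is represented only by its title and abstract.
        lines.append(f"[{i}] {p.title}\n    {p.abstract}")
    lines.append("Propose table columns (aspects) for comparing these papers.")
    return "\n".join(lines)

# Toy inputs, for illustration only.
papers = [
    Paper("Paper A", "We study dense retrieval."),
    Paper("Paper B", "We study BM25 baselines."),
]
prompt = build_schema_prompt(
    "How do retrievers compare?",
    "Sparse and dense retrievers differ in recall.",
    papers,
)
print(prompt)
```

Using titles and abstracts instead of full text keeps the context short when many papers feed into one table, at the cost of missing details that appear only in the body of a paper.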

Learnings and Next Steps

Ai2 ScholarQA is an experimental solution that helps researchers conduct literature reviews more efficiently by providing more in-depth answers. It uses an evidence-first pipeline: the model builds its answer around evidence it has already gathered, rather than writing an answer and then trying to find supporting evidence. It is not perfect and remains a work in progress. Ai2 ScholarQA is a joint project between Ai2 and students from the University of Washington and the Korea Advanced Institute of Science & Technology (KAIST).

A full explanation, with examples, is available on the Ai2 blog.