Getting answers directly from research articles

scite’s new “Ask a Question” feature lets users ask research questions in plain language and get answers directly from the full text of research articles.

The reliability of search results, particularly from web search engines such as Google, is problematic. Phrasing a question in the terms a scientific expert would use raises the likelihood of authoritative answers, but does not guarantee them. With “Ask a Question”, scite allows plain-language searching of the full text of over 32 million research articles. Questions such as “Do tanning beds increase the risk of cancer?” return relevant results backed by peer-reviewed research. Results are not summarised or AI-generated; they are snippets from the actual research papers. A prototype is currently online.

“Ask a Question” is based on scite’s database of Citation Statements, which are the sentences where references are used in the text of the articles. That database now contains 1.2 billion citation statements. An explanation of the data is on scite’s blog.

The approach uses open-domain extractive question answering based on a large language model from Hugging Face.

According to scite, the system works like this:

  • We process your question by removing stop words and punctuation to form a query.
  • We retrieve the top 200 results from Elasticsearch using that query over our 1.2bn citation statements and 48 million abstracts.
  • We rerank the results with the original question using a cross-encoder trained on MS-MARCO (we use 'cross-encoder/ms-marco-MiniLM-L-12-v2', available on sentence transformers).
  • We use a page length of 20 results and run our extractive question answering model trained on SQuAD 2.0, Natural Questions, and BioASQ (see our model for details).
  • We return the answers to you!
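
The first and third steps above can be illustrated with a minimal sketch. This is not scite's code: the stop-word list, the function names and the word-overlap scorer are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. In a real pipeline the scorer would presumably be the cross-encoder named above, and the candidate passages would come from the Elasticsearch retrieval step.

```python
import string

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration;
# the list scite actually uses is not given in the article).
STOP_WORDS = {"a", "an", "the", "do", "does", "of", "is", "are", "in", "to", "for", "on"}


def build_query(question: str) -> str:
    """Remove ASCII punctuation and stop words from a plain-language question."""
    no_punct = question.translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in no_punct.lower().split() if w not in STOP_WORDS)


def overlap_score(question: str, passage: str) -> float:
    """Stand-in for the cross-encoder: fraction of query terms found in the passage."""
    terms = set(build_query(question).split())
    words = set(build_query(passage).split())
    return len(terms & words) / len(terms) if terms else 0.0


def rerank(question, passages, score=overlap_score, page_size=20):
    """Sort retrieved passages by score against the original question
    and return the first page (20 results by default, as in the article)."""
    ranked = sorted(passages, key=lambda p: score(question, p), reverse=True)
    return ranked[:page_size]


question = "Do tanning beds increase the risk of cancer?"
print(build_query(question))  # tanning beds increase risk cancer

# Made-up passages standing in for retrieved citation statements.
passages = [
    "Indoor tanning beds raise melanoma risk.",
    "Elasticsearch indexes documents.",
    "Tanning beds increase the risk of skin cancer.",
]
print(rerank(question, passages)[0])
```

Note that the reranker scores each passage against the original question rather than the stripped query, mirroring the description above. A cross-encoder reads the question and passage together, so it can likely make use of the words that keyword retrieval discards.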

For a deep dive into how it works on a technical level, see scite’s blog post.

“Ask a Question” is a work in progress. Thus, scite has not written up a formal announcement and is in the process of developing a benchmark for assessing end-to-end performance. Although not a fully finished feature, “Ask a Question” holds promise for researchers and information professionals worldwide.