Researchers: For Your Eyes Only


Experts do research. Experts generate research papers. Those papers are referenced by other researchers. And so on. Sometimes you can get lost in the thick and sticky wickets of peer-reviewed journals searching for an arcane piece of information.

For those who live in the world of research and journal articles, finding the right papers with the latest and most relevant material to support your case takes painstaking, time-consuming care. Paul Allen, the lesser-known Microsoft co-founder, recently put some of his considerable resources toward giving researchers a better, AI-enhanced search engine. Launched in 2015, Semantic Scholar was originally populated with 3 million computer science papers. Today, it boasts more than 40 million papers, many in the biomedical and environmental fields as well.

While Semantic Scholar has been available for two years, it caught my attention a few weeks ago when I was doing some research for a biopharmaceutical company. The October 19, 2017 edition of The Economist included a story mentioning that an updated version had just launched, adding 26 million biomedical research papers to its existing 12 million. I jumped on the site to test it out. What makes Semantic Scholar special, and different from other search engines like Google Scholar, is that it uses AI to search and categorize articles relevant to your specific needs, rather than relying simply on rankings or citation counts for papers that match your search terms.

I was looking for papers that combined two topics not commonly addressed in the same article: the business and the medicine of a particular disease. The search netted me some good hits that met those unique criteria and pointed me to the same reliable publication sources I would normally search. Overall, I had a good experience, and I recommend it.

From The Economist description:

“Like most AI systems, the new Semantic Scholar relies on a neural network – a computer architecture inspired by the way real neurons connect to each other. Neural networks are able to learn tasks by trial-and-error. Miss [Marie] Hagman’s team [Hagman is the project’s leader] wished to bend their network to the task of recognising [sic] scientific phrases and their contexts…”

“To do this Ms Hagman asked four medical researchers to annotate ten entire research papers and 67 isolated abstracts, which were to serve as fodder for the training process. The annotators read the papers and abstracts, and highlighted within them a total of about 7,000 medical ‘topics’ (particular diseases, particular genes, particular proteins and so on). Between these topics they identified some 2,000 pairwise relationships, such as a particular gene encoding a particular protein, or being associated with a particular disease.

That done, they fed the results into the neural network, which, based on the context of a topic (ie, the words surrounding it in the various places it appears) and the pairwise relationships identified by the researchers, was able to find new topics and relationships to add to the hoard. The team then improved the network’s performance by presenting it with previously unseen papers to annotate, and correcting its suggestions until it was able, without help, to annotate such papers correctly. It can now identify 368,071 topics (mentioned a total of 236,979,862 times) and 6,756,863 relationships in the 38m papers available to it.

The upshot is that both scholars and laymen can pull out clutches of papers on particular topics from the database, with a reasonable presumption that those papers are the ones most pertinent to their needs.”

In my experience, that claim is true. For those whose job includes research, this tool is well worth investigating.
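
For the technically curious, the idea in The Economist's description of finding new topics from "the words surrounding" known ones can be illustrated with a toy sketch. This is not Semantic Scholar's method: it swaps the neural network for simple word-overlap counting, and every function name, sentence, and threshold here is a made-up assumption for illustration only.

```python
from collections import Counter


def context_words(tokens, index, window=2):
    """Return the words within `window` positions of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def build_context_profile(sentences, seed_topics, window=2):
    """Count the words that surround human-annotated topics."""
    profile = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in seed_topics:
                profile.update(context_words(tokens, i, window))
    return profile


def suggest_topics(sentences, seed_topics, profile, window=2, min_score=3):
    """Suggest unseen words whose surroundings resemble those of known topics."""
    scores = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            # Skip known topics, and skip words already seen as context
            # (a crude stand-in for filtering out common filler words).
            if token in seed_topics or token in profile:
                continue
            neighbours = context_words(tokens, i, window)
            scores[token] += sum(1 for word in neighbours if word in profile)
    return {token for token, score in scores.items() if score >= min_score}


# Hypothetical annotated sentences and seed topics (invented for this sketch).
annotated = [
    "the gene brca1 encodes a protein",
    "the gene tp53 encodes a protein",
]
seeds = {"brca1", "tp53"}
profile = build_context_profile(annotated, seeds)

# Hypothetical previously unseen text for the sketch to "annotate".
unseen = [
    "the gene kras encodes a protein",
    "researchers studied the disease",
]
print(suggest_topics(unseen, seeds, profile))
```

In the process the article describes, people then reviewed and corrected the system's suggestions; in this sketch that would amount to a reviewer accepting or rejecting each suggested word before adding it to `seeds`.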

Have you had experience with Semantic Scholar or other search engines? We’d like to hear about it in the comment section below.

Photo by Olu Eletu on Unsplash

 
