German diabetes institute uses graph database to connect coronavirus research

The German Centre for Diabetes Research (DZD) is using a Neo4j graph database to link up Covid-19 scientific research and scientists

Alexander Jarasch, head of data and knowledge management at the DZD (Deutsches Zentrum für Diabetesforschung), the German Centre for Diabetes Research, hopes the effort that data scientists and medical scientists are currently putting into the battle against the Covid-19 coronavirus will continue into the future.

“We have to invest in healthcare, and pay nurses and doctors better,” he said. “If we survive Covid-19, we have to hope it will not come back. Can the virus change even more? We don’t know yet. There could be another virus or bacterium that could cause similar problems, including economically. So, the healthcare sector and companies have to come to work more closely together.”

As an example of such collaboration, Jarasch cites a project he is involved in through the DZD that also involves other organisations, including database and data visualisation software suppliers Neo4j, YWorks and Linkurious.

The project replicates the Neo4j “knowledge graph” database that the DZD has been using for more than three years for research into diabetes, and applies it to coronavirus research.

Jarasch has collaborated with Martin Preusse, from data management software and services firm Kaiser and Preusse, and with DZD colleague Tim Bleimehl to set up the Covid-19 graph database. It integrates and links data from a range of sources to help researchers and scientists find their way through the 40,000-plus publications on the disease.

“The idea is to connect information that was previously siloed, and to connect people – authors of scientific papers,” he said. “In general, in the graph community, this is a very common benefit. Here it is about understanding coronavirus by querying data and creating hypotheses.”
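To make the idea of connecting papers and their authors concrete, here is a minimal sketch in plain Python. It is not the project's Neo4j data model: the paper identifiers, author names and gene labels are invented placeholders, and the two helper functions are hypothetical. It only shows how linking previously separate records lets a question such as "which authors have written about this gene?" be answered by following connections.

```python
from collections import defaultdict

# Hypothetical records: (paper id, authors, genes mentioned in the paper).
# None of these names come from the real Covid-19 graph database.
PAPERS = [
    ("paper-1", ["Author A", "Author B"], ["GENE_X"]),
    ("paper-2", ["Author B", "Author C"], ["GENE_X", "GENE_Y"]),
    ("paper-3", ["Author D"], ["GENE_Y"]),
]


def build_graph(papers):
    """Return an undirected adjacency map over paper, author and gene nodes."""
    graph = defaultdict(set)
    for paper_id, authors, genes in papers:
        for neighbour in authors + genes:
            graph[paper_id].add(neighbour)
            graph[neighbour].add(paper_id)
    return graph


def authors_linked_by_gene(graph, gene, author_names):
    """Authors of at least one paper that mentions the given gene."""
    found = set()
    for paper_id in graph[gene]:
        # Keep only the paper's neighbours that are authors.
        found |= graph[paper_id] & author_names
    return sorted(found)


graph = build_graph(PAPERS)
all_authors = {name for _, authors, _ in PAPERS for name in authors}
print(authors_linked_by_gene(graph, "GENE_Y", all_authors))
```

In a graph database the same question would be a short traversal query rather than hand-written loops. The dictionary version is only meant to show the shape of the data.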

Connected up so far are the Covid-19 Open Research Dataset (CORD-19), the Lens Covid-19 Datasets, the Ensembl Genome Browser, the NCBI Gene Database, the Gene Ontology Resource, experimental data from clinical studies and molecular genetics, the 2019 Novel Coronavirus Covid-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE, and the United Nations World Population Prospects 2019.

The DZD is one of six federal research centres in Germany. There is a particular link between coronavirus and diabetes, said Jarasch, in that “we have evidence that diabetic patients have a higher risk of infection, and the chance of dying is higher – but we don’t know why”.

Diabetes UK explains the connection on its website: “Everyone with diabetes, including those with type 1, type 2 and gestational, is at risk of developing a severe illness if they get coronavirus, but the way it affects you can vary from person to person.

“When you have diabetes, being ill can make your blood sugar go all over the place. Your body tries to fight the illness by releasing stored glucose (sugar) into your blood stream to give you energy. But your body can’t produce insulin to cope with this, so your blood sugars rise.

“Your body is working overtime to fight the illness, making it harder to manage your diabetes. This means you’re more at risk of having serious blood sugar highs and lows, as well as longer-term problems with your eyes, feet and other areas of your body.”

Jarasch says the DZD’s 400 or so scientists are doing similar work to coronavirus researchers in the sense that “there is a big element of the unknown”, and there is a mass of publications that no scientist could possibly read manually.

There are already about 40,000 publications in the coronavirus field, and around 16,000 related patents. These also cover related diseases other than Covid-19, such as SARS.

The DZD-inspired database links these and is only a few weeks old. Jarasch said the organisation wanted to build a “knowledge graph” that enables researchers to see connections, helping them to generate hypotheses that they could not otherwise easily make. Everyone can use it, he pointed out.

Jarasch said his original involvement with graph technology was sparked by a need he had three years ago to create a metadata repository of expertise and experts across the DZD and related centres, involving more than 500 researchers and 10 university hospitals spread across Germany.

“It was obvious that everything was connected but heterogeneous on a data level and that graph technology would be the way to tackle it,” he said. “Why? Because it is easy to set up, it’s highly scalable, and it is easy to adapt and change. It’s the most intuitive way of storing highly connected data in biomedicine.”

As well as the database itself, the project makes available some visualisation tools from the likes of Linkurious, YWorks, Graphileon, GraphAware, and Neo4j’s Bloom plug-in tool.

Jarasch has been working with Neo4j for three years now, setting up an internal tool called DZDconnect. That graph database sits as a layer over relational databases linking different DZD systems and data silos.
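As a rough illustration of a graph layer sitting over relational silos, the sketch below uses two in-memory SQLite databases as stand-ins for separate systems and derives a small set of labelled edges from them. It does not describe how DZDconnect is built: the table names, columns, relationship names and sample rows are all assumptions made for the example.

```python
import sqlite3

# Two in-memory databases stand in for separate relational silos.
# Schemas and rows are hypothetical, chosen only for illustration.
samples_db = sqlite3.connect(":memory:")
samples_db.execute("CREATE TABLE samples (sample_id TEXT, patient_id TEXT)")
samples_db.executemany(
    "INSERT INTO samples VALUES (?, ?)", [("s1", "p1"), ("s2", "p2")]
)

studies_db = sqlite3.connect(":memory:")
studies_db.execute("CREATE TABLE enrolment (patient_id TEXT, study TEXT)")
studies_db.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [("p1", "study-A"), ("p2", "study-A"), ("p2", "study-B")],
)


def build_edges():
    """Read both silos and return (source, relation, target) triples."""
    edges = set()
    for sample_id, patient_id in samples_db.execute(
        "SELECT sample_id, patient_id FROM samples"
    ):
        edges.add((("Sample", sample_id), "TAKEN_FROM", ("Patient", patient_id)))
    for patient_id, study in studies_db.execute(
        "SELECT patient_id, study FROM enrolment"
    ):
        edges.add((("Patient", patient_id), "ENROLLED_IN", ("Study", study)))
    return edges


def studies_for_sample(edges, sample_id):
    """Follow Sample -> Patient -> Study across the two silos."""
    patients = {
        dst for src, rel, dst in edges
        if rel == "TAKEN_FROM" and src == ("Sample", sample_id)
    }
    return sorted(
        dst[1] for src, rel, dst in edges
        if rel == "ENROLLED_IN" and src in patients
    )


edges = build_edges()
print(studies_for_sample(edges, "s2"))
```

The relational tables stay where they are. Only the identifiers that connect them are copied into the graph layer, which is what lets a question span both systems.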

“Creating the first data models with Neo4j was very fast,” he said. “In the first week, I was already able to connect metadata gained by our scientists in a data model, to test it and to show the added value of the graph database.”
