
Coronavirus: Mobilising data science

Policy-makers are calling on the global data science community to develop data models that can help them better understand the Covid-19 transmission rate

There has been a global collaboration across the IT sector to enable access to data, which will be critical as the world combats the Covid-19 coronavirus. 

In a joint effort with the White House, Microsoft and Google’s Kaggle data community platform have begun working with the Allen Institute for AI, Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET) and the National Library of Medicine (NLM) at the National Institutes of Health to support the release of the Covid-19 Open Research Dataset (Cord-19).

The Cord-19 resource offers more than 44,000 scholarly articles, including over 29,000 with full text, about Covid-19, Sars-CoV-2 and related coronaviruses. “This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease,” Kaggle wrote in a post.

“There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.”

The data science community is being asked to work on a number of tasks based on this dataset. For each task, Kaggle said it is sponsoring a $1,000 prize for the submission that best meets the task’s criteria.

Michael Kratsios, US chief technology officer at the White House, called on the research community to collaborate to combat coronavirus. “Decisive action from America’s science and technology enterprise is critical to prevent, detect, treat and develop solutions to Covid-19,” he said.

In the UK, NHSX and Public, the govtech startup accelerator, announced Techforce19 and £25,000 funding for companies that can develop software to support a number of key requirements for the NHS and social services.

These include: providing remote social care, such as locating and matching qualified carers to those in need and managing and delivering care in care homes; optimising the care and volunteer sector through the development of tools to recruit, train and coordinate local volunteers into clinical and non-clinical workers; and improving mental health support, such as by making it easier to discover and deliver mental health services and support, or by developing tools to support self-management of mental health and wellbeing.

Important role for tech

Matthew Gould, chief executive of NHSX, said: “Tech can play an important role in helping the country deal with the challenges created by coronavirus. This competition is focused on the problems created by isolation, which lend themselves to digital solutions.

“It will allow NHSX to accelerate the development of those solutions, so within weeks they can help those in isolation suffering from loneliness, mental health issues and other problems.”

Meanwhile, China’s public cloud platform, Alibaba Cloud, has offered data collected from its Epidemic Prediction Solution model to medical researchers around the world. The algorithm draws on publicly available data in China, such as flight information, the number of new cases, the number of confirmed cases, the number of close contacts and their contact dates, and the number of people under quarantine. From this, it provides estimates of the size, peak time and duration of the epidemic, as well as its spreading trends.

According to Alibaba, the algorithm, which has been tested on data from 31 Chinese provinces, averaged 98% accuracy. “It can serve as a reference to policymakers and medical researchers on prevention and control measures, medical resource allocation and travel advisories,” said Alibaba Cloud.

“Policy makers and medical researchers can use Epidemic Prediction Solution to make data-based informed decisions related to prevention and control measures, medical resource allocation, and travel advisories.”

Data scientists are being called into action as researchers draw on such data sources to help them understand the nature of coronavirus, how it spreads, and the effectiveness of government controls.

But as Harvinder Atwal, chief data officer at MoneySupermarket, points out, populations in different countries react differently to the measures their governments impose to control the spread of the virus. Each country also has different demographics, underlying health issues and climate, all of which affect any data model proposed to predict the coronavirus infection rate.

Atwal said another challenge for data scientists is that a lot of data is unstructured and not in a machine-readable format.


However, he believes that some of this data, like compliance, can help the fight against coronavirus. “You can get data on what businesses are open,” he said.

Atwal said data can also be used to see if people are isolating themselves or grouping in crowds. “[A] lot of countries use mobile data to track people’s movements by looking at who has logged into their phones,” he added.

For Atwal, such data can be used to assess how effective government measures are in stopping people from gathering in public. “You can also look at Google trend data on fear-based searches,” he said. This can indicate how concerned the public is about the threat of the virus. 

There is unlikely to be a single data model that defines the behaviour of coronavirus in terms of its transmission rate and how policy decisions made by the government will influence the infection rate. The models within Alibaba’s Epidemic Prediction Solution are weighted for pessimistic, neutral and optimistic outcomes for the spread of the virus.
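As a rough illustration of how scenario-based estimates of epidemic size and peak time can be produced, the sketch below runs a textbook SIR (susceptible, infected, recovered) model under three transmission rates. It is not Alibaba's algorithm, whose internals are not described here. The population size, the rates and the function name simulate_sir are all hypothetical values chosen for the example.

```python
# Illustrative only: a textbook SIR model, NOT Alibaba's Epidemic Prediction Solution.
# Every parameter value below is a hypothetical assumption.

def simulate_sir(population, initial_infected, beta, gamma, days):
    """Run a daily-step SIR model and return a summary of the outbreak.

    beta  -- assumed transmission rate (new infections per infected person per day)
    gamma -- assumed recovery rate (1 / infectious period in days)
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    recovered = 0.0
    infected_by_day = [infected]

    for _ in range(days):
        # New infections scale with contact between susceptible and infected people.
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
        infected_by_day.append(infected)

    # The peak is the day with the most people infected at the same time.
    peak_day = max(range(len(infected_by_day)), key=infected_by_day.__getitem__)
    return {
        "peak_day": peak_day,
        "peak_infected": infected_by_day[peak_day],
        "total_infected": population - susceptible,
        "final_compartments": (susceptible, infected, recovered),
    }


POPULATION = 1_000_000  # hypothetical region
GAMMA = 0.1             # assumes a 10-day infectious period
SCENARIOS = {"optimistic": 0.15, "neutral": 0.25, "pessimistic": 0.40}

results = {
    name: simulate_sir(POPULATION, initial_infected=10, beta=beta, gamma=GAMMA, days=730)
    for name, beta in SCENARIOS.items()
}

for name, summary in results.items():
    print(f"{name:>11}: peak on day {summary['peak_day']}, "
          f"{summary['peak_infected']:,.0f} infected at peak, "
          f"{summary['total_infected']:,.0f} infected in total")
```

In this toy setup, a higher assumed transmission rate brings the peak forward and makes it larger, which is why forecasts of this kind tend to be reported as a range of scenarios and not as a single figure.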

Atwal said he expects several models to be developed that effectively crowd-source possible scenarios. “Even if the margin for error is quite high, these models can guide policy and show the potential impact that coronavirus will have on the NHS,” he said.

Given that coronavirus testing is not being conducted on the UK public at large, transmission of the virus could be higher than published estimates. Even if the margin of error is high, the data models being built using all the datasets now publicly available will provide helpful insights to guide policy decisions.
