How swarm learning is tearing down data silos

Swarm learning makes it possible to perform machine learning on distributed data sources while conforming to data confidentiality requirements

Like any large organisation, Hewlett Packard Enterprise (HPE) has had to deal with data silos across its business, from sales and marketing to finance and supply chain. But building machine learning and analytics applications that generate insights across the business requires combining different data sources.

Rather than build a data lake, where data might be outdated by the time analytics is performed, HPE decided a few years ago to take the data federation approach, where data is aggregated from disparate sources in a virtual database while allowing real-time access to the data.

HPE recently took its expertise in data federation to the next level when it teamed up with the German Centre for Neurodegenerative Diseases to use swarm learning in a research project to develop disease classifiers using distributed patient data from different hospitals while ensuring data confidentiality.

“Swarm learning allows each hospital to do their own machine learning, on their own patient data,” said Goh Eng Lim, senior vice-president and chief technology officer for artificial intelligence (AI) and high-performance computing at HPE. “There is no sharing of the data, but once in a while, a blockchain comes along to collect the learnings, which, in technical terms, are neural network weights and parameters.”
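The idea Goh describes can be illustrated with a minimal Python sketch. It is not HPE's implementation: the tiny linear model, the synthetic data, the function names (`local_train`, `merge`, `swarm_round`) and the use of plain averaging to combine weights are all assumptions made for illustration, and the blockchain that coordinates the exchange is not modelled at all. What it does show is that each site trains on its own records and only the model parameters are shared.

```python
import random

def local_train(weights, data, lr=0.1, epochs=20):
    """One site's training: gradient descent for y ~ w*x + b on its own data only."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def merge(weight_sets):
    """Average the parameters from all sites (an assumed merge rule)."""
    n = len(weight_sets)
    dim = len(weight_sets[0])
    return tuple(sum(ws[i] for ws in weight_sets) / n for i in range(dim))

def swarm_round(shared_weights, site_datasets):
    """Each site trains locally from the shared weights; only weights are merged."""
    local = [local_train(shared_weights, data) for data in site_datasets]
    return merge(local)

def make_site_data(rng, n, true_w=2.0, true_b=1.0):
    """Hypothetical private dataset for one site: noisy samples of y = 2x + 1."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_w * x + true_b + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
sites = [make_site_data(rng, 50) for _ in range(3)]  # three "hospitals"

weights = (0.0, 0.0)
for _ in range(30):
    weights = swarm_round(weights, sites)

print(weights)  # should land close to the true parameters (2.0, 1.0)
```

The raw `(x, y)` records never leave `local_train`; the only values passed between sites are the two numbers in each weight tuple.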

The research, published in the journal Nature in May 2021, found that the learnings were similar to what would have resulted from performing machine learning on a combined set of data. That makes swarm learning a feasible way to democratise the use of AI while improving data confidentiality, privacy and data protection.

Swarm learning is part of a portfolio of capabilities that HPE is hoping to bring to organisations under the Gaia-X initiative, a federated data infrastructure project supported by more than 300 organisations in Europe and globally. Gaia-X is also seen as a move by the European Union to challenge the dominance of US tech giants in the global digital economy.

HPE is a day-one member of the non-profit organisation Gaia-X Association for Data and Cloud, and contributes to the Gaia-X architecture, standards and certification. It is already working with dozens of organisations across Europe to help them get ready for decentralised data infrastructures such as Gaia-X.

Goh noted that organisations in Asia-Pacific that are already doing business in Europe, or have plans to do so, will benefit from capabilities such as swarm learning under the auspices of Gaia-X.


Citing the example of credit card companies in Europe and Asia that would not usually share proprietary customer data for commercial reasons, he said they could benefit from sharing fraud profile data based on what they know about their customers.

“They learn from their own customers a certain characteristic of what fraud is, but they know that they are not seeing it or perhaps another credit card company may have seen things that they have not seen,” said Goh.

“So they are in a dilemma. On the one hand, you can’t share customer data, but you would like to share the learnings from customer data that relate to a particular fraud profile. This is where swarm learning can come in.”

Goh added that Gaia-X also caters to other domains, such as healthcare, agriculture and energy.

But where swarm learning comes into its own is at the edge. Today, edge data is typically sent to a centralised location, such as a public cloud, to train a machine learning model, which is then deployed back to the edge. With swarm learning, training can be done at the edge locations themselves, with the insights aggregated at a macro level.
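Aggregating edge-trained models at a macro level raises a practical question: edge locations rarely hold the same amount of data. One common choice in distributed learning, assumed here rather than confirmed as HPE's approach, is to weight each location's parameters by the number of samples it trained on. The function name `weighted_merge` and the figures below are hypothetical.

```python
def weighted_merge(updates):
    """Combine edge updates, weighting each by how many samples it trained on.

    updates: list of (weights, sample_count) pairs, one per edge location.
    """
    total = sum(count for _, count in updates)
    if total == 0:
        raise ValueError("no samples reported by any edge location")
    dim = len(updates[0][0])
    return [
        sum(weights[i] * count for weights, count in updates) / total
        for i in range(dim)
    ]

# Two hypothetical edge locations; the second trained on three times as much data.
edge_updates = [
    ([0.20, 1.00], 100),
    ([0.40, 2.00], 300),
]

merged = weighted_merge(edge_updates)
print(merged)  # each value sits three-quarters of the way towards the larger site
```

Only the parameter lists and sample counts travel to the aggregation step, so the raw readings stay on the devices that collected them.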

“With 50 to 100 billion devices out there, most of your data will be at the edge, relative to even your decentralised sources,” said Goh, adding that a lot of work is still needed to ensure highly distributed data sources can be linked in a consistent manner.
