Durham University upgrades cosmology supercomputer to switchless architecture with Rockport Networks

Researchers working within the Institute for Computational Cosmology at Durham University are set to reap the benefits of a network upgrade to one of its core supercomputers

Durham University’s Institute for Computational Cosmology (ICC) is upgrading to a switchless network architecture for its COSMA7 supercomputer to reduce the risk of network congestion slowing down the pace of its research into the origins of the universe.

The ICC uses a series of large supercomputers to power its cosmological research, which requires sophisticated simulations so that its 50-strong team of researchers can further their understanding of how the universe works.

The COSMA7 supercomputer is supported by the Distributed Research utilising Advanced Computing (DiRAC) facility, which provides IT resources and funding to supercomputing setups at universities in Cambridge, Durham, Leicester and Edinburgh.

The supercomputer also receives funding from the Exascale Computing Algorithms and Infrastructures Benefiting UK Research (ExCALIBUR) programme, a £45.7m initiative focused on delivering next-generation simulation software to high-priority research fields in the UK.

During a press briefing to discuss the COSMA7 network upgrade, Alastair Basden, technical manager of the COSMA high-performance computing (HPC) cluster at Durham University, said the institution worked with research teams across the world.

“It’s a very international institution and collaborates with universities from all over the world, and what we do primarily is perform huge simulations of the universe – starting with the big bang – before propagating that forward in time to the present day, allowing us to watch the evolution of the universe during this time,” said Basden.

“We can put different physics into the simulations and we can see into things that we don’t understand. Things like dark matter, dark energy and that sort of thing.

“There are different parameters for those, and we put those in at the start of the simulation, propagate the simulation and then compare what we get in the simulation with what we observed using giant telescopes.”
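
To make that workflow concrete, here is a minimal, hypothetical Python sketch of the loop Basden describes: pick values for poorly understood parameters, run a forward model, and score the outcome against an observation. The growth model, parameter ranges and “observed” value below are all invented for illustration and bear no relation to the ICC’s actual codes.

```python
# Toy sketch of a cosmological parameter sweep. Everything here
# (the growth model, the grid, the "observed" value) is hypothetical.
import numpy as np

def toy_forward_model(omega_m: float, sigma8: float, steps: int = 100) -> float:
    """Stand-in for a universe simulation: 'propagates' an initial
    clustering amplitude forward in time. Real codes integrate gravity
    and hydrodynamics over billions of particles."""
    amplitude = sigma8
    for _ in range(steps):
        amplitude *= 1.0 + 0.001 * omega_m  # toy growth rate per step
    return amplitude

observed_clustering = 0.85  # pretend this came from a telescope survey

best = None
for omega_m in np.linspace(0.2, 0.4, 21):     # matter-density guesses
    for sigma8 in np.linspace(0.7, 0.9, 21):  # initial-amplitude guesses
        chi2 = (toy_forward_model(omega_m, sigma8) - observed_clustering) ** 2
        if best is None or chi2 < best[0]:
            best = (chi2, omega_m, sigma8)

print(f"best-fit toy parameters: omega_m={best[1]:.3f}, sigma8={best[2]:.3f}")
```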

To ensure the supercomputer can continue to carry out its work efficiently, and in the wake of a successful proof of concept, the university has opted to upgrade COSMA7’s networking architecture to a switchless design using Rockport Networks’ technology.

The deployment is being funded by the DiRAC and ExCALIBUR programmes, with Rockport’s technology enabling the university to distribute the network switching function to COSMA7’s endpoint nodes, which effectively makes them the network.

This, in turn, eliminates layers of switches from the supercomputer’s infrastructure and reduces the risk of congestion and network bottlenecks, allowing the university’s researchers to run their simulations more efficiently and get their hands on the data they produce more quickly.
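
The article does not detail Rockport’s topology, but one common way to build a switchless fabric is a direct-connect torus in which every node forwards traffic for its neighbours. The toy Python model below assumes a 4x4x4 torus purely for illustration, and computes average hop counts to show how traffic spreads across many point-to-point links rather than funnelling through a few shared switch uplinks.

```python
# Back-of-the-envelope model of a direct-connect (switchless) fabric.
# The 3D torus topology is an assumption for illustration only.
from collections import deque
from itertools import product

DIM = 4  # 4x4x4 torus = 64 nodes, each with 6 direct links to neighbours

def torus_neighbours(node):
    x, y, z = node
    for axis in range(3):
        for step in (-1, 1):
            coords = [x, y, z]
            coords[axis] = (coords[axis] + step) % DIM  # wrap around
            yield tuple(coords)

def hops(src, dst):
    """BFS shortest-path length between two torus nodes."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in torus_neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))

nodes = list(product(range(DIM), repeat=3))
pairs = [(a, b) for a in nodes for b in nodes if a != b]
avg = sum(hops(a, b) for a, b in pairs) / len(pairs)
print(f"average hops across the torus: {avg:.2f}")
```

In a switched design, the equivalent of these hops happens inside dedicated switch tiers that every flow must share; distributing the forwarding role across the nodes themselves is what removes those shared layers.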

Matthew Williams, CTO of Rockport Networks, said the project is indicative of changing attitudes towards tackling network congestion.

“Tackling congestion has moved beyond provisioning more switches to throw bandwidth at the problem,” he said. “Sophisticated control and architecture means the customer is no longer at the mercy of the bottlenecks their network infrastructure creates.”

The ICC was introduced to Rockport Networks when the company was still in stealth mode by mutual contacts at hardware giant Dell, said Basden.

“We’re a Dell Centre of Excellence here at Durham, and Dell thought we might be interested in Rockport’s technology,” he added. “So we were put in contact with them just over a year and a half ago, and we happened to have this cluster here that we could test it out on and took it from there.”

The final deployment is taking place this week, and Basden told Computer Weekly that user feedback during the testing phase was wholly positive, with research teams able to carry out their work without interruption.

“A lot of people haven’t noticed [the difference], which is a very positive thing,” he said. “To them, it’s just a network and they’ve used it and they haven’t had to adjust their code, but the people who have known about the work going on behind the scenes have been impressed by it.”

As an example of how well the testing went, Basden pointed to the performance improvements seen during the testing phase by researchers working on a large and complex smoothed particle hydrodynamics code.

That particular code uses “task-based parallelism”, an approach designed to tolerate network congestion, yet its performance still improved once Rockport’s technology was added to the stack.
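
For readers unfamiliar with the term, the sketch below illustrates task-based parallelism in miniature: work is broken into small tasks with explicit dependencies, and any task whose inputs are ready can run, so a delayed message in one corner of the machine need not stall everything else. The task graph and timings here are invented; the ICC’s real code is far more sophisticated.

```python
# Minimal task-based scheduler: run any task whose dependencies are done.
# The task names, dependencies and sleep times are purely illustrative.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

# task name -> (dependencies, simulated runtime in seconds)
TASKS = {
    "density_A": (set(),                    0.2),
    "density_B": (set(),                    0.5),  # e.g. delayed by comms
    "forces_A":  ({"density_A"},            0.2),
    "forces_B":  ({"density_B"},            0.2),
    "integrate": ({"forces_A", "forces_B"}, 0.1),
}

def run_task(name: str) -> str:
    time.sleep(TASKS[name][1])  # stand-in for real computation
    return name

done, running = set(), {}
with ThreadPoolExecutor(max_workers=4) as pool:
    while len(done) < len(TASKS):
        # launch every task whose dependencies are all satisfied
        for name, (deps, _) in TASKS.items():
            if name not in done and name not in running and deps <= done:
                running[name] = pool.submit(run_task, name)
        finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
        for fut in finished:
            name = fut.result()
            done.add(name)
            running.pop(name)
            print(f"completed: {name}")
```

Run as-is, the graph completes in roughly the time of its longest dependency chain rather than the sum of all task times, which is the property that makes such codes tolerant of uneven communication delays.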

“We’re always on the hunt for advanced technologies with the potential to improve the performance and reliability of the advanced computing workloads we run,” said Basden.

“Based on the results and our first experience with Rockport’s switchless architecture, we were confident in our choice to improve our exascale modelling performance – all supported by the right economics.”
