How the University of Liverpool balances HPC and the cloud

The University of Liverpool has been running a hybrid HPC environment since 2017, which uses PowerEdge nodes and AWS public cloud services

To support the various workloads required in academic research, the University of Liverpool runs an on-premise high-performance computing (HPC) cluster, designed by Dell EMC and Alces Flight, which offers on-demand public cloud access to Nvidia graphics processing units (GPUs) running on Amazon Web Services (AWS).

The Dell PowerEdge-based HPC cluster is being used by researchers at the university to help them drive breakthroughs in developing new materials with large-scale applications in both industry and consumer products. HPC-supported research includes enhancing the personalisation of health management, and advancing the fight against infectious diseases.

The system has been running since 2017, and has been designed to enable researchers and students to run HPC workloads anywhere and at any time, says Cliff Addison, who works in advanced research computing at the University of Liverpool.

Addison joined Liverpool in 2002, soon after it began running high-performance computing systems. As part of the original tender for the new HPC system, he says: “We wanted a mechanism to support new users, and wanted to have an environment that was scalable.”

He says the attraction of the hybrid approach is that the cloud side of the HPC installation is easier for new users to use. “We are looking at having a large number of new users. A command line user interface and Linux job submissions are foreign to them, so we need to provide easier access.”

Addison believes HPC has a long way to go before any user can run their workloads without having to learn the complexities of the HPC operating environment. But the ease with which a cloud-based environment can be deployed gives the university a usability target to work towards.

For the time being, he says, Alces Flight provides day-to-day management of the hardware platform, which leaves the internal HPC team at Liverpool free to support users.

Balancing cost and demand

Addison says the university has recognised that the cloud is not always cost-effective for workloads that are computationally intensive. “We wanted to support a heterogeneous job mix,” he adds, enabling jobs to run on-premise or in AWS, depending on the cost and resource needs of the workloads being run.

“Now with cloud, we have greater experiment flexibility,” he says. “We can help stand things up and provide researchers with an environment in the cloud.”

The HPC at Liverpool has a GPU cluster which is mainly used for machine learning and molecular modelling. Addison says researchers can run their GPU workloads on a V100 node on AWS, to test the application and understand the costs. “But using the cloud over a long period of time is expensive,” he says.
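The trade-off Addison describes, testing on a cloud GPU first and then weighing long-term cost, can be illustrated with a simple break-even calculation. The sketch below is illustrative only: the function name and every figure in it are hypothetical assumptions, not prices quoted by the university, Dell EMC or AWS.

```python
def break_even_hours(purchase_cost, onprem_hourly_cost, cloud_hourly_rate):
    """Return the number of GPU-hours after which owning a GPU node
    becomes cheaper than renting an equivalent one in the cloud.

    All three inputs are assumed to be in the same currency.
    """
    saving_per_hour = cloud_hourly_rate - onprem_hourly_cost
    if saving_per_hour <= 0:
        # Renting is never more expensive per hour, so buying never pays off.
        return float("inf")
    return purchase_cost / saving_per_hour


# Hypothetical figures, chosen only to show the shape of the calculation.
hours = break_even_hours(
    purchase_cost=30000.0,     # assumed up-front cost of an on-premise GPU node
    onprem_hourly_cost=0.5,    # assumed power, cooling and support cost per hour
    cloud_hourly_rate=3.0,     # assumed on-demand cloud price per hour
)
print(f"Break-even after about {hours:.0f} GPU-hours")
```

Under these assumed numbers, a workload that runs occasionally stays cheaper in the cloud, while one that keeps a GPU busy for more than the break-even number of hours favours buying hardware.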

There is always a trade-off in matching the on-premise compute capabilities with demand from researchers at the university, but Addison has seen an increase in demand for GPUs. “While it is still modest, we have two full-time GPUs and will purchase another,” he says.

“Last year, we were pleased with the Nvidia V100 GPU AWS capabilities we gave people. The feedback was sufficiently strong that we were able to put together a business case to add our own internal GPU.”

Growing nodes, DR and containerisation

Unlike some universities, which use a pay-per-use model to charge for HPC usage, Liverpool’s HPC is centrally funded. The original system, purchased in 2017, is about half-full, but Addison says the core network and infrastructure have been set up to accommodate a larger number of nodes.

In fact, he says the university will put in an order for 25 new nodes. “We bought it with room for expansion,” he says. “The limit is in terms of cooling.”

The HPC will soon be moved to a new facility which, he says, will offer double the current cooling capacity, enabling the system to support up to six racks’ worth of HPC nodes with 120 kW of cooling.

But given the flexibility of the HPC to run researchers’ workloads, Addison says one of the main issues he faces is ensuring users do sensible things.

“It is a very complex environment and so it is easy for users to incorrectly specify what they require from the compute resources.”
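One way to picture the problem Addison raises is a pre-submission check that compares a job's resource request with what a single node can provide. This is a minimal sketch, not a description of Liverpool's scheduler or tooling: the `JobRequest` type, the `check_request` function and all of the node limits are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    cores: int
    memory_gb: float
    gpus: int
    walltime_hours: float


# Hypothetical per-node limits; the article does not give the real node specification.
NODE_CORES = 40
NODE_MEMORY_GB = 384
NODE_GPUS = 4
MAX_WALLTIME_HOURS = 72


def check_request(req):
    """Return a list of human-readable warnings for a single-node job request."""
    warnings = []
    if not 1 <= req.cores <= NODE_CORES:
        warnings.append(f"cores must be between 1 and {NODE_CORES}")
    if not 0 < req.memory_gb <= NODE_MEMORY_GB:
        warnings.append(f"memory must be above 0 and at most {NODE_MEMORY_GB} GB")
    if not 0 <= req.gpus <= NODE_GPUS:
        warnings.append(f"gpus must be between 0 and {NODE_GPUS}")
    if not 0 < req.walltime_hours <= MAX_WALLTIME_HOURS:
        warnings.append(f"walltime must be above 0 and at most {MAX_WALLTIME_HOURS} hours")
    return warnings


# A request asking for more cores than the assumed node has.
for warning in check_request(JobRequest(cores=80, memory_gb=16, gpus=0, walltime_hours=2)):
    print("Warning:", warning)
```

A check of this kind only catches requests that cannot be satisfied; it does not tell a user whether the resources requested are a good fit for the application, which is where support from the HPC team comes in.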

One of the opportunities Addison sees is using the cloud side of the HPC environment for business continuity. “We are very conscious about keeping services for users running,” he says. “Student services and email are high priorities.”

But he says the HPC seems to have a single point of failure. “If there is a power outage, the HPC’s not available.”

However, he says: “The cloud gives us a greater degree of flexibility than what we previously had. We can store [virtual machine] images in the cloud and bring up almost the same software environment the researchers are used to in the cloud, even though our HPC is down.”

Over the next few weeks, while it is migrated to the new location with improved cooling capacity, the HPC will be offline, and researchers at Liverpool will use the cloud environment instead of the on-premise HPC.

Longer term, Addison says container technology is on the university’s roadmap. “Containers are becoming important for coping with the different software stacks used by the research groups. Over the next two to three years, we will use containers locally on the HPC environment and use this as a springboard to move workloads onto the cloud.”

He believes the intelligent orchestration of workloads – which is related to containerisation – will be a top priority.
