
How long until cloud becomes the preferred environment to run HPC workloads?

The high-performance computing market is growing, but enterprises that have already made large investments in on-premise HPC will need to weigh up whether a move to the cloud would be worth their while

Enterprise spending on high-performance computing (HPC) has skyrocketed over the past decade. According to Intersect360 Research, a 10-year growth streak from 2009 to 2019 culminated in global HPC market revenue reaching $39bn in 2019.

The Covid-19 pandemic is expected to have stunted the market’s growth somewhat, with projections suggesting spending will have contracted by 3.7% once the final figures for 2020 are in.

But according to MarketsandMarkets, this is not expected to last. It predicts the market will grow to $49.4bn by 2025, at a compound annual growth rate (CAGR) of 5.5%, fuelled by increasing demand from enterprises for more efficient and scalable computing resources. It also says there is an emerging need for high-speed data processing with accuracy, and highlights the adoption of HPC in the cloud as a driver for uptake.

It remains to be seen, however, whether enterprises are investing in cloud-based or on-premise HPC, in terms of both upgrades and migrations.

Companies thinking about moving their HPC workloads to the cloud may have concerns similar to those enterprises had 10 years ago when weighing up whether to move their more conventional workloads off-premise – especially if they have already invested heavily in building out their on-premise HPC infrastructure estate.

The gulf in cost between on-premise HPC and conventional on-premise IT infrastructure is huge. Enterprises can spend up to $200m on their HPC infrastructure, and regular updates mean costs continue to mount after the initial deployment.

The dilemma they face is whether they should continue investing in an environment on which they have already spent a lot of money, or make use of cloud computing in conjunction with their existing environment. Or should they switch to a fully cloud-enabled HPC environment?

Cloud switching costs too high for AstraZeneca

For pharmaceutical giant AstraZeneca, switching to a cloud-based HPC system is off the cards for the foreseeable future for cost reasons, despite the company tapping into cloud for some of its high-performance workloads.

“Our HPC platforms are on-premise, but we do use HPC in the public cloud, especially if we’ve got short, sharp workloads that need GPU [graphics processing unit] or FPGA [field programmable gate array], and so we do use those kinds of workloads in the public cloud as well, simply because sometimes you need a bigger scale than we are able to offer within our own locations,” says Scott Hunter, the company’s global infrastructure services director.

In other words, AstraZeneca has opted for “augmenting on-premise with cloud” in its HPC environment. It extends its Kubernetes platform out to various public clouds to do this, says Hunter, which gives it consistency while allowing it to take advantage of what the cloud has to offer.

“If we wanted to run those in GPUs on A100s from Nvidia, then we’d use Google or Oracle for that, and if we wanted to use FPGA, then it would be [Microsoft] Azure or AWS [Amazon Web Services], and if we wanted to get the ultimate scale, we would launch all-in on Azure because even though its capability might not be the fastest, it seems to have the largest amount of available GPU,” says Hunter.
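Hunter does not go into detail on how that Kubernetes extension is wired up, but the general pattern is to schedule GPU jobs onto cloud-hosted node pools using node selectors and the extended GPU resource. The following is a minimal sketch using the Kubernetes Python client against a hypothetical GKE node pool with A100s – the kubeconfig context, namespace and container image names are illustrative, not AstraZeneca’s actual setup.

```python
# Minimal sketch: burst a GPU job to a cloud-hosted Kubernetes node pool.
# Assumes the official Kubernetes Python client and a kubeconfig context for
# the cloud cluster; all names below are hypothetical.
from kubernetes import client, config

config.load_kube_config(context="gke-hpc-burst")   # hypothetical context name

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="chem-sim-gpu", namespace="hpc-burst"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        # Ask GKE to place the pod on an A100 node pool
        node_selector={"cloud.google.com/gke-accelerator": "nvidia-tesla-a100"},
        containers=[
            client.V1Container(
                name="solver",
                image="registry.example.com/chem-solver:latest",  # hypothetical image
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="hpc-burst", body=pod)
```

Pointing the same job definition at a different context – an Azure or AWS cluster with FPGA or GPU node pools – is what gives the consistency Hunter describes, while the node selector determines which provider’s hardware actually runs the work.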

A recent HPC user survey by Intersect360 rated AWS the top cloud computing platform in every category but one – including technical and operational impression, future outlook and likelihood of use in two years’ time.

However, AstraZeneca has no intention of moving wholly to cloud-based HPC, and the reason is purely cost.

“On-premise is between five and seven times cheaper than doing a similar type of activity in the cloud,” says Hunter.

Cloud-based HPC carries such a high cost for AstraZeneca because it runs a lot of double-precision chemistry workloads, which tend to be operational 24 hours a day, 365 days a year.

“That means that unless we get a cracking deal from one of the major public cloud vendors or Oracle, it makes sense to run on-premise because of the cost,” says Hunter.
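Hunter’s five-to-seven-times figure, combined with the 24/365 usage pattern, is essentially a utilisation argument: the on-premise cluster is a fixed cost whether it is busy or not, while cloud capacity is metered, so cloud only wins when the machines would otherwise sit idle. A rough back-of-the-envelope sketch in Python, using only the midpoint of his figure – everything else here is illustrative:

```python
# Illustrative only: the single input is Hunter's figure that running the same
# always-on workload in the cloud costs roughly five to seven times more than
# on-premise. On-premise cost is treated as fixed; cloud cost as metered.
ON_PREM_ANNUAL_COST = 1.0   # normalised: one cost unit per year, always on
CLOUD_PREMIUM = 6.0         # midpoint of the "five to seven times" figure

def cloud_annual_cost(utilisation: float) -> float:
    """Cloud spend if the workload only runs for `utilisation` of the year (0..1)."""
    return CLOUD_PREMIUM * ON_PREM_ANNUAL_COST * utilisation

for u in (1.0, 0.5, 1 / 6, 0.1):
    print(f"utilisation {u:>4.0%}: cloud = {cloud_annual_cost(u):.2f}x on-prem cost")

# Expected output, roughly:
# utilisation 100%: cloud = 6.00x on-prem cost
# utilisation  50%: cloud = 3.00x on-prem cost
# utilisation  17%: cloud = 1.00x on-prem cost  <- rough break-even point
# utilisation  10%: cloud = 0.60x on-prem cost
```

A 24/365 chemistry workload sits at the top of that table, which is why it would take “a cracking deal” to change the calculation; a team whose jobs only run a few weeks a year sits near the bottom, where cloud pricing starts to make sense.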

Formula One team McLaren Racing also keeps its HPC on-premise, turning to the cloud for certain situations.

“Cloud is pretty important, but performance and latency are equally important, so while we spin up vast amounts of services in well-known cloud provider environments, we have to rely on HPC systems that are on-premise, so there’s a balance for us between both,” says Ed Green, principal digital architect at McLaren Racing.

The suggestion is that high-performance workloads will work more effectively running on the on-premise systems built for them. But that doesn’t mean McLaren Racing will not look at using more cloud-based HPC, particularly when cars are on the track.

“The workloads are fairly static during a race week and then the minute the cars come out on the track, you start seeing a lot more data coming through and workloads start to increase as well,” says Green. “So it can be really interesting going forward looking at new models in those areas.”

From on-premise to cloud-based HPC

While many CIOs are content with their on-premise HPC environment, this is not necessarily the case in every organisation.

Elaine Neeson, CIO at the Children’s Cancer Institute in Australia, says that when the organisation hired a biomathematician, he required the use of high-performance computing. Neeson and her team decided to build this on-premise with Dell.

“We realised that it was never going to work on-premise as we wanted it to, and we would have to keep sinking quite a lot of funds into it, which wasn’t really feasible,” she says.

As the institute had already been using Microsoft Azure for virtualisation and Office 365, it decided to pilot the use of Azure for the high-performance workloads, with a particular focus on ensuring that costs would not spiral as a result of switching to the cloud.

“We didn’t want the team spinning up huge resources and having them on 24/7 if they weren’t using them because the cost would have just blown up,” says Neeson. “So the six-month pilot gave us a taste of how we could use the cloud efficiently.”
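Neeson does not say exactly how her team enforced that discipline, but a common approach on Azure is to deallocate virtual machines when they are not running jobs, so compute charges stop accruing while the disks persist. Below is a minimal sketch, assuming the Azure Python SDK (azure-identity and azure-mgmt-compute); the subscription ID and resource group name are placeholders.

```python
# Sketch: deallocate every VM in a pilot resource group so compute billing
# stops while jobs are idle. Subscription and resource group are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "hpc-pilot-rg"   # hypothetical resource group

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for vm in compute.virtual_machines.list(RESOURCE_GROUP):
    # Deallocate (not just power off) so the cores are released and not billed;
    # the OS and data disks remain and continue to incur storage charges.
    compute.virtual_machines.begin_deallocate(RESOURCE_GROUP, vm.name).result()
    print(f"deallocated {vm.name}")
```

Run on a schedule or at the end of an analysis pipeline, this is the kind of housekeeping that stops a pay-as-you-go pilot from “blowing up” in the way Neeson describes.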

The biomathematician’s team now carries out 90% of its work in the cloud.


Migrating from the Dell on-premise HPC to Microsoft Azure was not an easy switch to make – and a big reason for that was internal resourcing and skills, which had also hindered the institute with its on-premise HPC initially.

“We’re a Windows shop and most of our laptops are Windows, although our researchers use Macs, but HPC is all run off Linux and that was a huge challenge for us – we still struggle to get those specialised skills internally,” says Neeson.

She puts this down to the competitive market in Australia, where the institute would have to compete with the likes of Macquarie Bank and other huge corporates for skilled staff.

“Linux was one of the biggest upskilling programmes we had to do internally because we couldn’t find a skilled Linux person to come in,” she says. “The cloud actually made that a little bit easier because it’s all very much plug and play in a lot of regards, so moving to get off-premise did have a lot of advantages.”

The Children’s Cancer Institute still has not turned off the on-premise HPC, but its end of life is in February 2021.

“At the moment, there is still a little bit of work the researchers need to do on a couple of the analysis pipelines they use to enable them to be used in the cloud,” says Neeson. “So they’re working through that now and until they get that switched over, we will keep on-premise as a backup.”

Neeson acknowledges that she pushed for her team to work quickly to migrate to cloud-based HPC, but this is not always possible for other enterprises, particularly those that have invested even more into on-premise HPC, or those with far bigger workloads.

Cloud-based HPC accelerating

It is clear the overall HPC market has plenty of room to grow – and is being fuelled by demand for both on-premise and cloud deployments.

A report by Hyperion Research says the on-premise HPC market was worth $24bn in 2020 and will grow at 7% a year until 2024, while the HPC cloud market was worth $4.3bn in 2020 and will grow at 17% a year until 2024.

This suggests the HPC cloud market will eventually overtake the on-premise HPC market, but that shift will take time.
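To put “take time” in perspective, extrapolating the two growth rates well beyond the 2024 horizon the report actually covers puts the crossover around two decades out:

```python
# Extrapolation of the Hyperion Research figures quoted above: on-premise HPC
# at $24bn growing 7% a year, HPC cloud at $4.3bn growing 17% a year.
# The report only projects to 2024; holding the rates constant is illustrative.
on_prem, cloud = 24.0, 4.3   # $bn, 2020
year = 2020
while cloud < on_prem:
    on_prem *= 1.07
    cloud *= 1.17
    year += 1
print(year)   # roughly 2040 if both growth rates held indefinitely
```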
