Evolution in action: How datacentre hardware is moving to meet tomorrow’s tech challenges

The datacentre industry is in a state of flux, as new and emerging tech trends highlight hardware and performance shortcomings in legacy sites, prompting operators to rethink how they kit out their facilities

Datacentres are feeling the pressure from everything from the massive growth in cloud services to emerging trends such as the internet of things (IoT), which are driving demand for vast amounts of data storage as well as greater processing power to handle it all.

A previous article discussed how various factors are affecting the scale and location of datacentres, plus the ways in which these are being architected and constructed. But technology is also reshaping the infrastructure inside datacentres to better meet the challenges of new and emerging workloads.

One example of this is hyperconverged infrastructure (HCI), which delivers compute and storage in a single appliance-like node that is designed to serve as a building block.

It developed as an answer to the problems organisations were having with operating virtual machines across a traditional infrastructure of servers and SAN storage arrays, which reportedly led to IT staff becoming bogged down in configuring and maintaining everything.

Because HCI combines compute and storage, users no longer need to provision separate storage hardware. Instead, a software layer creates a shared pool of storage across all the nodes in an HCI cluster. But the chief appeal of HCI is the management layer that handles all of this and takes care of many of the onerous housekeeping processes automatically.
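
To make the idea concrete, the following purely illustrative Python sketch models how an HCI software layer might aggregate the local capacity of each node into a single shared pool; the node names and capacities are invented for the example.

# Illustrative sketch only: a toy model of how an HCI software layer
# aggregates the local capacity of each node into one shared pool.
# Node names and capacities are invented for the example.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_cores: int
    storage_tb: float

class HciCluster:
    def __init__(self):
        self.nodes = []

    def add_node(self, node: Node):
        """Scaling out: adding a node grows compute and storage together."""
        self.nodes.append(node)

    @property
    def storage_pool_tb(self) -> float:
        # The software layer presents all local drives as one shared pool
        return sum(n.storage_tb for n in self.nodes)

    @property
    def total_cores(self) -> int:
        return sum(n.cpu_cores for n in self.nodes)

cluster = HciCluster()
for i in range(3):
    cluster.add_node(Node(f"node-{i}", cpu_cores=32, storage_tb=7.68))

print(f"{cluster.total_cores} cores, {cluster.storage_pool_tb:.2f} TB pooled")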

HCI does have its downsides, however. Because nodes typically have a fixed amount of compute and storage, scaling by adding more nodes may lead to some resources being under-utilised. And HCI can tie the user into a proprietary stack for their compute infrastructure.

Nevertheless, it represents a growing part of the server market, with suppliers such as Nutanix, Dell EMC, HPE and NetApp all seeing increasing sales of HCI kit over the past several years.

From hyper-converged to composable hardware

But what if the required system hardware for a specific workload could be assembled at runtime from a pool of available resources instead? This is the promise of composable infrastructure, which aims to disaggregate resources such as processors, memory and storage so they can be combined as specified in a template to meet the exact needs of any application.
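
As a rough illustration of the concept, the Python sketch below shows how a workload template might carve compute, memory and storage out of disaggregated pools and hand the resources back when the workload is torn down; the template fields and pool sizes are hypothetical, not any vendor's actual API.

# Illustrative sketch only: composing a "server" from disaggregated
# resource pools according to a template. The pool sizes and template
# fields are hypothetical, not any vendor's actual API.
from dataclasses import dataclass

@dataclass
class Template:
    cpus: int
    memory_gb: int
    storage_tb: float

class ResourcePool:
    def __init__(self, cpus: int, memory_gb: int, storage_tb: float):
        self.cpus = cpus
        self.memory_gb = memory_gb
        self.storage_tb = storage_tb

    def compose(self, t: Template) -> dict:
        """Carve out exactly what the workload's template asks for."""
        if t.cpus > self.cpus or t.memory_gb > self.memory_gb or t.storage_tb > self.storage_tb:
            raise RuntimeError("insufficient free resources in the pool")
        self.cpus -= t.cpus
        self.memory_gb -= t.memory_gb
        self.storage_tb -= t.storage_tb
        return {"cpus": t.cpus, "memory_gb": t.memory_gb, "storage_tb": t.storage_tb}

    def release(self, allocation: dict):
        """Return resources to the pool when the workload is torn down."""
        self.cpus += allocation["cpus"]
        self.memory_gb += allocation["memory_gb"]
        self.storage_tb += allocation["storage_tb"]

pool = ResourcePool(cpus=256, memory_gb=8192, storage_tb=200.0)
db_server = pool.compose(Template(cpus=32, memory_gb=1024, storage_tb=20.0))
print(db_server, "remaining cpus:", pool.cpus)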

HPE was first to market with composable infrastructure via its Synergy platform, launched in 2015, although others are now getting in on the act, such as DriveScale and Dell EMC with its PowerEdge MX platform.

HPE’s Synergy uses a 10U rack-mount enclosure that can be filled with a mixture of different compute and storage modules. The storage module has space for 40 2.5in drives, which can be configured as direct-attached storage for any of the nodes, or as a SAN or software-defined storage, connecting to the nodes via a 12Gbps SAS fabric.

The problem faced by all these platforms is that main memory is directly attached to the CPU with current x86 processors, meaning it cannot easily be decoupled to allow it to be allocated separately.

This means that current composable systems are effectively just a milestone on the road to fully composable infrastructure. Such a future system would theoretically see CPUs and memory interconnected through some super-fast interconnect fabric that would allow one or more CPUs to be grouped together with as much memory as the application requires to build a kind of software-defined server.

Experimentation in action

Perhaps the closest thing we have to such a system at present is the EU-funded dReDBox project, part of the Horizon 2020 Research and Innovation programme. A demonstration system used separate memory and compute “bricks” (plus accelerator bricks based on GPUs or FPGAs) interconnected by a switch matrix.

Another example was HPE’s experimental The Machine. This was built from compute nodes containing a CPU and memory, but instead of being connected directly together, the CPU and memory were connected through a switch chip that also linked to other nodes via a memory fabric.

That memory fabric was intended to be Gen-Z, a high-speed interconnect using silicon photonics being developed by a consortium including HPE. But this has yet to be used in any shipping products, and the lack of involvement by Intel casts doubts over whether it will ever feature in mainstream servers.

Meanwhile, existing interconnect technology is being pushed faster. Looking at the high performance computing (HPC) world, we can see that the most powerful systems are converging on interconnects based on one of two technologies: InfiniBand or Ethernet.

Intel’s Omni-Path is derived from InfiniBand, for example, while Cray’s Slingshot, which will feature in forthcoming exascale supercomputers, boasts 200Gbps links and is compatible with Ethernet, with added enhancements such as adaptive routing to avoid network congestion.

InfiniBand is not generally found in enterprise environments. Ethernet, however, is probably the most widely used datacentre interconnect and is increasingly being co-opted for other purposes as well, such as serving as a storage fabric (more of which later).

Connecting to the future

Ethernet is already at the point where 100Gbps networking is becoming common in the enterprise, driven by the greater volumes of east-west traffic inside the datacentre generated by modern applications.

This has also led to some 400Gbps networking starting to be deployed for the network backbone. Meanwhile, Ethernet suppliers and the standards bodies are already looking to extend Ethernet to 800Gbps and 1.6Tbps in the near future, as a sign of where we should expect datacentre networks to be in the next seven to 10 years.

At these kinds of speeds, the processing overhead required to service the network interface starts to become an issue, while networks are also becoming more complex thanks to software-defined networking (SDN).

To address these issues, hyperscale operators such as Microsoft have for several years fitted hardware accelerators – typically field-programmable gate arrays (FPGAs) – to their network interface controllers (NICs) to offload some of this burden from the host server.

This concept is evolving into the fully programmable SmartNIC, with suppliers such as Mellanox even offering products with on-board SoCs that can offload a variety of tasks from the host system, including security functions as well as overlay protocols such as VXLAN used to operate virtual networks. While such hardware is largely the preserve of the hyperscalers at present, enterprises often follow where they lead.
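
As an illustration of the kind of work being pushed down to the NIC, the short Python sketch below builds the eight-byte VXLAN header defined in RFC 7348, which a SmartNIC would otherwise add and strip in hardware on behalf of the host; the VNI value is arbitrary.

# Illustrative sketch only: building the 8-byte VXLAN header (RFC 7348)
# that a SmartNIC would otherwise add/strip in hardware on behalf of
# the host. The VNI value is arbitrary for the example.
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field carries a valid ID

def vxlan_header(vni: int) -> bytes:
    """Return flags(1 byte) + reserved(3) + VNI(3) + reserved(1)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    return struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, vni << 8)

hdr = vxlan_header(vni=5001)
print(hdr.hex())   # -> 0800000000138900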

Evolution in action: datacentre storage

Storage is another area that is changing rapidly, in response to several trends. One is the inexorable rise in the volume of data being generated and stored, while another is the demand for ever-faster storage to meet the requirements of workloads that call for low latency and high throughput.

The need for faster storage has been met in part by flash memory, first with hybrid storage arrays that mix hard drives with flash for hot data, and more recently with all-flash arrays as the cost of flash memory has steadily decreased.

Currently, the focus is on NVMe, a storage protocol designed to cut through the accumulated layers of the existing storage stack that may add latency, so as to realise the full benefit of flash SSDs. NVMe SSDs also use the high-speed PCIe bus rather than traditional interfaces such as SAS.

This has proven so successful that it looks set to become the dominant way of connecting storage, according to Freeform Dynamics Distinguished Analyst Tony Lock.

“There’s not much doubt it will become the de facto standard for use in all storage platforms in the future,” he said.
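
On a Linux host, NVMe controllers are exposed through sysfs, as the following illustrative Python sketch shows; the attribute names used here are standard on recent kernels, but exact paths can vary, so treat the details as an assumption.

# Illustrative sketch only: enumerating NVMe controllers on a Linux host
# via sysfs. Attribute names such as "model" are common on recent
# kernels, but exact paths can vary between distributions.
from pathlib import Path

def list_nvme_controllers():
    for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
        model_file = ctrl / "model"
        model = model_file.read_text().strip() if model_file.exists() else "unknown"
        print(f"{ctrl.name}: {model}")

if __name__ == "__main__":
    if Path("/sys/class/nvme").exists():
        list_nvme_controllers()
    else:
        print("No NVMe controllers found (or not running on Linux)")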

Suppliers such as E8 and Excelero are now scrambling to extend NVMe to external storage via NVMe over Fabrics (NVMe-oF), an umbrella term for running the NVMe protocol across a network fabric such as Ethernet, Fibre Channel or InfiniBand.

This end-to-end NVMe could prove revolutionary for several reasons, one of which is that it means that storage could be located almost anywhere, even in a separate datacentre from the server accessing it, without incurring too much degradation in performance. This could be another factor in enabling the disaggregation of resources needed for composable infrastructure, for example, while also enabling access to cloud storage via the same protocol.

For the moment, however, there is a premium on such end-to-end NVMe systems, which means they are largely restricted to demanding workloads such as analytics, low-latency transaction systems and virtualised platforms where you need lots of parallel access, according to Lock.

Meanwhile, the increasing volume of data is leading some organisations to seek ways of cutting down the cost per gigabyte of storing it all. One way of doing this is through software-defined storage, using clusters of servers crammed with hard drives and running something like Red Hat’s Gluster, rather than using purpose-built storage arrays.

Another option becoming increasingly common is the use of cloud storage to offload any data from an organisation’s primary storage that is not regularly accessed. This is in addition to using the cloud for long-established purposes such as disaster recovery and backup. This trend is aided by some storage suppliers, such as NetApp, releasing versions of their enterprise storage platforms that can be deployed onto a public cloud.

However, Lock warns that organisations considering this need to be wary of the egress charges public cloud providers levy for accessing data. They also need to put in place proper data lifecycle management to govern the migration of data from primary storage to less costly media.

A longer term trend may be towards object storage. Object storage systems are designed for large volumes of data and scale easily. Some HPC sites have implemented their storage systems with a fast flash layer for immediate data processing and object storage as the back-end because it is relatively inexpensive, and this may point the way for other organisations in future. Object storage is already widely used in the cloud, with Amazon’s S3 service being a notable example.
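
Object storage is typically accessed through simple HTTP APIs rather than block or file protocols. The following illustrative Python sketch stores and retrieves an object in S3 using the boto3 library; the bucket name is hypothetical, and credentials are assumed to be configured in the environment.

# Illustrative sketch only: writing and reading an object with Amazon S3
# via boto3. The bucket name is hypothetical and credentials are assumed
# to be configured in the environment.
import boto3

s3 = boto3.client("s3")
bucket = "example-archive-bucket"   # hypothetical bucket name

# Objects are addressed by key, not by file path or block address
s3.put_object(Bucket=bucket, Key="2019/sensor-dump.csv", Body=b"id,value\n1,42\n")

obj = s3.get_object(Bucket=bucket, Key="2019/sensor-dump.csv")
print(obj["Body"].read().decode())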

Storage class memory

Another technology likely to have a major impact in future is storage class memory (SCM). This is actually a number of technologies that are byte addressable like DRAM, but also non-volatile like storage, and could thus change the whole way that storage operates.

Many of these technologies, such as magnetic RAM (MRAM), are still rather costly to manufacture and have thus far been used in a relatively modest fashion, such as for non-volatile caches in storage arrays, for example. The most viable SCM technology would appear to be Intel’s Optane, now available in a DIMM format that can fit into the memory socket of servers based on the newest Xeon processors.

Optane DIMMs are effectively a new tier in the memory hierarchy because they are faster than flash storage but slower than DRAM. At the same time, Optane is cheaper than DRAM, but more costly than flash storage.

The way Intel has integrated support for Optane DIMMs in the Xeon memory controller means they can be treated either as a larger pool of memory for processing large datasets, or exposed as persistent memory.

The first, Memory Mode, uses the DRAM in the system as a cache for the higher-capacity Optane DIMMs (currently available with up to 512GB each), and is transparent to applications, which just see a large pool of memory.

The second, App Direct Mode, requires applications to be aware that there are two different types of memory in the system; they can either treat the Optane DIMMs as persistent memory or access them through a file system API, effectively using them as a fast SSD.
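
One common way to reach persistent memory in App Direct Mode is to mount a DAX-enabled filesystem on the Optane region and memory-map files from it, which is also the foundation that Intel's PMDK libraries build on. The Python sketch below illustrates the idea; the /mnt/pmem0 mount point is an assumption, and real deployments would normally use PMDK rather than raw mmap.

# Illustrative sketch only: using App Direct Mode persistent memory
# through a memory-mapped file on a DAX-enabled filesystem. The mount
# point /mnt/pmem0 is an assumption; real deployments typically use
# Intel's PMDK libraries rather than raw mmap.
import mmap
import os

PMEM_FILE = "/mnt/pmem0/example.dat"   # hypothetical DAX mount point
SIZE = 4096

fd = os.open(PMEM_FILE, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, SIZE)

with mmap.mmap(fd, SIZE) as pm:
    pm[0:13] = b"hello, pmem!\n"   # byte-addressable, like DRAM
    pm.flush()                     # push the update towards persistent media
os.close(fd)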

Stronger prospects

While Optane SSDs have not sold well, the DIMMs have far stronger prospects as a relatively low-cost, non-volatile adjunct to DRAM, according to 451 senior analyst Tim Stammers.

“Oracle has already demoed its TimesTen in-memory database running in Application Direct mode, and says Optane NVDIMM persistence will be extremely useful to database suppliers, and will drive big changes in database architecture,” he said.

All of this means that while tomorrow’s datacentres are still likely to feature racks of equipment, the architecture of the equipment in those racks could be quite different from what we know today.

It is a fair bet that they will still use CPUs, GPUs, DRAM and some form of storage such as flash or even hard disks, but the ways in which these are all connected up is likely to see the biggest change.