Key challenges in storage and virtualisation, and how to beat them

Server virtualisation brings benefits but increases storage-capacity needs. Here we survey the key challenges and what to do about them

There’s no denying server virtualisation has been a highly successful technology that has transformed the way applications are deployed in the datacentre.

What previously took weeks or months can now be achieved in hours or even minutes, shortening application development and deployment cycles.

But one consequence of the move towards virtualised workloads has been ballooning demand for storage capacity and increased complexity in storage management.

So what is at the heart of this increase in storage demand? Why do so many organisations struggle to cope with storage in virtual environments? First, here's some background to the problem.

Shared storage for virtual servers

Shared storage can be presented to virtual servers from either block-based systems – accessed via Fibre Channel or the internet small computer system interface (iSCSI) – or file-based systems, which use protocols such as server message block (SMB) and network file system (NFS).

In the case of block-based arrays, the storage presented to the hypervisor – or host – is in the form of a logical unit number (LUN), which on VMware vSphere platforms represents a datastore, formatted with VMware’s virtual-machine file system (VMFS).

Microsoft Hyper-V block storage is formatted as a standard NTFS volume. In both implementations, many virtual machines are stored in the same datastore or volume, with datastores scaling to multiple terabytes of capacity.

For file-based systems, VMware vSphere supports only shares using the NFS protocol, while SMB is the preferred protocol for Hyper-V virtual machines.
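As an illustration of how these datastores appear programmatically, here is a minimal sketch using the open-source pyVmomi library to list each datastore's type (VMFS or NFS) and capacity. The vCenter hostname and credentials are placeholders.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (hostname and credentials are placeholders)
si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Walk the inventory and report every datastore's type and capacity
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.Datastore], True)
for ds in view.view:
    s = ds.summary
    print("%s  type=%s  capacity=%.0fGB  free=%.0fGB"
          % (s.name, s.type, s.capacity / 2**30, s.freeSpace / 2**30))
view.Destroy()
Disconnect(si)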

Many issues in managing storage on virtual infrastructure stem from the consequences of storage design. These include the following:

Fragmentation
Virtual machines stored in a single datastore all receive the same level of performance and resilience from the storage array. This is because the datastore is based on one or more LUNs – or volumes – on block-based systems, or a file share on file-based systems. As a result, many customers create multiple datastores, each with different performance characteristics. This can mean, for example, dedicated datastores for production workloads, some for development and some that have array-based replication. There are some exceptions here – systems that offer block-based tiering, for example.

LUN-level array support
As already discussed, block-based systems allocate datastores based on one or more LUNs. A LUN is the smallest unit of granularity for features such as replication and failover. As a result, if customers use array-based data protection, guest virtual machines (VMs) have to be grouped on LUNs based on their data protection requirements. This can lead to the creation of additional LUNs in a system, increasing the problems of fragmentation.

Fully allocated VM storage 
Virtual machine storage can be either fully allocated (or reserved out) at VM-creation time or allocated dynamically after the VM has been created. Allocating all storage at creation potentially delivers better performance, because the hypervisor doesn’t have to go through the process of requesting and reserving storage as each block of data is written by the guest. However, in many instances, VMs are built from templates that need to be created with boot volumes of sufficient size for all installed applications. Inevitably this leads to wastage with every virtual machine deployed.
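To make the trade-off concrete, the fragment below shows how the allocation policy is expressed when building a virtual disk with pyVmomi – thin, or fully allocated and zeroed up front. It is a sketch of the relevant backing options only, not a complete disk-creation routine.

from pyVmomi import vim

def disk_backing(thin=True):
    """Build the backing info that sets a virtual disk's allocation policy."""
    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    backing.diskMode = "persistent"
    if thin:
        backing.thinProvisioned = True    # capacity allocated on first write
    else:
        backing.thinProvisioned = False   # fully allocated at creation...
        backing.eagerlyScrub = True       # ...and zeroed up front (eager-zeroed thick)
    return backing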

Lack of thin provisioning
Thin provisioning is a feature of storage arrays and hypervisors that allocates capacity on physical disk only as it is required, rather than pre-allocating it well beyond current needs. A whole article could be dedicated purely to a discussion of thin provisioning and whether best practice dictates it should occur at the hypervisor, the storage array or both. In practice, implementing thin provisioning at both layers of the infrastructure is perfectly acceptable, but many organisations fail to understand that, over time, thin-provisioned datastores will in effect become thick-provisioned unless processes are in place to reclaim freed space.

VM sprawl 
Obvious as it may seem, one issue with storage growth in virtual environments is VM sprawl – the proliferation of virtual machines that are rarely or never used. Server virtualisation allows rapid deployment of applications for production and development, but that same ease of use means that, without adequate management, VM creation can get out of hand. This can be further compounded by the creation of orphan VMs that aren't connected to the hypervisor inventory and exist purely on disk.

Need for tools
VM sprawl points to a related problem – the need for tools that match the data on disk to the VMs registered with the hypervisor. It is easy – especially in larger environments – for virtual machines to be removed from the inventory without being deleted from disk, often for legitimate reasons. If these VMs are never re-associated with a host, the likelihood of them ever being cleaned up diminishes over time.

Efficient virtualisation storage management

So how can these challenges be addressed to ensure efficient virtual-server storage?

Here are some ideas to consider:

Implement a good thin-provisioning policy
Where possible, all virtual machines should be allocated with thin-provisioned storage. However, an effective thin-provisioning strategy means more than just configuring the feature at the VM guest and storage layer.

Thin provisioning clean-up also needs to be performed at the guest and hypervisor to return released storage to the array.

This is done in the VM guest by tools such as SDelete, which writes binary zeros onto the free space of the guest file system. At the hypervisor, VMware has a feature called VAAI Thin Provisioning Block Space Reclamation (known as Unmap), which allows the administrator to release storage by using the vmkfstools command. Naturally, there is a trade-off in terms of how frequently these commands are run versus the storage reclaimed, which will vary by environment.
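As a sketch of how the reclaim step might be automated, the following uses the paramiko SSH library to run an unmap against a named datastore on an ESXi host. The hostname, credentials and datastore name are placeholders, and the exact command varies by vSphere version – vmkfstools -y on 5.0/5.1, esxcli on 5.5 and later.

import paramiko

# SSH to the ESXi host (hostname and credentials are placeholders)
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("esxi01.example.com", username="root", password="secret")

# Reclaim dead space on the datastore; on vSphere 5.0/5.1 the equivalent
# is running "vmkfstools -y <percentage>" from within the datastore directory
stdin, stdout, stderr = ssh.exec_command(
    "esxcli storage vmfs unmap -l datastore1")
print(stdout.read().decode())
ssh.close()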

File-based datastores are typically more efficient at returning free space to the array. As virtual machines are deleted or files change in size, the space on the array file system is made available for use elsewhere. This is one of the advantages file-based systems have over traditional block storage.

Use native array features
Many storage arrays support zero block reclaim – also called zero page reclaim – which allows the array to natively recover zeroed-out blocks of data. HP 3Par StoreServ systems, for example, do this inline, while others perform the function as a background task that needs to be scheduled by the storage administrator.

Virtual disks on VMware systems can be created in the eager-zeroed thick format, which writes zeros across the entire disk at creation time. An array with zero block reclaim simply discards those zeroed blocks rather than storing them, maintaining efficient space utilisation while enabling features that require eager-zeroed disks, such as fault tolerance, and allowing VMs to have storage resources fully allocated at the time of creation. Hyper-V platforms have fewer issues with thin provisioning, but the same principle of running tools such as SDelete applies.

For arrays that offer sub-LUN tiering, performance can be managed at a VM level – using EMC's fully automated storage tiering (FAST), for example. However, these tools may conflict with hypervisor settings that also move data between tiers of storage, so their use should be discussed with the virtual-server administrator to agree a policy that works at both the hypervisor and storage level.

Use the latest file system formats
On VMware vSphere platforms, the VMFS disk format used to store data on block-based systems has been improved with each new version of vSphere. However, many IT organisations upgrade vSphere without reformatting their VMFS datastores, which leads to inefficiency compared with the latest VMFS formats. With live migration features such as Storage vMotion available, migrating virtual machines to datastores with the latest VMFS format – and taking the opportunity to thin provision them at the same time – should be a priority with each upgrade.
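A migration of this kind can be scripted. The sketch below relocates a VM to a new datastore with pyVmomi, converting its disks to thin ("sparse") format in the process; find_vm and find_datastore are hypothetical helpers that look objects up in the inventory – for example, via a container view as shown earlier.

from pyVmomi import vim

# Hypothetical helpers: look up managed objects in the inventory
vm = find_vm("app-server-01")
target = find_datastore("new-vmfs5-datastore")

# Storage vMotion the VM to the new datastore, transforming its
# disks to thin ("sparse") format on the way
spec = vim.vm.RelocateSpec()
spec.datastore = target
spec.transform = "sparse"
task = vm.RelocateVM_Task(spec=spec)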

Manage VM sprawl
This may seem an obvious recommendation, but it is not universally followed. It's good practice to monitor virtual-machine usage and have policies in place to archive virtual machines to cheaper storage after a period of inactivity – for example, when a VM hasn't been powered on for three months or more. In development environments especially, it may well be that a VM isn't needed until the next software release cycle and so can be archived until that point. Backup software such as CommVault's Simpana suite allows virtual machines to be moved to cheaper storage while retaining a stub file in place, so the VM can still be tracked in the hypervisor inventory.
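A starting point for such a policy is simply to enumerate powered-off machines as archive candidates. The pyVmomi sketch below does this; working out how long each VM has been off would require querying the event history, which is omitted here, and the connection details are placeholders.

import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

# Powered-off VMs are candidates for archiving; how long each has been
# off would come from the event history (omitted here)
for vm in view.view:
    if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOff:
        print("archive candidate:", vm.name)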

Deploy effective tools
There are tools that can identify and locate orphaned virtual machines. Alternatively, this can be achieved with scripts that match the hypervisor inventory against a list of on-disk VM files, as sketched below. Running this process regularly will help catch orphaned VMs early so that each can be reused or deleted.
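One way to outline such a script with pyVmomi: gather the .vmx paths registered in the inventory, search each datastore for the .vmx files actually on disk, and report the difference. This assumes an existing connection (content) as in the earlier examples.

from pyVim.task import WaitForTask
from pyVmomi import vim

def registered_vmx_paths(content):
    """Paths of every .vmx file registered in the hypervisor inventory."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    paths = set(vm.config.files.vmPathName for vm in view.view if vm.config)
    view.Destroy()
    return paths

def on_disk_vmx_paths(ds):
    """Paths of every .vmx file actually present on one datastore."""
    spec = vim.host.DatastoreBrowser.SearchSpec(matchPattern=["*.vmx"])
    task = ds.browser.SearchDatastoreSubFolders_Task(
        "[%s]" % ds.summary.name, spec)
    WaitForTask(task)
    found = set()
    for result in task.info.result:
        for f in result.file:
            found.add(result.folderPath + f.path)
    return found

# Any .vmx present on disk but absent from the inventory is an orphan
# candidate for review, re-registration or deletion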

There is one final observation to make that applies equally to storage and virtual-server administrators: spend time understanding a little more of each other’s technologies. With that in mind – and the right processes – managing storage on virtual-server installations can be made just a little easier.
