Troubleshooting datacentre management issues

Gaining better control of the datacentre has to be a high priority to ensure high levels of performance and availability

A datacentre is a complex environment, and despite the best efforts of vendors, many organisations and datacentre facility owners still seem to be struggling with management and control issues.

With the dependency that organisations now have on their IT platform, gaining better control of the datacentre has to be a high priority to ensure high levels of performance and availability.

The main problem often lies in the fact that the datacentre facility itself is essentially owned and managed by the facilities team, whereas the IT equipment is owned and managed by the IT team.

The building, its electricity feeds and distribution systems, the cooling, the auxiliary generators and the majority of the environmental monitoring and response systems will generally be managed via a building information modelling (BIM) system; the IT equipment through an IT systems management tool set.

In the past, these have not been integrated. In most cases, that wasn’t much of an issue: the building’s systems were over-engineered to meet the possible needs of IT over an extended period of, say, seven to 10 years. However, the speed of change in the IT world has led to a need to bring BIM and IT systems management together.

For example, increasing IT equipment densities can stress the building’s systems in different ways. The power density per rack can exceed the capabilities of the supporting power distribution systems; the heat output of the rack could be too much for the cooling systems to bring hotspots back within acceptable limits. Without linkages between the various management systems, it becomes a guessing game as to where problems really lie. This is unacceptable in a world where IT underpins much of an organisation’s business needs.

Trouble ahead

The problems are only getting worse as new IT architectures appear. Virtualisation and cloud are removing direct dependencies of workloads on underlying equipment – an application can be running across a mix of shared resources. However, the failure of a single physical item of equipment can still have some impact on the workloads that were using its resources, and many IT tool sets still struggle to deal simultaneously with both the virtual and physical worlds.

Add to this the increasing presence of shadow IT, not only in the use of cheap software as a service (SaaS) applications but also in the explosion of bring your own device (BYOD) mobility, and it is not surprising that so often no one is able to uncover the root cause of an issue. For those with responsibility for the datacentre, knowing that a problem lies outside their area is just as vital as knowing that it is within the facility: at least focus can then be applied to the right area.

There is a real need to pull together all the various aspects of how the overall IT platform works so that a single view of the world is available.

This brings us to datacentre infrastructure management (DCIM). DCIM is an approach that essentially started in the BIM world, and vendors such as Schneider, Emerson and Nlyte have been rapidly extending their systems’ capabilities over the last few years. Their tools can now act as a bridge between the facilities and IT worlds.

Although DCIM’s capabilities are improving, it is not as yet an answer on its own. DCIM provides a lot of capabilities for designing, monitoring and operating the datacentre facility, but in most cases still leaves the job of running the IT stack to others. Its predictive capabilities allow problems to be avoided before they occur: for example, it can flag that adding one more system to a rack would overload the power distribution systems or cause hotspot issues. As well as enabling problems to be avoided, it also moves some of the way towards a better approach to root cause analysis of actual problems as they arise.
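To make the predictive idea concrete, the kind of check described here can be sketched in a few lines of Python. This is an illustration only, not how any DCIM product works: the Rack class, the can_add function, the wattage figures and the 80% safety margin are all assumptions made for the example, as is the simplification that all power drawn by a device ends up as heat.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    """Hypothetical rack model; the limits are illustrative, not vendor figures."""
    name: str
    power_limit_watts: float    # what the power distribution can supply to the rack
    cooling_limit_watts: float  # heat load the cooling can remove from the rack
    device_watts: list = field(default_factory=list)

    def load_watts(self) -> float:
        """Total power drawn by the devices currently in the rack."""
        return sum(self.device_watts)


def can_add(rack: Rack, new_device_watts: float, headroom: float = 0.8):
    """Return (ok, reasons) for adding one more device to the rack.

    Assumes all electrical power drawn ends up as heat, and keeps a
    safety margin by using only `headroom` (80% by default) of each limit.
    """
    projected = rack.load_watts() + new_device_watts
    reasons = []
    if projected > rack.power_limit_watts * headroom:
        reasons.append("power")
    if projected > rack.cooling_limit_watts * headroom:
        reasons.append("cooling")
    return (not reasons, reasons)


# Made-up example: a rack already drawing 3,050W
rack = Rack("R12", power_limit_watts=5000, cooling_limit_watts=4500,
            device_watts=[400, 400, 750, 750, 750])
print(can_add(rack, 400))  # (True, [])
print(can_add(rack, 750))  # (False, ['cooling'])
```

In this made-up rack, a 400W server fits, but a 750W server would push the heat load past the cooling margin even though the power feed could still cope. A facilities-only or IT-only view would be likely to miss exactly that kind of mismatch.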

What are the options?

But DCIM, in itself, is not enough. The DCIM vendors have a couple of options open to them. They can get into bed with one or more of the systems management vendors (such as IBM, Dell, BMC, Microsoft or HP), or they can try to move into systems management themselves. The vendor that to date is closest to being a full-service play is CA: as an existing systems management vendor, it has acquired other companies and merged capabilities to build a DCIM/systems management tool set that shows great promise, but still has work to do on the facilities side of its capabilities. Nlyte has just announced an agreement with HP around tying in to HP’s configuration management database (CMDB) capabilities to extend its reach. This could lead to greater synergies between Nlyte’s and HP’s portfolios across the space.

Only by gaining a single view on the end-to-end dependencies of how a business process is being supported by the technology and physical assets that underpin it is true troubleshooting possible. This needs integrated systems that include everything from the applications down to the datacentre facility, and then beyond into public cloud and the network connections involved.
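As a rough illustration of what a single view of end-to-end dependencies makes possible, the Python sketch below walks a dependency map from a failed physical item up to the business processes it supports. Everything in it is hypothetical: the asset names, the DEPENDENTS map and the function names are invented for the example, and a real tool set would draw this data from its CMDB and DCIM records instead of a hard-coded dictionary.

```python
from collections import deque

# Hypothetical dependency map: each item lists the things that depend on it,
# from the facility (power, racks) up through hosts, virtual machines and
# applications to business processes.
DEPENDENTS = {
    "pdu-a": ["rack-12"],
    "rack-12": ["host-01", "host-02"],
    "host-01": ["vm-web-1", "vm-db-1"],
    "host-02": ["vm-web-2"],
    "vm-web-1": ["web-shop"],
    "vm-web-2": ["web-shop"],
    "vm-db-1": ["order-db"],
    "order-db": ["web-shop", "invoicing"],
    "web-shop": ["online sales"],
    "invoicing": ["billing"],
}

BUSINESS_PROCESSES = {"online sales", "billing"}


def impacted(failed: str) -> set:
    """Breadth-first walk from a failed item to everything that depends on it."""
    seen = set()
    queue = deque([failed])
    while queue:
        item = queue.popleft()
        for dependent in DEPENDENTS.get(item, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def impacted_processes(failed: str) -> set:
    """Business processes that could be affected by the failed item."""
    return impacted(failed) & BUSINESS_PROCESSES


print(sorted(impacted_processes("host-02")))  # ['online sales']
print(sorted(impacted_processes("pdu-a")))    # ['billing', 'online sales']
```

The same map could be walked in the opposite direction, starting from a struggling business process and listing every asset beneath it, which is closer to how root cause analysis would begin.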

Siloed management systems must be avoided. IT is so critical to the financial performance of any organisation that putting anything in place that allows problems to fall between two stools, or allows different groups to point fingers at each other, is only going to have a negative business impact.

While the DCIM/IT systems management worlds are morphing to be more integrated, you need to ensure any systems you are looking at for future use will be capable of embracing all aspects of your organisation’s needs, enabling a single view. Even where tactical decisions are required to provide urgent functionality to the business, try to ensure that the choices made take into account a longer-term strategy; avoid the need for fork-lift replacements and major changes in approach wherever possible.

Download Quocirca’s free When Data Centre Layers Converge report on how DCIM is changing to meet the needs of a world where virtual workloads and physical systems need to be managed as a coherent single system.

Clive Longbottom is founder of analyst company Quocirca.
