
Infrastructure patching at scale: What you need to know

Embarking on an enterprise-wide patching exercise can be a daunting process for IT departments, but there are a few things that can be done to make it less fraught and ensure it is effective

This article can also be found in the Premium Editorial Download: Computer Weekly: Take the pain out of software patching

Rolling out software patches across enterprise-scale environments is a much more complex undertaking than attempting it in smaller sites, and the process needs to be managed with precision and efficiency to ensure a good outcome.

In particular, there are six or so key areas that IT administrators need to take special care over when embarking on a large-scale patching project, the first being the overall complexity of the IT environment involved.

In large environments, the sheer number of interactions that take place between different systems and hosts means caution must be applied when patching, as a single break in the chain could have dire consequences.

In a worst-case scenario, breaking one thing could have a knock-on effect in other areas of the IT estate, creating huge amounts of disruption that could directly hit the company’s bottom line.

Planning for this complexity should be done well in advance. There should be a detailed, well-documented and peer-reviewed process in place for patching key systems, including end-to-end testing to ensure systems are not negatively impacted.

The process documentation should be treated as a living document, continually revised and updated to reflect the changing environment.

Watch out for side effects

When carrying out work of this nature, it is imperative to keep tabs on how the environment is responding, so any unwanted side effects can be nipped in the bud as soon as they arise.

An example of this is a CPU bug mitigation exercise. While the data outputs from a program may be the same, the time taken to produce them can increase because of the mitigation code. Depending on the configuration, this may not materially affect the infrastructure, but the business needs to weigh risk versus reward in that instance.
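One way to make that risk versus reward decision concrete is to express the slowdown as a percentage and compare it with a budget the business has agreed. The following is a minimal sketch only: the function names, the timings and the 10% budget are illustrative assumptions, not measurements or figures from any real mitigation.

```python
def slowdown_percent(baseline_seconds, patched_seconds):
    """Return the percentage increase in run time after patching."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (patched_seconds - baseline_seconds) / baseline_seconds * 100.0


def within_budget(baseline_seconds, patched_seconds, budget_percent):
    """True if the measured slowdown does not exceed the agreed budget."""
    return slowdown_percent(baseline_seconds, patched_seconds) <= budget_percent


# Hypothetical timings for the same job before and after the mitigation.
baseline = 120.0
patched = 138.0

print(round(slowdown_percent(baseline, patched), 1))
print(within_budget(baseline, patched, budget_percent=10.0))
```

A job that overruns its budget is not automatically a reason to skip the patch, but it gives the business a number to weigh against the security risk.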

There is also a huge difference between patches that fix known issues and bugs, and updates that provide new functionality or modify existing functionality. The latter are the riskiest and should be extensively reviewed and tested before being implemented.

People and process

It is also worth remembering that not every patch needs to be applied. That said, not deploying the patches can have catastrophic results, like those experienced in the Equifax hack. It all comes down to managing risk.

In large environments there should be a group of people that have responsibility for planning and executing the upgrades.

Just as critically, the businesses involved need to be included in the process. For a small IT department, patching can be done at appropriate times out of hours, but where 24/7-type operations are concerned, the patching group must look at the impact on services, rather than on individual servers.

A good IT department will have a configuration management database in place that should hold details of all these services, hardware, configurations and the dependencies that exist between them.
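To illustrate why those dependency records matter, the sketch below walks a small dependency map to list every service that could be affected when one component is patched. The service names and the dictionary layout are hypothetical stand-ins for a CMDB export, not the schema of any particular product.

```python
from collections import deque

# Hypothetical CMDB extract: each item maps to the items it depends on.
DEPENDS_ON = {
    "web-frontend": ["app-server"],
    "app-server": ["db-primary", "auth-service"],
    "reporting": ["db-primary"],
    "auth-service": [],
    "db-primary": [],
}


def impacted_by(target, depends_on):
    """Return every item that directly or indirectly depends on target."""
    # Invert the edges so we can walk from a component to its consumers.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(item)

    # Breadth-first walk outwards from the component being patched.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for consumer in dependents.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


print(impacted_by("db-primary", DEPENDS_ON))
```

In this made-up estate, patching the database touches the application server, the reporting service and, through the application server, the web front end, which is why the patching group needs to think in terms of services rather than servers.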

Planning for groups of updates to service components makes life easier. At the same time, every service should have an agreed maintenance window. This is the time set aside to do just this type of work and it should not be included in the Service Level Agreement (SLA), although some organisations still do.

The benefits of phased deployments

When applications are grouped together, it is possible to decide in an intelligent manner how to approach the patching.

At the risk of stating the obvious, administrators should test in development, then in user acceptance testing (UAT) and finally in production.

That way, nasty issues are discovered early rather than suddenly appearing in production, where they have to be triaged and remediated. The sooner issues are noted on less critical systems, the better it is for system stability and the administrators’ blood pressure.

In any large-scale environment, there will be one or two servers that need manual intervention, simply because of the sheer number of hosts involved. Make sure these are logged and remediated as quickly as practically possible.

At scale, manual patching is not advisable. Not only is it extremely labour intensive, it increases the probability of manual errors. There are many automated patching tools available and there is no one best tool for every environment.

A properly configured update tool will pay back its cost very quickly. A good tool will allow groupings of hosts and allow the ordering of them so the patching can be done in an ordered and consistent manner. Effectively, batching together identical systems reduces the work involved.
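As a rough illustration of grouping and ordering, the sketch below batches hosts that share a role and operating system, then orders the batches by a chosen role sequence. The inventory records, field names and role order are all assumptions made for the example; a real patching tool would have its own way of expressing them.

```python
from collections import defaultdict

# Hypothetical inventory records; field names are illustrative only.
HOSTS = [
    {"name": "web02", "role": "web", "os": "rhel8"},
    {"name": "db01", "role": "db", "os": "rhel8"},
    {"name": "web01", "role": "web", "os": "rhel8"},
    {"name": "app01", "role": "app", "os": "ubuntu22"},
    {"name": "app02", "role": "app", "os": "rhel8"},
]

# Assumed order in which roles are patched; unknown roles go last.
ROLE_ORDER = ["web", "app", "db"]


def build_batches(hosts, role_order):
    """Group identical systems together and return the batches in patch order."""
    groups = defaultdict(list)
    for host in hosts:
        groups[(host["role"], host["os"])].append(host["name"])

    def sort_key(key):
        role, os_name = key
        rank = role_order.index(role) if role in role_order else len(role_order)
        return (rank, role, os_name)

    return [(key, sorted(groups[key])) for key in sorted(groups, key=sort_key)]


for (role, os_name), names in build_batches(HOSTS, ROLE_ORDER):
    print(role, os_name, names)
```

Sorting inside each batch and between batches keeps the run order the same every time, which makes a failed run easier to compare with a successful one.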

Automation and standardisation are great aids to speedy deployments. Using a phased, well-planned approach that incorporates development, quality assurance, UAT and production helps reduce the risks associated with patching.

The importance of proper planning

Well-planned system updates do not patch all the systems at the same time. The rollout must be phased in such a way that any issue that arises does not take down all the infrastructure, ideally by taking a phased A/B approach.

A well-designed infrastructure will be split into at least two identical groups. Applying the patches to one group first helps contain any patch issues. Once the patches have been installed and verified to work correctly, the other set of servers can be done.

It is imperative that one side is fully tested before committing to the next round of patching. Not doing this can have devastating consequences for a system.
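The gate between the two sides can be sketched in a few lines. In this minimal example, each group is patched and verified in turn, and the rollout halts as soon as any host fails verification, leaving later groups untouched. The `apply_patch` and `verify` callables are hypothetical placeholders for whatever the chosen patching and monitoring tools provide.

```python
def phased_rollout(groups, apply_patch, verify):
    """Patch groups one at a time and stop at the first failed verification."""
    completed = []
    for name, hosts in groups:
        for host in hosts:
            apply_patch(host)
        # Verify the whole group before moving on to the next one.
        failed = [host for host in hosts if not verify(host)]
        if failed:
            return {"completed": completed, "halted_at": name, "failed_hosts": failed}
        completed.append(name)
    return {"completed": completed, "halted_at": None, "failed_hosts": []}


# Hypothetical stand-ins used only to exercise the function.
GROUPS = [("A", ["a1", "a2"]), ("B", ["b1", "b2"])]
patched = []


def fake_apply(host):
    patched.append(host)


def fake_verify(host):
    # Pretend host a2 fails its post-patch checks.
    return host != "a2"


result = phased_rollout(GROUPS, fake_apply, fake_verify)
print(result)
print(patched)
```

Because the failure on side A stops the run, side B is never patched and stays available to carry the service while the problem is investigated.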

When an update is properly phased and tested, a bad patch is far less of an issue, because half of the resources are still available. Downtime can cost serious amounts of money.

In summary, patching is not horrifically difficult if it is planned and executed consistently, with the process refined and updated over time. Standardisation in build environments is key, as is using automated patching where possible.
