CrowdStrike blames outage on content configuration update
CrowdStrike publishes the preliminary findings of what will be a lengthy investigation into the root causes of the failed 19 July update that caused Windows computers to crash all over the world
Under-fire cyber firm CrowdStrike has published a preliminary post-incident review setting out more information on the update-gone-wrong that brought down millions of Microsoft Windows devices on 19 July, causing global chaos.
In an update posted on 24 July, the firm said it had attempted to release a content configuration update for its Falcon sensor on Windows hosts early in the morning of Friday 19 July.
This “rapid response” update formed part of the normal dynamic protection mechanisms used by the Falcon platform to conduct cyber threat detection and remediation activity. Essentially, the updates are used by CrowdStrike to identify new indicators of threat actor behaviour and to improve its detection and prevention capabilities.
Such cloud-delivered updates would normally pass without drawing any attention to themselves. However, this update caused Windows hosts running Falcon sensor 7.11 and upwards that were online at the time to crash.
The issue in play in fact dates back to February 2024, when Falcon sensor version 7.11 dropped containing templates to detect a new attack technique abusing named pipes – a client-server communication conduit. These templates were later stress tested and validated for use before being released to production. Three more template instances were deployed over the following weeks, again without incident.
Fast forward to 19 July, when two additional template instances for the same attack technique were lined up to be deployed. However, on this occasion, said CrowdStrike, a bug in an automated content validator used to check updates enabled one of them to pass validation checks “despite containing problematic content data”.
It was deployed on the strength of the testing performed on the original templates back in March, but when received and loaded, this problematic content in channel file 291 resulted in an out-of-bounds memory read, triggering an exception that could not be handled gracefully and crashed the Windows operating system.
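To illustrate the class of fault described above, the following is a minimal Python sketch of a bounds check that a content validator might apply before content is loaded. It is not CrowdStrike's code: the `TemplateInstance` record, its `field_indexes` attribute and the `validate` and `evaluate` functions are hypothetical names invented for this example, and the real sensor runs as native code inside Windows, where an out-of-bounds read crashes the system instead of raising a catchable error.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical content record: the input fields a detection rule wants to read."""
    name: str
    field_indexes: Tuple[int, ...]


def validate(instance: TemplateInstance, supplied_fields: int) -> List[str]:
    """Return a list of problems; an empty list means every index is in range."""
    problems = []
    for index in instance.field_indexes:
        if not 0 <= index < supplied_fields:
            problems.append(
                f"{instance.name}: field index {index} is outside "
                f"0..{supplied_fields - 1}"
            )
    return problems


def evaluate(instance: TemplateInstance, inputs: List[str]) -> List[str]:
    """Read the requested fields, refusing content that fails validation."""
    problems = validate(instance, len(inputs))
    if problems:
        # Reject the content up front instead of reading past the end of the inputs.
        raise ValueError("; ".join(problems))
    return [inputs[i] for i in instance.field_indexes]
```

In this sketch the same check runs twice, once when content is validated and again when it is loaded, so a gap in one layer does not by itself lead to a bad read.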
The bugged update was live for just over an hour and a quarter, from 04:09 UTC to 05:27 UTC (05:09 BST to 06:27 BST) on Friday, before CrowdStrike reverted it. This was sufficient time to cause more than eight million devices worldwide to crash and display the infamous Blue Screen of Death, photos of which spread around the world.
CrowdStrike CEO George Kurtz again apologised to customers and others affected, including the many thousands of people who experienced delayed and cancelled flights.
“All of CrowdStrike understands the gravity and impact of the situation. We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority,” said Kurtz.
Kurtz also reiterated that neither CrowdStrike nor Microsoft had fallen victim to any kind of cyber attack, and reaffirmed that Linux and Mac hosts were not affected.
“CrowdStrike is operating normally, and this issue does not affect our Falcon platform systems. There is no impact to any protection if the Falcon sensor is installed. Falcon Complete and Falcon OverWatch services are not disrupted,” he said.
“We have mobilised all of CrowdStrike to help you and your teams. If you have questions or need additional support, please reach out to your CrowdStrike representative or Technical Support.
“We know that adversaries and bad actors will try to exploit events like this. I encourage everyone to remain vigilant and ensure that you’re engaging with official CrowdStrike representatives. Our blog and technical support will continue to be the official channels for the latest updates.”
Kurtz added: “Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike. As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”
Who is George Kurtz?
CrowdStrike’s CEO, who has now been summoned to account for the incident before the United States Congress, has history with botched updates. In early 2010, while Kurtz was working as chief technology officer (CTO) at antivirus firm McAfee, the firm pushed a software update that deleted a number of important Windows XP system files, causing boot loops and blue screens as the unfortunate systems crashed.
New Jersey-born Kurtz, who got his start in tech programming video games on a Commodore system, left McAfee in 2011 after becoming frustrated with the firm’s technology. He teamed up with McAfee colleague Dmitri Alperovitch and another former colleague to set up CrowdStrike, which launched with the idea of shifting security away from antivirus to getting out in front of threat actors.
CrowdStrike’s cloud-first model proved hugely successful, and the company subsequently acquired a solid reputation for threat intel and investigative work, notably playing a key role in the probe into the hack of the Democratic National Committee in 2016.
Away from computers, Kurtz is also a car fanatic and a racing driver of many years' standing. He currently races in the IMSA WeatherTech SportsCar Championship in the US and Canada, and has also competed at the 24 Hours of Le Mans.
What happens next?
CrowdStrike has now set out an extensive preliminary plan designed to keep such an incident from occurring again.
This includes improving the resiliency of rapid response updates by performing more developer testing, update and rollback testing, stress testing, fuzzing and fault injection, stability testing, and content interface testing. More validation checks are to be added to its content validator system, and other components of its setup are to have their existing error handling enhanced.
Future rapid response deployments will also now be done on a staggered basis, gradually deployed to larger portions of the Falcon sensor base, starting with a so-called “canary” deployment. As part of this, sensor and system performance will be put under enhanced monitoring, while customers will be given greater control over the delivery of such updates, which will also now come with release notes.
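As a rough illustration of how a staggered deployment of this kind can work, the Python sketch below assigns each host to a stable bucket and widens the roll-out stage by stage, stopping if too many updated hosts report problems. It is an assumption-laden sketch, not a description of CrowdStrike's system: the stage percentages, the 1% failure threshold and the `bucket`, `hosts_in_stage` and `staged_rollout` functions are all hypothetical.

```python
import hashlib
from typing import Callable, Iterable, List

# Hypothetical ring sizes: cumulative percentage of the fleet at each stage.
# The first, smallest stage plays the role of the "canary".
STAGES = [1, 10, 50, 100]


def bucket(host_id: str) -> int:
    """Map a host to a stable bucket in 0..99 using a hash of its identifier."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def hosts_in_stage(hosts: Iterable[str], percent: int) -> List[str]:
    """Hosts covered once the roll-out reaches `percent`; each stage contains the last."""
    return [h for h in hosts if bucket(h) < percent]


def staged_rollout(
    hosts: Iterable[str],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> int:
    """Widen the roll-out stage by stage and halt when failures exceed the threshold.

    Returns the percentage last reached successfully (0 if the canary stage failed).
    """
    all_hosts = list(hosts)
    reached = 0
    for percent in STAGES:
        targets = hosts_in_stage(all_hosts, percent)
        failures = sum(1 for h in targets if not is_healthy(h))
        if targets and failures / len(targets) > max_failure_rate:
            return reached  # stop here; a real system would also roll back
        reached = percent
    return reached
```

Because the bucket is derived from the host identifier, the same machines form the canary group on every run, and a faulty update that fails there never reaches the remaining hosts.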
Timeline of the CrowdStrike incident
- 19 July 2024: An update to CrowdStrike’s Falcon service has led to many Windows users being unable to work this morning. Microsoft 365 is also affected.
- The Emis Web IT system used by more than half of GP practices in the UK is down, following the worldwide Microsoft outage.
- The global outage of Microsoft is rapidly sending shockwaves across all sectors, demonstrating the risk of having a single point of failure.
- A CrowdStrike update with a faulty sensor file has global implications for Windows systems. But competitors need to limit the finger-pointing in case it happens to them.
- As organisations recover from today’s outages, the cyber security industry will need to develop new security software evaluation criteria and requirements and learn to parlay risks.
- 22 July: About 8.5 million devices globally were hit by the botched CrowdStrike update, with a significant number now back online and operational.
- The concentration of so much mission-critical technology in the hands of a few large suppliers makes incidents like the Microsoft-CrowdStrike outage all the more dangerous.
- Financially motivated cyber criminals are already conducting opportunistic attacks on organisations that leverage the CrowdStrike incident, and more targeted attacks are sure to follow.
- 23 July: The ‘blue screen of death’ signals a catastrophic Windows failure, which is exactly what many people faced on 19 July 2024 – but why did it happen? One former Microsoft engineer has a theory.
- Disaster recovery has centered on cyberattacks the past few years, but the CrowdStrike outage illustrates why companies can't forget about traditional business continuity.
- 24 July: Enterprises that emerged unscathed from the roll-out of the botched CrowdStrike software update are being urged to view it as a wake-up call rather than a lucky escape.
- The largest global organisations hit by the CrowdStrike-Microsoft incident on 19 July will likely be out of pocket to the tune of billions of dollars.