CrowdStrike says most Falcon sensors now up and running
The vast majority of CrowdStrike Falcon sensors affected by a coding error have now been recovered, with a final resolution expected this week
The majority of CrowdStrike Falcon sensors affected by a botched rapid response update were back up and running before the weekend of 27 and 28 July, as efforts continue to remediate the 19 July incident that caused more than eight million Windows machines to crash.
Writing on LinkedIn on 26 July, CrowdStrike CEO George Kurtz, who has been communicating information about the incident at a steady clip since it first unfolded, said that as of Thursday 25 July “over 97%” of Windows sensors were back online.
“This progress is thanks to the tireless efforts of our customers, partners, and the dedication of our team at CrowdStrike. However, we understand our work is not yet complete, and we remain committed to restoring every impacted system,” said Kurtz.
“To our customers still affected, please know we will not rest until we achieve full recovery. At CrowdStrike, our mission is to earn your trust by safeguarding your operations. I am deeply sorry for the disruption this outage has caused and personally apologise to everyone impacted. While I can’t promise perfection, I can promise a response that is focused, effective, and with a sense of urgency.”
Kurtz said the remedial efforts had been greatly helped by the use of automated recovery techniques and by mobilising all possible resources to support affected customers. He reiterated CrowdStrike’s commitment to its core mission – to stop breaches – but with a new focus on customer controls and resilience, as detailed in the firm’s preliminary incident report, published last week.
Fixed update set for implementation soon
Meanwhile, before the weekend, CrowdStrike confirmed to Computer Weekly’s sister title TechTarget Security that the logic error in its content validator tool that caused the chaos has been fixed. Intensive testing is now under way, and the fix is set to be pushed live to its backend systems in the coming days.
The tainted update was part of a rapid response roll-out normally used by CrowdStrike to enhance the dynamic protection mechanisms of its Falcon platform – that is to say, it was designed to identify new cyber security issues and help customers mitigate them.
The company performs such updates all the time, but on this occasion, some problematic content in a channel file made it past the beady eyes of CrowdStrike’s automated content validator. The two issues combined led to an out-of-bounds memory read, which triggered an exception that Windows could not handle gracefully, causing affected devices to crash with the infamous blue screen of death.
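The failure mode described above can be illustrated with a deliberately simplified, hypothetical model. Nothing below is CrowdStrike code: the `ContentEntry` type, the `validate` and `evaluate` functions and the idea that a content entry names input fields by index are all assumptions made for the example. It shows why a content validator needs to check that new content never asks the sensor for data it does not actually supply. Python raises `IndexError` on a bad index, whereas native kernel-mode code doing the same lookup would read past the end of its buffer.

```python
from dataclasses import dataclass


@dataclass
class ContentEntry:
    # Hypothetical content item: a name plus the indices of the sensor
    # inputs it wants to inspect.
    name: str
    field_indices: list[int]


def validate(entry: ContentEntry, input_count: int) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    for idx in entry.field_indices:
        # Reject any index the sensor cannot satisfy, including negative ones.
        if not 0 <= idx < input_count:
            problems.append(
                f"{entry.name}: field index {idx} outside 0..{input_count - 1}"
            )
    return problems


def evaluate(entry: ContentEntry, inputs: list[str]) -> list[str]:
    # Unchecked access: Python raises IndexError for a bad index, while
    # native code doing the same lookup would read memory past the buffer.
    return [inputs[idx] for idx in entry.field_indices]
```

With three sensor inputs, an entry asking for fields 0 and 2 passes validation, while one asking for field 3 is flagged before it could ever reach `evaluate`.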
CrowdStrike is attempting to make sure the issue cannot recur by improving the resilience of its rapid response updates through improved testing at multiple levels, and by adding new validation checks to the automated content validator tool that let it down.
It also now plans to roll out rapid response updates on a staggered basis, deploying them across the Falcon sensor base more slowly and making use of “canary” deployments designed to highlight any major issues before they spread.
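A staggered roll-out with a canary stage can be sketched as follows. This is a generic illustration of the technique, not CrowdStrike’s deployment system: the wave sizes, the 1% failure threshold and the `deploy` health callback are all assumptions made for the example. The update goes to a small canary group first, then to progressively larger shares of the fleet, and the roll-out halts as soon as any wave looks unhealthy.

```python
from typing import Callable


def staged_rollout(
    hosts: list[str],
    stage_percents: list[int],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> tuple[list[str], bool]:
    """Roll an update out in widening waves, halting if a wave looks unhealthy.

    stage_percents: cumulative share of the fleet per wave, e.g. [1, 10, 50, 100];
    the first, smallest wave acts as the canary.
    deploy: hypothetical callback that updates one host and reports whether
    it stayed healthy afterwards.
    Returns the hosts that received the update and whether the roll-out finished.
    """
    n = len(hosts)
    updated: list[str] = []
    done = 0
    for pct in stage_percents:
        # Integer ceiling, so even a 1% canary wave covers at least one host.
        target = min(n, max(done, (n * pct + 99) // 100))
        batch = hosts[done:target]
        failures = sum(1 for host in batch if not deploy(host))
        updated.extend(batch)
        done = target
        if batch and failures / len(batch) > max_failure_rate:
            # Stop here so a bad update never reaches the remaining hosts.
            return updated, False
    return updated, done == n
```

On a 100-host fleet with waves of 1%, 10%, 50% and 100%, an update that crashes every machine is caught at the single-host canary and goes no further, instead of reaching the whole fleet at once.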
Sensor and system performance will also be monitored more closely during roll-outs, and CrowdStrike customers will eventually be given more options to manage rapid response updates themselves.
Real-life impacts
The real-world impacts of the outage, which notably caused airlines all over the world to delay, reschedule and cancel flights, continue to be felt.
Among the stories to have emerged is that of an 83-year-old man who became the subject of a search operation by authorities in the US. Patrick Bailey, who was scheduled to fly home from Florida to California on 19 July, was put up in a local hotel when his flight was cancelled.
Bailey checked out the following morning but accidentally left his mobile phone in his room, and then went missing for several days. He eventually turned up in California on 28 July, having decided instead to travel across the US by long-distance Greyhound bus.
Computer Weekly and TechTarget coverage of the CrowdStrike incident
- 19 July 2024: An update to CrowdStrike’s Falcon service has led to many Windows users being unable to work this morning. Microsoft 365 is also affected.
- The Emis Web IT system used by more than half of GP practices in the UK is down, following the worldwide Microsoft outage.
- The global outage of Microsoft is rapidly sending shockwaves across all sectors, demonstrating the risk of having a single point of failure.
- A CrowdStrike update with a faulty sensor file has global implications for Windows systems. But competitors need to limit the finger-pointing in case it happens to them.
- As organisations recover from today’s outages, the cyber security industry will need to develop new security software evaluation criteria and requirements and learn to parlay risks.
- 22 July: About 8.5 million devices globally were hit by the botched CrowdStrike update, with a significant number now back online and operational.
- The concentration of so much mission-critical technology in the hands of a few large suppliers makes incidents like the Microsoft-CrowdStrike outage all the more dangerous.
- Financially motivated cyber criminals are already conducting opportunistic attacks on organisations that leverage the CrowdStrike incident, and more targeted attacks are sure to follow.
- 23 July: The ‘blue screen of death’ signals a catastrophic Windows failure, which is exactly what many people faced on 19 July 2024 – but why did it happen? One former Microsoft engineer has a theory.
- Disaster recovery has centered on cyberattacks the past few years, but the CrowdStrike outage illustrates why companies can't forget about traditional business continuity.
- 24 July: Enterprises that emerged unscathed from the roll-out of the botched CrowdStrike software update are being urged to view it as a wake-up call rather than a lucky escape.
- The largest global organisations hit by the CrowdStrike-Microsoft incident on 19 July will likely be out of pocket to the tune of billions of dollars.
- CrowdStrike publishes the preliminary findings of what will be a lengthy investigation into the root causes of the failed 19 July update that caused Windows computers to crash all over the world.
- 25 July: Microsoft has pointed the finger at EU regulators, blaming them for a ruling that means it needs to offer third parties like CrowdStrike access to the core Windows OS.