CrowdStrike reveals Windows weakness
The bug that impacted 8.5m PCs last week was, says CrowdStrike, due to an update it refers to as “Rapid Response Content”, which is delivered as a “template instance”. This is data stored in a proprietary binary file that configures specific functions in the CrowdStrike Falcon sensor used in Windows.
CrowdStrike says that a content interpreter on the sensor reads the Rapid Response Content, which then enables the Falcon sensor to observe, detect or prevent malicious activity, depending on the customer’s policy configuration.
There are numerous reports, including one from a former Microsoft operating system developer, David William Plummer, that point to the fact that the only way an update such as the one issued on July 19th could cause a catastrophic failure is if it runs as a device driver in the core Windows kernel.
This means it runs in “kernel mode” or “Ring Zero”. The implication is that it has full control of the PC. Most software that is not a device driver runs in “user mode” or “Ring Three”, where a fault only causes the code to crash – not the whole operating system.
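The user-mode half of that distinction can be shown with a minimal sketch. It assumes a desktop Python install with the standard ctypes module; the child program and the function name run_faulty_child are invented for illustration. A child process deliberately reads from address zero, the operating system terminates only that process, and the parent carries on. The same invalid access inside a kernel-mode driver would have no such containment.

```python
import subprocess
import sys

# Child program: read from address 0 via ctypes, which is an invalid access.
CRASHING_CHILD = "import ctypes; ctypes.string_at(0)"


def run_faulty_child() -> int:
    """Run the faulting code in a separate process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CRASHING_CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_faulty_child()
    # The exact non-zero value depends on the platform.
    print("child exit status:", status)
    print("parent process still running")
```

The exit status differs between operating systems, so the sketch only relies on it being non-zero.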
CrowdStrike says the Rapid Response Content is not code or a kernel driver. It can’t be, as anything that runs at Ring Zero – as Plummer explains – needs to pass Windows Hardware Quality Lab certification (WHQL).
The wording in CrowdStrike’s own response to the catastrophic Windows crash points to a bug in its Falcon sensor and another bug in the Rapid Response Content it issued on July 19th: “Due to a bug in the content validator, one of the two Template Instances passed validation despite containing problematic content data,” it said.
By looking at crash dumps posted on X (formerly Twitter), Plummer notes that a “null pointer reference” occurred when a file containing only zeros – the Rapid Response Content – was loaded by the CrowdStrike device driver.
“We don’t know how or why this happened, but what we know is that the CrowdStrike driver that handles and processes these updates is not very resilient and appears to have inadequate error-checking and parameter validation,” he says.
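As an illustration of the kind of error-checking and parameter validation Plummer describes, here is a minimal sketch of a defensive loader for a binary content file. The file layout, the magic value “RRC1” and every name in it are hypothetical, invented for this example; CrowdStrike’s format is proprietary and is not represented here. The sketch rejects a file that is too short, consists only of zeros, carries the wrong magic value or declares more entries than it contains, and it falls back to the last known-good content rather than failing.

```python
import struct

# Hypothetical layout, invented for illustration only:
# 4-byte magic, 4-byte little-endian entry count, then `count` 8-byte entries.
MAGIC = b"RRC1"
HEADER = struct.Struct("<4sI")
ENTRY = struct.Struct("<II")


class ContentError(ValueError):
    """Raised when a content file fails validation."""


def parse_content(blob: bytes) -> list[tuple[int, int]]:
    """Validate a content blob and return its entries, or raise ContentError."""
    if len(blob) < HEADER.size:
        raise ContentError("file shorter than header")
    if not any(blob):
        raise ContentError("file contains only zero bytes")
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ContentError("unexpected magic value")
    if len(blob) != HEADER.size + count * ENTRY.size:
        raise ContentError("entry count does not match file length")
    return [
        ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        for i in range(count)
    ]


def load_or_keep_previous(blob: bytes, previous: list) -> list:
    """Fall back to the last known-good entries instead of failing."""
    try:
        return parse_content(blob)
    except ContentError:
        return previous
```

With this sketch, a 64-byte file of zeros passed to load_or_keep_previous leaves the previous entries in place instead of reaching the interpreter.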
The crash of 8.5m PCs was the result of process failures at both CrowdStrike and Microsoft.
First, why wasn’t the bug in the content validator picked up by the WHQL certification process?
Second, we need to question CrowdStrike’s quality assurance and test process. While the device driver bug should have been flagged by failing WHQL validation, how did CrowdStrike manage to introduce errors in one of its Rapid Response Content update files?
Microsoft called on the industry to prioritise safe deployments. But we need far more than this to stop such failures ever happening again.