CrowdStrike has released a post-incident review (PIR) detailing the recent failure of an update that caused 8.5 million Windows machines to crash.
The report attributes the widespread outage to a bug in test software that failed to properly validate the content update pushed to millions of devices on Friday, July 19, 2024. In response, CrowdStrike has committed to enhancing its testing procedures, improving error handling, and implementing staggered deployments to prevent similar incidents in the future.
The company’s Falcon software is widely used by businesses to protect against malware and security breaches. The problematic update was intended to gather telemetry on emerging threat techniques but instead resulted in system crashes. CrowdStrike typically releases updates in two formats: Sensor Content, which updates the Falcon sensor at the kernel level, and Rapid Response Content, which adjusts the sensor’s behavior for malware detection. The issue stemmed from a small 40KB Rapid Response Content file.
CrowdStrike’s cloud system usually performs validation checks on updates before release. However, a bug in the Content Validator allowed a problematic update to pass through, leading to the crashes. While CrowdStrike conducts both automated and manual testing on its Sensor Content, it appears that the Rapid Response Content received less rigorous testing.
To address this, CrowdStrike plans to enhance its testing protocols for Rapid Response Content, including local developer testing, rollback testing, and stress testing. The company will also update its cloud-based Content Validator to better detect problematic updates and improve error handling within the Falcon sensor’s Content Interpreter. Additionally, CrowdStrike will adopt a staggered deployment approach for future updates, gradually rolling them out to a larger user base to minimize risk.
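A staggered deployment of the kind described above can be illustrated with a short sketch. This is not CrowdStrike's implementation, and the review does not describe one at this level of detail. The stage fractions, the crash-rate threshold, and the function names `bucket`, `eligible`, and `next_stage` are all hypothetical, chosen only to show the general idea: each device is hashed into a stable bucket, an update reaches a growing fraction of buckets, and the rollout halts if health telemetry degrades.

```python
import hashlib

# Hypothetical rollout stages: fraction of the fleet eligible at each stage.
STAGES = [0.01, 0.10, 0.50, 1.00]


def bucket(device_id: str) -> float:
    """Map a device ID to a stable value in [0, 1) using a hash.

    48 bits are used so the division is exact in floating point
    and the result is always strictly less than 1.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def eligible(device_id: str, stage: int) -> bool:
    """Return True if the device should receive the update at this stage."""
    return bucket(device_id) < STAGES[stage]


def next_stage(stage: int, crash_rate: float, max_crash_rate: float = 0.001) -> int:
    """Advance one stage if the observed crash rate is acceptable.

    Returns -1 to signal that the rollout should halt (and roll back)
    when the crash rate exceeds the assumed threshold.
    """
    if crash_rate > max_crash_rate:
        return -1
    return min(stage + 1, len(STAGES) - 1)
```

Because the bucket depends only on the device ID, a device that receives the update at an early stage stays eligible at every later stage, and a faulty update stopped at the first stage would reach roughly 1% of devices under these assumed fractions rather than the whole fleet.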
This incident highlights the importance of thorough testing and validation in software updates, especially given the potential for widespread disruption in critical services.
CrowdStrike Preliminary Post Incident Review (PIR) is released: https://t.co/3ZVBxIfUNq pic.twitter.com/pDEU5t50a7
- John Hammond (@_JohnHammond), July 24, 2024