On July 19, 2024, a single error in the files of a popular cybersecurity program caused millions of Windows devices to crash and left them unable to operate properly. The incident was widespread, affecting airlines, banks, hospitals, small businesses, and many others, and it made normal business operations difficult to continue. It began when CrowdStrike, a well-known cybersecurity company, “released a sensor configuration update to Windows systems. Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform” (CrowdStrike). Unfortunately, this update resulted in catastrophe. It is important to note that these types of updates “are a normal part of the sensor’s operation and occur several times a day in response to novel tactics, techniques, and procedures discovered by CrowdStrike” (CrowdStrike). This highlights how important it is to rigorously test every program or update before deployment, even a routine one.

What Was the Issue?

CrowdStrike identified the issue as “a defect in a recent content update for Windows hosts. Mac and Linux hosts were not impacted. The issue has been identified and isolated, and a fix has been deployed”, which has allowed some businesses to begin regaining the use of their computer systems (CrowdStrike). Specifically, this defect “triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems” (CrowdStrike). Broadly speaking, programming errors fall into two categories: syntax errors and logic errors. A syntax error occurs when code does not follow the grammar of the language, so it cannot be compiled or run at all; for example, writing System.out.println(“Hello World!”; in Java is a syntax error because the closing parenthesis is missing. A logic error, by contrast, occurs in code that is written validly but does not do what it was meant to do. For example, a Java loop that visits the elements of an array using the condition i <= array.length instead of i < array.length compiles without complaint, yet it tries to read one element past the end of the array and fails while the program is running. Because the compiler cannot catch mistakes of this kind, logic errors generally surface only through testing or, as in this case, after deployment.
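
To make the idea of a logic error concrete, here is a minimal sketch. It is written in Python purely for brevity, and it is not CrowdStrike's code; the function names and the sample data are invented for illustration. Both functions are syntactically valid, but the first uses a loop bound that is one too large, so the mistake only shows up once the code is actually run.

```python
def sum_readings_buggy(readings):
    """Intended to add up every reading, but the loop bound is off by one."""
    total = 0
    # Logic error: range(len(readings) + 1) produces one index too many,
    # so the final iteration reads past the end of the list.
    for i in range(len(readings) + 1):
        total += readings[i]
    return total


def sum_readings_fixed(readings):
    """The same loop with the correct bound."""
    total = 0
    for i in range(len(readings)):
        total += readings[i]
    return total
```

Calling sum_readings_fixed([3, 5, 8]) returns the sum, while sum_readings_buggy([3, 5, 8]) raises an IndexError. In Python that error can be caught and handled. In lower-level code, the same kind of mistake can read memory the program does not own, which is much harder to recover from.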

CrowdStrike provided the following, more specific details regarding the software defect.

  • CrowdStrike has identified the trigger for this issue as a Windows sensor related content deployment and we have reverted those changes. The content is a channel file located in the %WINDIR%\System32\drivers\CrowdStrike directory.
    • Channel file “C-00000291*.sys” with timestamp of 2024-07-19 0527 UTC or later is the reverted (good) version.
    • Channel file “C-00000291*.sys” with timestamp of 2024-07-19 0409 UTC is the problematic version.
    • Note: It is normal for multiple “C-00000291*.sys” files to be present in the CrowdStrike directory – as long as one of the files in the folder has a timestamp of 05:27 UTC or later, that will be the active content.
CrowdStrike
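
The timestamp guidance above can be expressed as a small check. The following is only a sketch under stated assumptions: the function names, the status labels, and the example file names are hypothetical, and the file timestamps are assumed to already be available as UTC datetime values. It is not an official CrowdStrike tool.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch

# Values taken from the guidance quoted above.
CHANNEL_PATTERN = "C-00000291*.sys"
PROBLEMATIC_AT = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
GOOD_FROM = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def classify_channel_file(name, modified_utc):
    """Label one file using the published timestamps (labels are hypothetical)."""
    if not fnmatch(name, CHANNEL_PATTERN):
        return "not-channel-291"
    # The guidance gives times to the minute, so ignore seconds.
    stamp = modified_utc.replace(second=0, microsecond=0)
    if stamp >= GOOD_FROM:
        return "reverted-good"
    if stamp == PROBLEMATIC_AT:
        return "problematic"
    return "unclassified"


def has_good_version(files):
    """True if any (name, modified_utc) pair is the reverted (good) version."""
    return any(classify_channel_file(n, t) == "reverted-good" for n, t in files)
```

Per the note in the guidance, a directory can hold several matching files at once, which is why has_good_version only asks whether at least one of them carries the later timestamp.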

In terms of who would be impacted by this defect, CrowdStrike stated the following:

Customers running Falcon sensor for Windows version 7.11 and above, that were online between Friday, July 19, 2024 04:09 UTC and Friday, July 19, 2024 05:27 UTC, may be impacted. 

Systems running Falcon sensor for Windows 7.11 and above that downloaded the updated configuration from 04:09 UTC to 05:27 UTC – were susceptible to a system crash.

CrowdStrike
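
The two conditions in that statement (sensor version 7.11 or above, and online during the window) can be combined into one check. This is a rough sketch only: the function names are hypothetical, version strings are assumed to be dot-separated numbers, and a True result means "may be impacted", as in CrowdStrike's wording, not "was impacted".

```python
from datetime import datetime, timezone

# Window and minimum version taken from the statement quoted above.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_VERSION = (7, 11)


def parse_version(text):
    """Turn a string such as '7.11' into a tuple of integers, e.g. (7, 11)."""
    return tuple(int(part) for part in text.split("."))


def may_be_impacted(sensor_version, online_from, online_until):
    """True if the version is 7.11+ and the online period overlaps the window."""
    new_enough = parse_version(sensor_version)[:2] >= MIN_VERSION
    overlaps = online_from <= WINDOW_END and online_until >= WINDOW_START
    return new_enough and overlaps
```

Versions are compared as tuples of integers because comparing them as text or as decimals would wrongly treat 7.9 as newer than 7.11.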

This error would result in users “experiencing a bugcheck\blue screen error related to the Falcon Sensor” (CrowdStrike). Essentially, the computer itself would crash, producing what is often referred to as “the blue screen of death”. This screen appears when Windows encounters a critical error that it cannot safely recover from, at which point the computer stops or restarts. In some instances, a blue screen error is a one-time event, and the computer can be booted up normally afterwards. In other circumstances, however, the computer will repeatedly produce a blue screen error and shut down, preventing any meaningful use of the device. Unfortunately, many users were stuck with a repeating blue screen error as a result of this software defect.

Who Was Impacted?

While it can be difficult to gauge exactly how many devices were impacted by this issue, Microsoft stated the following regarding the incident:

We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one percent of all Windows machines. While the percentage was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services. 

This incident demonstrates the interconnected nature of our broad ecosystem — global cloud providers, software platforms, security vendors and other software vendors, and customers. It’s also a reminder of how important it is for all of us across the tech ecosystem to prioritize operating with safe deployment and disaster recovery using the mechanisms that exist. As we’ve seen over the last two days, we learn, recover and move forward most effectively when we collaborate and work together. We appreciate the cooperation and collaboration of our entire sector, and we will continue to update with learnings and next steps. 

Weston

One sector hit particularly hard by the CrowdStrike issue was travel. Delta Air Lines CEO Ed Bastian commented that “like many companies worldwide, Delta was impacted on Friday morning by an outside vendor technology issue, which prompted us to pause flying while our systems were offline”. Unfortunately, “the technology issue occurred on the busiest travel weekend of the summer, with our booked loads exceeding 90%, limiting our reaccommodation capabilities”, leaving many individuals unable to book a new flight to their intended destination. Bastian went on to state, “I want to apologize to every one of you who have been impacted by these events. Delta is in the business of connecting the world, and we understand how difficult it can be when your travels are disrupted” (Bastian).

Unfortunately, this incident went far beyond airlines; many businesses, both large and small, were impacted by this bug. Its effects were felt far and wide and serve as a reminder of the importance of software integrity and testing.

CrowdStrike’s Response

CrowdStrike commented that they “understand the gravity of this situation and are deeply sorry for the inconvenience and disruption. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers” (CrowdStrike). Unfortunately for CrowdStrike, the damage to its reputation has already been done, as reflected by the drop in the company’s stock value and by the online sentiment lambasting it.

In addition, it appears that Microsoft is also receiving negative attention for this incident because the issue only affected Windows systems. It is important to note, however, that “this was not a Microsoft incident” and that Microsoft is not responsible for it (Weston). Still, because the issue affects devices that are part of the Microsoft ecosystem, Microsoft has been working diligently to support affected customers and devices (Weston).

Cybersecurity Concern

CrowdStrike CEO George Kurtz said the following in his statement regarding the incident: “We know that adversaries and bad actors will try to exploit events like this. I encourage everyone to remain vigilant and ensure that you’re engaging with official CrowdStrike representatives. Our blog and technical support will continue to be the official channels for the latest updates” (CrowdStrike). Unfortunately, according to the Cybersecurity & Infrastructure Security Agency (CISA), threat actors have already begun taking advantage of this error. Specifically, CISA “has observed threat actors taking advantage of this incident for phishing and other malicious activity. CISA urges organizations and individuals to remain vigilant and only follow instructions from legitimate sources. CISA recommends organizations to remind their employees to avoid clicking on phishing emails or suspicious links” (CISA). Furthermore, CISA has identified that some “threat actors have been distributing a malicious ZIP archive file. This activity appears to be targeting Latin America-based CrowdStrike customers” (CISA). It is important to remain extremely vigilant as this issue continues to unfold.

Fixing the Issue & Preventing It from Happening Again

Fortunately, there have been many efforts towards fixing this software issue. For one, CrowdStrike has published a video that gives a step-by-step walkthrough of how to fix a device affected by this error.

In addition, CrowdStrike advises its customers to “check the support portal for updates” as they continue to work on fixing this error and mitigating its impact (CrowdStrike).

The most recent update regarding this issue was as follows:

Using a week-over-week comparison, ~99% of Windows sensors are online as of July 29 at 5pm PT, compared to before the content update. We typically see a variance of ~1% week-over-week in sensor connections.

CrowdStrike

Furthermore, CrowdStrike has utilized its own software to help fix the issue and remove the affected file:

To prevent Windows systems from further disruption, the impacted version of channel file 291 was added to Falcon’s known-bad list in the CrowdStrike Cloud. When a Windows system with Falcon installed contacts the CrowdStrike Cloud, a request to remove the bad channel file and place it in quarantine, which is visible in your Falcon UI, will be issued. If the file does not exist, no quarantine will occur and systems will continue to operate normally.

Adding the impacted version of channel file 291 to Falcon’s known-bad list prevents inadvertent reuse by operational or recovered systems. With strong network connectivity, this action could also result in the automatic recovery of systems in a boot loop.

This was configured in US-1, US-2, and EU on July 23, 2024.

Gov-1 and Gov-2 customers can request a channel file 291 known-bad classification by contacting CrowdStrike Support.

No sensor updates, channel files, or code was deployed from the CrowdStrike Cloud.

CrowdStrike
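
The known-bad behavior described above can be pictured with a short sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and identifying files by a SHA-256 hash of their contents is simply one plausible way to recognize a specific file version, not a description of how Falcon actually does it.

```python
import hashlib


def fingerprint(content):
    """Identify a file version by the SHA-256 hash of its bytes (an assumption)."""
    return hashlib.sha256(content).hexdigest()


def process_check_in(host_files, known_bad):
    """Split a host's files into those that stay and those that are quarantined.

    host_files maps file names to file contents (bytes); known_bad is a set
    of fingerprints. If nothing matches, nothing is quarantined.
    """
    remaining, quarantined = {}, {}
    for name, content in host_files.items():
        if fingerprint(content) in known_bad:
            quarantined[name] = content
        else:
            remaining[name] = content
    return remaining, quarantined
```

As in the quoted description, a host that does not have the bad file passes through unchanged, so the check is safe to run against every system that contacts the cloud.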

While this incident was catastrophic for many, there are extremely important lessons to be learned going forward, both for CrowdStrike and for other companies with large customer bases. These lessons can all be summed up in one piece of advice: always rigorously test your software before deployment.

CrowdStrike indicated that it will make the following changes to prevent similar incidents; these changes also serve as a useful list of measures for any software company to implement.

Software Resiliency and Testing

  • Improve Rapid Response Content testing by using testing types such as:
    • Local developer testing
    • Content update and rollback testing
    • Stress testing, fuzzing and fault injection
    • Stability testing
    • Content interface testing
  • Add additional validation checks to the Content Validator for Rapid Response Content. A new check is in process to guard against this type of problematic content from being deployed in the future.
  • Enhance existing error handling in the Content Interpreter.
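
As an aside from the quoted list, the testing ideas above (validation checks, defensive error handling, and fuzzing) can be illustrated together in a toy sketch. Nothing here reflects CrowdStrike's actual Content Validator or Content Interpreter: the record format, the field count, and all of the function names are hypothetical.

```python
import random

EXPECTED_FIELDS = 4  # hypothetical record shape for this sketch


def validate(record):
    """Validator stand-in: accept only a list of exactly four strings."""
    return (
        isinstance(record, list)
        and len(record) == EXPECTED_FIELDS
        and all(isinstance(field, str) for field in record)
    )


def interpret(record):
    """Interpreter stand-in: returns None for bad input instead of raising."""
    if not validate(record):
        return None
    return {f"field_{i}": value for i, value in enumerate(record)}


def fuzz(iterations=1000, seed=0):
    """Feed randomly shaped records to interpret() and check it never fails."""
    rng = random.Random(seed)
    for _ in range(iterations):
        length = rng.randint(0, 8)
        record = [rng.choice(["a", "", "*", None, 7]) for _ in range(length)]
        result = interpret(record)  # any exception here would fail the run
        assert result is None or len(result) == EXPECTED_FIELDS
    return iterations
```

The point of the fuzz loop is that malformed content is generated deliberately and in bulk, so a gap in the validator or the interpreter is found in testing and not on customer machines.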

Rapid Response Content Deployment

  • Implement a staggered deployment strategy for Rapid Response Content in which updates are gradually deployed to larger portions of the sensor base, starting with a canary deployment.
  • Improve monitoring for both sensor and system performance, collecting feedback during Rapid Response Content deployment to guide a phased rollout.
  • Provide customers with greater control over the delivery of Rapid Response Content updates by allowing granular selection of when and where these updates are deployed.
  • Provide content update details via release notes, which customers can subscribe to.
CrowdStrike
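
The staggered deployment idea in the list above can be sketched in a few lines. This is a simplified illustration, not CrowdStrike's process: the stage percentages, the function name, and the deploy, is_healthy, and rollback callbacks are all hypothetical placeholders that a real system would replace with its own mechanisms.

```python
def staged_rollout(hosts, stage_percents, deploy, is_healthy, rollback):
    """Deploy to growing shares of hosts, stopping at the first unhealthy stage.

    stage_percents is an increasing sequence such as (1, 10, 50, 100); the
    first stage acts as the canary. Returns (updated_hosts, completed).
    """
    updated = []
    for percent in stage_percents:
        # Number of hosts that should be updated by the end of this stage,
        # rounded up using integer arithmetic.
        target = (len(hosts) * percent + 99) // 100
        batch = hosts[len(updated):target]
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # Check health before widening the rollout any further.
        if not is_healthy(updated):
            for host in updated:
                rollback(host)
            return [], False
    return updated, True
```

With 100 hosts and stages of (1, 10, 50, 100), a faulty update that is detected after the first stage touches a single canary host and is rolled back there, so the other 99 hosts never receive it.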

Resources & Further Reading

Bastian, Ed. “An Update to Delta Customers from CEO Ed Bastian | Delta News Hub.” News.delta.com, Delta Air Lines, 21 July 2024, news.delta.com/update-delta-customers-ceo-ed-bastian.

CrowdStrike. “Falcon Content Update Remediation and Guidance Hub | CrowdStrike.” Crowdstrike.com, CrowdStrike, 21 July 2024, www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/.

—. “Technical Details: Falcon Update for Windows Hosts | CrowdStrike.” Crowdstrike.com, CrowdStrike, 20 July 2024, www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/.

Cybersecurity & Infrastructure Security Agency (CISA). “Widespread IT Outage due to CrowdStrike Update | CISA.” Www.cisa.gov, U.S. Department of Homeland Security, 19 July 2024, www.cisa.gov/news-events/alerts/2024/07/19/widespread-it-outage-due-crowdstrike-update.

Weston, David. “Helping Our Customers through the CrowdStrike Outage.” The Official Microsoft Blog, Microsoft, 20 July 2024, blogs.microsoft.com/blog/2024/07/20/helping-our-customers-through-the-crowdstrike-outage/.
