CrowdStrike Goes on Strike

99 Problems and C-00000291*.sys is One

The Lightwave

Gif by sabatobox on Giphy

Practical Insights for Skeptics & Users Alike…in (Roughly) Two Minutes or Less

“This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed”

George Kurtz, CEO and Headache Haver at CrowdStrike

This morning: a slight detour away from our usual programming to talk about the CrowdStrike incident from this past Friday (July 19, 2024).

The Whoops Felt Around the World

It goes something like this:

CrowdStrike, known for its robust Falcon cybersecurity platform, pushed out a routine software update, as they have so many times before.

And that should have been that.

However, this update contained a critical bug that caused Windows computers running the Falcon agent to crash spectacularly, displaying the dreaded "blue screen of death" (BSOD).

The impact was far-reaching:

  • Banks struggled to process transactions

  • Hospitals faced disruptions in patient care systems

  • Airlines grounded flights, leaving passengers stranded

  • TV broadcasters scrambled to maintain their streams

  • Retailers grappled with frozen point-of-sale systems

The specific culprit was a content update for Windows hosts, delivered in a file apparently named "C-00000291*.sys". (Mac & Linux users were spared, a fact which I am sure will be touted in weird tech-focused Reddit arguments and conspiracy theories.)

How Could This Happen?

Justin Cappos, a professor at NYU with expertise in computer science and cybersecurity, said:

“It strains credibility that any organization, much less a security company, would fail to have robust software supply chain validation mechanisms in place. The fact that software could be released without testing demonstrates a level of negligence, I’m shocked to see from a major security company.”

To their credit, CrowdStrike acted swiftly (well, what choice did they have?). They identified and isolated the issue; a fix was deployed and the problematic changes were reverted; and affected customers were given workaround steps (rebooting in Safe Mode and deleting the offending files).
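For the hands-on crowd, the "delete the offending files" half of that workaround comes down to matching one file name pattern. Here is a minimal sketch in Python, assuming you pass in the folder to search yourself. The function names are made up for illustration and are not CrowdStrike tooling, and the default is a dry run that only reports what it would delete.

```python
from pathlib import Path

# File name pattern quoted in this article.
PATTERN = "C-00000291*.sys"


def find_offending_files(directory: Path) -> list[Path]:
    """Return the files in `directory` whose names match the faulty pattern."""
    return sorted(p for p in directory.glob(PATTERN) if p.is_file())


def remove_offending_files(directory: Path, dry_run: bool = True) -> list[Path]:
    """Delete matching files and return them.

    With dry_run=True (the default) nothing is deleted; the matches are
    only reported, so you can check the list before committing.
    """
    matches = find_offending_files(directory)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Run it once with the default to see what matches, then again with dry_run=False once the list looks right. On an affected machine this would still have to happen from Safe Mode, as the workaround describes.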

The CrowdStrike outage highlights the necessity for rigorous testing protocols before deploying updates.

Many have been quick to point out that the faulty update that led to the BLUE SCREEN OF DEATH on Windows systems could have been avoided with more comprehensive testing, e.g.:

  • Automated Testing to quickly identify potential issues in new code.

  • Regression Testing to ensure that new updates don’t screw up existing functionality.

  • Stress Testing to evaluate whether updates stay stable under extreme conditions.
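To make the idea of automated checks concrete, here is a minimal sketch of a release gate in Python: an update ships only if every check passes. Everything in it is hypothetical, including the check names and the "UPDT" header bytes, which are invented for this example. It says nothing about how CrowdStrike actually validates its updates.

```python
from typing import Callable

# A check takes the raw update bytes and returns True if they look OK.
Check = Callable[[bytes], bool]


def not_empty(payload: bytes) -> bool:
    """Reject an update file that contains no data at all."""
    return len(payload) > 0


def has_expected_header(payload: bytes) -> bool:
    """Reject a file that does not start with the (made-up) magic bytes."""
    return payload.startswith(b"UPDT")


def failed_checks(payload: bytes, checks: list[Check]) -> list[str]:
    """Run every check and return the names of the ones that failed."""
    return [check.__name__ for check in checks if not check(payload)]


def can_ship(payload: bytes, checks: list[Check]) -> bool:
    """An update may ship only when no check has failed."""
    return not failed_checks(payload, checks)
```

A regression suite is the same idea with a longer list: every bug that ever got through becomes one more check that each future update has to pass.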

The CrowdStrike breakdown reminds us of the interconnected nature of modern IT systems and the potential for widespread disruption from a single point of failure.

And this is the heart of the issue: The complexity of modern IT systems (which are only getting more complex, even as they become “simpler”) means that no single technology can guarantee the prevention of all possible issues.

In tech, sometimes you’re Achilles, and sometimes you’re his heel…