I thought I’d share some thoughts on this past Friday’s CrowdStrike (CS) debacle and on security products in general.
To say that CS screwed up this past week is putting it mildly. An estimated 8.5M endpoints were affected by the borked update delivered to CS’ Falcon product early Friday morning, many of them in critical environments. The details (with publicly available information) have been discussed to death, so I’ll not deal with those here.
For those interested in CS’ RCA (released this morning, Wed 24th July), here is a link to The Hacker News article on the issue.
Of interest is the fact that at no point does CS explain why the issue was not flagged, or how it got through their QA process.
At a minimum, then, the issue shows a lack of QA/process control in CS’ software pipeline; at worst, plain willful negligence. Given CS’ presence in so many critical networks, that paints a very worrying picture.
Wikipedia describes “due diligence” as:
The term “due diligence” can be read as “required carefulness” or “reasonable care” in general usage, and has been used in the literal sense of “requisite effort” since at least the mid-fifteenth century.
Attack?
We only have CS’ word that this was an update issue … this may sound conspiratorial, but there’s always the possibility that the CS supply chain was compromised to specifically deliver the borked update.
Attackers have always been inventive, and this wouldn’t be the first time a supply chain was compromised (think SolarWinds, xz, numerous app repos, etc.). In fact, supply chain attacks are a dime a dozen, and a favoured avenue for long-term, spy/sleeper-style operations.
Attackers also now know how “easy” it is to DoS endpoints – just compromise an application with kernel-level access. Although, if you attain kernel-level access, you’ve got free rein in any case.
Kernel be damned!
It has to be noted as well that many security vendors in this specific product space (EDR/XDR/etc.) use kernel drivers (or similar mechanisms) to perform their security functions. So CS is not alone in the potential to cause these kinds of issues.
The fact that this hasn’t happened to other vendors (that we know of) is perhaps just plain luck … or they have (better) processes in place to prevent this kind of thing.
There is a fine balancing act with this product type – the ability to prevent full-stack 0-day breaches down to the kernel level, weighed against the potential to bork a system through a faulty update, or to be compromised and hand an attacker full-system access. As it’s impossible to guarantee a risk-free mechanism, customers need to:
a. understand the issues at play here
b. check their risk appetite
… to determine if this is something they want to use.
Vendors also have a duty to present these pitfalls, but rarely do, and so are complicit in the results. The next big buzzword always takes precedence over practical considerations and real customer benefits. Customers, for their part, need to stop being so gullible and to stop uncritically accepting vendors’ marketing and pitches.
How do we prevent this?
Disaster recovery and business continuity aren’t only for the bad stuff; they also need to cover the “good/known” stuff going wrong. It’s important that your recovery systems and solutions take this into account. As an example, a lot of folk found out this past week that having BitLocker on endpoints (a good thing[tm]), or headless endpoints, introduced some unique recovery challenges.
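To make that concrete, here’s a minimal sketch (in Python) of the sort of pre-incident check worth automating. It assumes a hypothetical inventory file, endpoints.csv, with columns for BitLocker status, recovery-key escrow and headless machines (the names and format are mine, not any particular product’s) and simply flags anything you couldn’t recover by hand at 4am on a Friday.

```python
# Illustrative only: checks a hypothetical asset inventory (endpoints.csv) for
# machines where BitLocker is enabled but no recovery key has been escrowed,
# and for headless machines that would need out-of-band console access.
# The column names and file format are assumptions, not a real product's schema.
import csv

def check_recovery_readiness(inventory_path: str) -> list[dict]:
    at_risk = []
    with open(inventory_path, newline="") as f:
        for row in csv.DictReader(f):
            bitlocker = row.get("bitlocker_enabled", "").lower() == "yes"
            escrowed = row.get("recovery_key_escrowed", "").lower() == "yes"
            headless = row.get("headless", "").lower() == "yes"
            # Flag anything that would be hard to recover manually in a hurry.
            if (bitlocker and not escrowed) or headless:
                at_risk.append(row)
    return at_risk

if __name__ == "__main__":
    for endpoint in check_recovery_readiness("endpoints.csv"):
        print(f"Recovery gap: {endpoint.get('hostname', '?')} -> {endpoint}")
```

The point isn’t the script itself; it’s that recovery prerequisites (escrowed keys, console access, boot media) need to be verified before the incident, not discovered during it.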
Customers need to poke through vendor marketing-speak and evaluate products carefully and properly. Yes, it takes a lot of time and effort, but it is completely worth it in the end.
The proprietary nature of most security vendors means customers have little insight into vendor development and QA processes, so it’s up to vendors to be more transparent. That’s unlikely to happen unless they’re forced to do so – whether it’s through regulation or customer pressure, we need to improve the industry.
Et Tu, Microsoft?
It just so happened that MS had a major (and these days quite common) cloud outage hours before the CS issue and, as a result, was initially blamed for it, even though the two events were completely unrelated. It didn’t help that it took MS many hours to say so. Or that the general media are singularly useless at fact-checking and kept touting the MS angle.
MS are already in the dogbox over the recent hacking of high-profile Office 365 email accounts, and this doesn’t help their cause.
But the real clincher here is that MS have just had exactly what I spoke about in the previous section: a compromise of their security stack!
From The Hacker News again:
A now-patched security flaw in the Microsoft Defender SmartScreen has been exploited as part of a new campaign designed to deliver information stealers such as ACR Stealer, Lumma, and Meduza.
Fortinet FortiGuard Labs said it detected the stealer campaign targeting Spain, Thailand, and the U.S. using booby-trapped files that exploit CVE-2024-21412 (CVSS score: 8.1).
The high-severity vulnerability allows an attacker to sidestep SmartScreen protection and drop malicious payloads. Microsoft addressed this issue as part of its monthly security updates released in February 2024.
Online forums have been buzzing since this past Friday, and a common theme is people suggesting dumping CS in favour of MS Defender.
Even excluding the above (as bad as it gets) issue, MS have a woeful history with security, so are we really going down this rabbit hole?
The simple fact is that the software and security industries need to do better. Knee-jerk rip-and-replace projects will not necessarily yield the results you’re expecting.
There’s some impetus now that could help customers put pressure on vendors to be both more transparent and more diligent in their processes and interactions. Will we see an improvement?
Government regulation? I hate regulation, but a certain amount of it is required in many industries because, you know, folk can’t always do right by themselves and their customers. As an aside, the European Commission’s 2009 agreement with MS required it to give 3rd-party security vendors the same kernel access as its own products, expressly on competition grounds – thereby putting companies like CS in a position to break things. Talk about unintended consequences …
Recommendations
- when investigating new products, try to do a full POC
- look for comparative testing information
- take time to fully vet and investigate your suppliers and vendors
- include your vendors and other 3rd parties in your risk assessments and subsequent DR/BCP planning
- make sure you have all contact information (and are aware of relevant contact mechanisms) for your vendors
- make sure you’re au fait with all support procedures relating to your products
- subscribe to your vendors’ security feeds
- where possible, test and stage updates/upgrades (see the sketch after this list)
- follow change control if relevant
- have a backup plan
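On the staging point above, here’s a minimal sketch (again Python, purely illustrative) of ring-based rollout with a health gate between rings. The ring definitions, soak times, update identifier and the deploy/health functions are placeholder stand-ins for whatever your endpoint-management tooling actually provides.

```python
# Minimal sketch of ring-based (staged) update deployment: push to a small
# canary ring first, wait out a soak period, check health, and only then move
# on. Everything here is a stand-in for your real management tooling.
import time

RINGS = [
    {"name": "canary", "hosts": ["test-vm-01", "test-vm-02"], "soak_minutes": 60},
    {"name": "pilot",  "hosts": ["dept-a-laptops"],           "soak_minutes": 240},
    {"name": "broad",  "hosts": ["all-remaining"],            "soak_minutes": 0},
]

def deploy(update_id: str, hosts: list[str]) -> None:
    # Stand-in: in practice this would call your endpoint-management API.
    print(f"Deploying {update_id} to {hosts}")

def ring_is_healthy(hosts: list[str]) -> bool:
    # Stand-in: in practice check boot status, agent heartbeats, crash telemetry.
    return True

def staged_rollout(update_id: str) -> None:
    for ring in RINGS:
        deploy(update_id, ring["hosts"])
        # Soak period; in practice this is hours or days, not a sleep() call.
        time.sleep(ring["soak_minutes"] * 60)
        if not ring_is_healthy(ring["hosts"]):
            print(f"Halting rollout of {update_id}: ring '{ring['name']}' unhealthy")
            return  # stop before the blast radius grows
    print(f"{update_id} rolled out to all rings")

if __name__ == "__main__":
    staged_rollout("example-sensor-update")  # hypothetical update identifier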
Conclusion
The IT industry (both vendors and clients) has a 2-second memory. In a couple of weeks’ time, the CS issue will be forgotten and everyone will carry on as they were before.
Lessons learned? (I hate that term!!!) Yes, there will be some. Will it result in a sea change? Unlikely. As in so many cases, we just never learn.