CrowdStrike

July 22, 2024

I think everybody is aware of the chaos that CrowdStrike created in the last few days. This is truly a tragic situation because there are people who died because of it, and I would like to share my point of view. I should state up front that it is based on what I read on the internet, and the situation may change.

Firstly, it does not look like Microsoft is at fault. CrowdStrike installs a kernel-mode driver, and the code in this driver runs with the highest possible privileges. If this driver has a bug, the whole system will crash and there is nothing Microsoft can do about it. Basically, IT administrators trust CrowdStrike enough to make it part of Windows.

But why do IT administrators do it? Well, there are a lot of reasons.

Reason #1. Handling security in-house is a huge responsibility and creates a lot of additional work. Most companies don’t pay for the additional work that security requires, but they will blame the IT department in case of a security breach. It is typically much easier to convince management to hire an external company than to allocate the necessary budget to do security themselves.

Reason #2. Strict security creates inconvenience. It is much easier for an external company to push these changes than for a local IT department. If you say no to an external company, they will simply cancel the contract and say that they cannot work without the necessary security changes. But when a local IT department creates inconvenience, management can simply force it to relax the security requirements.

To explain what I mean, imagine a house that has an external door without a lock. Your daughter tells you that the door needs a lock to protect the house from bad people. But having a lock brings a lot of inconvenience: you have to lock and unlock it, you must have the key with you all the time, and so on.

As a result, you will probably say no to your daughter. But if you want to insure your house, one of the requirements from the insurance company will be a lock on all external doors. If you reject this requirement, they will not insure you.

Reason #3. In many cases, there is an external security audit and it is much easier to pass when the company uses one of the well-known external security companies.

Reason #4. Also, it will be easier on the company in case of a breach. The company can always state that they used the best security company money can buy and hackers were just better.

Reason #5. An external security company will cost a lot of money. But for an IT department, it is not their money. Moreover, in many cases, the ratio of risk vs reward is quite bad. Let me explain the situation.

Imagine that the IT department saves the company a million dollars. What will happen next? Probably people who work in that department will receive some relatively small bonus. Maybe a top manager will send an email to everyone praising them.

But in case of a breach, they will have a lot of extra overtime work, and perhaps somebody will be fired. It is clear that for an IT department, it is much better to delegate all responsibility to the external company and sleep tight at night.

But let's return to CrowdStrike. I found these quotes on Bloomberg:

Quote #1. “For years, CrowdStrike has attacked Microsoft for allowing hackers to penetrate its systems, and Kurtz has used those lapses as a selling point for his own products”

Of course, I cannot call myself a security expert, but I know a lot about it. And Windows itself is not bad. Many colleagues at our company and I have deployed a lot of servers that are exposed to the internet, and we have never had any issues or hacks.

Sure, our company is not in the Fortune 500 and is not as profitable a target, but many big companies handle security by themselves and it works quite well. There are several well-defined principles to follow that solve most of the problems. I believe that most Windows hacks are simply the fault of IT personnel, but of course they will blame Microsoft and Windows.

But CrowdStrike blamed Microsoft and created hype around their product. To be honest, I don’t quite understand how CrowdStrike can help defend Windows if Windows has a vulnerability. Perhaps they just do not allow infected files to execute in the first place, but again, Windows is quite good at this as well.

Quote #2. “CrowdStrike has tried to bash Microsoft as much as they could and they were trying to profit from it,” Charosky said. “But nobody escapes when your company is such a massive part of the world’s infrastructure. This is karma. When a company graduates from being a startup to being critical national infrastructure, it needs to behave differently, and I don’t know if CrowdStrike has gone through that transition.”

This is quite an interesting thing as well. Understandably, many procedures are quite relaxed or even completely missing in a startup company. Then a startup starts the transition to a conventional company and inevitably procedures must be set up.

These procedures cover how to develop a product, how to test it, how to deploy it, how to collect feedback from the customers, how to react to problems, and much more. But they must not be set in stone. They must evolve and adjust to a fast-changing world and to changing business situations.

For example, when you have a single client it could be ok if the developer tests every change. But when you have a lot of clients with quite critical systems, procedures must be very different with a lot of testing. Safety and stability must be the number one priority.

And from my point of view, they failed miserably here. I can immediately see the following issues:

Failure #1. From what I found on the internet, around 8 million computers were affected. It simply should not be possible to deploy buggy software to so many computers at once. They should release updates in several waves and carefully monitor each wave for potential issues. From what I see, they didn’t do that and released the update to all computers.
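To illustrate what releasing in waves could look like, here is a minimal Python sketch. It is not how CrowdStrike's deployment actually works. The function names, the wave fractions, and the 2% failure threshold are all assumptions I made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    updated: List[str]   # hosts that received the update
    halted: bool         # True if the rollout stopped early
    failed_wave: int     # index of the wave that tripped the check, or -1


def plan_waves(hosts: Sequence[str], fractions: Sequence[float]) -> List[List[str]]:
    """Split hosts into consecutive waves; each fraction is a cumulative share."""
    waves, start = [], 0
    for frac in fractions:
        # Every wave gets at least one host, and never more than remain.
        end = min(len(hosts), max(start + 1, round(len(hosts) * frac)))
        if start < end:
            waves.append(list(hosts[start:end]))
        start = end
    if start < len(hosts):
        waves.append(list(hosts[start:]))  # whatever is left goes last
    return waves


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],      # hypothetical: pushes the update to a host
    is_healthy: Callable[[str], bool],  # hypothetical: health check after the update
    fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed wave sizes
    max_failure_rate: float = 0.02,     # assumed threshold
) -> RolloutResult:
    """Deploy wave by wave and stop as soon as a wave looks unhealthy."""
    updated: List[str] = []
    for index, wave in enumerate(plan_waves(hosts, fractions)):
        for host in wave:
            deploy(host)
        updated.extend(wave)
        failures = sum(1 for host in wave if not is_healthy(host))
        if failures / len(wave) > max_failure_rate:
            return RolloutResult(updated, halted=True, failed_wave=index)
    return RolloutResult(updated, halted=False, failed_wave=-1)


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    # Simulate an update that crashes every machine it touches.
    result = staged_rollout(fleet, deploy=lambda h: None, is_healthy=lambda h: False)
    print(len(result.updated), result.halted, result.failed_wave)
```

With a thousand hosts and an update that crashes every machine, this sketch stops after the first wave of 10 hosts instead of reaching all 1000. A real system would also wait between waves so that slow failures have time to show up.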

Failure #2. From what I read, they have absolutely no way to bring an affected computer back to life without manual action on that computer. This considerably slows down the restoration process because each computer must be restored by hand.

Failure #3. This is kind of obvious but their procedures have just failed. I don’t know what happened exactly but somehow this issue went into production.

Quote #3. “CrowdStrike has done more to disrupt global business than all the ransomware operators combined,” he said. “This is a demonstration of how much risk we’re carrying with this software that we’ve deployed to protect ourselves: If these guys get it wrong, they can take your business down.”

This is exactly the point that came to my mind when I read that they have a kernel-mode driver. What I have learned from decades of experience is that writing a kernel-mode driver is hard. There are many aspects that the developer of a driver must take care of: performance, reliability, security, power management, and many others. Moreover, you cannot just do the one thing you need and ignore all the other aspects. It is all or nothing. And you cannot just copy and paste something from Stack Overflow or ChatGPT. You must know it.

For example, some time ago Sony decided to create DRM protection that installed a kernel-level driver that could be used to get full control of the computer. Effectively, it was a rootkit. This happened because the developers of that driver didn’t care about security and only cared about the DRM task. But as I wrote above, it does not work this way.

I know that I can trust any Microsoft-written kernel code because they are experts. They also make mistakes, but they have never caused anything even remotely close to this disaster. This is because Microsoft has very strict procedures and decades of experience, and as a result they can catch problems at early stages.

Moreover, many products install a kernel-mode driver: for example, network drivers, video card drivers, and some antiviruses. But I have never heard of anybody failing so badly that they bricked 8 million computers.

And I cannot say that CrowdStrike was all good and just made a single mistake. I regularly deal with strange behavior and crashes of our application, and I have seen CrowdStrike involved quite often. To be honest, I had the impression that it was written by unprofessional people. I had no idea that it was so popular with Fortune 500 companies. But to be honest, dealing with the security solutions of many top companies has left a very unprofessional impression.

I feel that CrowdStrike was mostly about hype and money. It reminds me of a video I watched about a company created by a so-called “security expert” who decided to target a hacker group called Anonymous.

They hacked him because he used the same password for an online game, for his business email account, and for many other accounts. Many government agencies trusted him with their security even though he broke one of the most basic security rules: never reuse business passwords. So the more I read about this issue, the more CrowdStrike looks like that company.

In conclusion, it is clear that relying on a single security company is quite bad because it becomes a single point of failure. Keep in mind that this issue affected only a small percentage of computers with CrowdStrike, and each computer is very easy to restore. There are just a lot of computers. But I’m pretty sure everything will be fine within a week or so.

But imagine what would happen if this were an intentional attack that leaked and destroyed data. It would be a true worldwide disaster. I hope people will learn something from the incident and draw the right conclusions.