
The CrowdStrike incident may end up being one of those “Where were you when…?” moments. Newscasters and IT experts have labelled it as the largest disruption of this type on a global scale that they had ever seen. In the coming weeks and months there will be in-depth analysis of the impact for individual organizations, sectors, economies, and of course people across the globe.

The incident raises questions about how organizations might handle such situations in future, and what they might learn. We have summarized the main areas for action here:

  • Third-party risk management: Continuously assess and monitor third-party risks, integrating lessons from incidents like CrowdStrike to maintain robust risk management and consider diversification to mitigate concentration risks.
  • Operational resilience and business continuity management: Strengthen and regularly test business continuity plans, focusing on adaptability and effective stakeholder decision-making during disruptions, and ensure robust communication strategies with customers and staff.
  • Incident reporting: Clarify, streamline, and regularly update incident classification and reporting processes to ensure compliance with evolving regulatory standards and maintain transparency with regulatory bodies.

Let’s now look at a summary of what actually happened, and then we can explore the issues in more depth.


What happened?

Without getting into a detailed technical analysis, an estimated 8.5 million devices were affected by the CrowdStrike issue, all of them running CrowdStrike’s Falcon cybersecurity software. Falcon updates multiple times a day without requiring permission, in order to maintain the most up-to-date protection. A corrupted software update disabled many devices running Windows, which either could not be restarted or went into a loop of continually restarting. CrowdStrike released a fix within hours, but installing it was a manual process, resulting in many IT teams working overtime to restore their devices.

Third-party risk management

You can outsource a function, but you can’t outsource the risk, which means you have to manage the risk from your third parties. For some, CrowdStrike might be a fourth party or beyond, perhaps through use of managed service providers. Third-party risk management is of course essential. A natural reaction after an incident involving a third party is to review any specific risk assessments, as well as the overall process. Boards or executives might be asking questions like ‘Did we identify this as part of our third-party risk assessment?’

Of course an incident doesn’t mean your initial risk assessment, or a decision to proceed, was incorrect. Your organization may well have identified the risks, or it might update them based on findings from this event. Should organizations be dropping CrowdStrike as a result of this incident? Your organization will have to make its own assessment, but consider these points:

  • CrowdStrike are one of the largest cybersecurity firms in the world
  • They identified and communicated their awareness of the issue quickly
  • They have delivered a huge number of updates successfully over their product lifecycle
  • They have a large swathe of compliance certifications
  • Whatever further analysis identifies about their testing and rollout practices, the scrutiny they are now under will likely make those processes more robust

These are assumptions rather than assertions, tempered by the global impact felt, open questions about whether communication could have been more compassionate to those affected, and acknowledgment that achieving compliance with certifications is not the same as effective risk management.

You could move to another provider, or perhaps diversify some of your technology – but will the risks be significantly different? What is the cost of making a switch, and what risks might be involved in making the change? We’re not advocating for CrowdStrike, just that you need to take a measured response and look at the future risk profile, not just past incidents. No doubt regulators and lawmakers will be investigating how to address concentration risk at a broader scale, but this can be difficult or impossible for an individual organization to consider.

You can’t predict or prevent all incidents from third parties, meaning you need to be prepared.

Operational resilience and business continuity management

For those that were disrupted, their business continuity plans, and more generally their capability to respond, will have been put to the test. What probably matters most here is the capability of key stakeholders to assess the situation and make decisions – not what is written down in a business continuity plan (which for some may not have been accessible). Resilience includes the ability to adapt and manage through disruption, not just sticking to a pre-defined plan.

I look forward to the meta-analysis we can surely expect from industry groups in the coming months about how well different organizations responded: what worked and what didn’t? Here are some questions organizations either are or should be asking themselves:

  • Is the distinction between business continuity and disaster recovery clear? Was this considered an ‘IT problem’ in your organization? Did the rest of the organization kick into gear to continue to provide its most critical services via alternate means?
  • Did we treat this differently because it was via a third party and ‘outside our control’? Or did we recognize the impact to our customers regardless and adapt accordingly?
  • How was communication with customers handled? Could we identify the customers who are most likely to be impacted? Were staff equipped to handle customer enquiries? What could have been improved?

You can’t know or imagine every single scenario, but exercising your capability will help build the resilience muscle in your organization. The best way to develop those scenarios is to understand your operations well. This includes mapping the processes and resources required to deliver, starting with the customer in mind.

Which of those resources, if they were unavailable, might cause the most disruption? If they are disrupted, what can you do to respond? For some, there might be simple workarounds. For others, you might need to prepare in advance, such as having alternative suppliers and contractual arrangements in place. Your business continuity plans should consider what actions and sequence of events are required in order to restore services or engage those alternative arrangements, and responsibilities to activate them.
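The mapping exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the service names, the resource names, and the helper functions `services_disrupted_by` and `rank_by_disruption` are all hypothetical, and a real mapping would come from your own process and resource inventory rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical map of customer-facing services to the resources each relies on.
# All names here are illustrative assumptions, not taken from any real inventory.
SERVICE_DEPENDENCIES = {
    "online payments": {"endpoint security agent", "payments gateway", "call centre"},
    "branch services": {"endpoint security agent", "branch laptops"},
    "customer support": {"call centre", "endpoint security agent"},
}


def services_disrupted_by(dependencies):
    """Invert the map: for each resource, collect the services that rely on it."""
    impact = defaultdict(set)
    for service, resources in dependencies.items():
        for resource in resources:
            impact[resource].add(service)
    return dict(impact)


def rank_by_disruption(dependencies):
    """Return (resource, services) pairs, most widely shared resource first.

    Ties are broken alphabetically by resource name so the order is stable.
    """
    impact = services_disrupted_by(dependencies)
    return sorted(impact.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for resource, services in rank_by_disruption(SERVICE_DEPENDENCIES):
        print(f"{resource}: {len(services)} service(s) affected -> {sorted(services)}")
```

Counting affected services is a deliberately crude measure. It says nothing about how critical each service is or how long a workaround could hold, so treat the ranking as a prompt for discussion rather than an answer.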

A challenge with business continuity plans is that they don’t exist in a vacuum. You should assess the risks to the business continuity plan itself – in what circumstances would it fail, and how are you addressing them? This is where effective exercising comes in – the point is not simply to ‘pass’ a test, but to assess and improve the operational resilience of your organization. After an effective test, consider what variations might cause it to be ineffective. In particular, consider whether the scenario you’ve created for your organization could equally be affecting others, either the entire industry or those that would support recovery.

There may be trade-offs you need to make. An improvement in resilience may come with a cost and an impact on the bottom line, or even reduce efficiency of ongoing operations.

Is this a reportable incident?

One final area I think will generate discussion – what is a cybersecurity incident and who needs to report what? Across jurisdictions there are incident reporting rules that may apply, but do they apply in this case?

The SEC’s cybersecurity reporting rules came into effect in 2023[1]. Their definition:

“Cybersecurity incident means an unauthorized occurrence, or a series of related unauthorized occurrences, on or conducted through a registrant’s information systems that jeopardizes the confidentiality, integrity, or availability of a registrant’s information systems or any information residing therein.”

For the organizations impacted, this does not seem to apply, on the assumption that CrowdStrike were authorized to make the changes, even if the effect was unwanted. Nor does it apply to CrowdStrike themselves, as there was no malicious activity here.

In the UK, the FCA’s Principles for Businesses under Principle 11 state:

A firm must deal with its regulators in an open and cooperative way, and must disclose to the FCA appropriately anything relating to the firm of which that regulator would reasonably expect notice.

In a separate infographic, the FCA confirms that firms must report material cyber incidents, and describes an incident as potentially material when a cyber attack[2]:

Results in significant loss of data, or the availability or control of your IT systems; affects a large number of customers; results in unauthorized access to, or malicious software present on, your information and communication systems.

The first two conditions certainly could be argued to have been met here, but it wasn’t a cyber attack. That said, if firms experienced material disruption to important business services, it might still meet the broad requirements of Principle 11.

The EU’s Digital Operational Resilience Act (DORA) is not yet fully in effect – but when it is, financial institutions may need to report ‘ICT-related incidents’ to supervisors under Article 19[3]. They are defined as:

“a single event or a series of linked events unplanned by the financial entity that compromises the security of the network and information systems, and have an adverse impact on the availability, authenticity, integrity or confidentiality of data, or on the services provided by the financial entity”.

It certainly was unplanned by the financial entities and had an adverse impact on availability, although interpretation of the nested definition for security of network and information systems (which includes the concept of resisting) might exclude this incident. This is just one of many articles in DORA that would be at play, and I’m sure it has kicked over many stones for those currently implementing change to meet these requirements.

To be clear, I’m not 100% confident in any of the above interpretations. I expect this incident will raise further questions about how these rules apply.

Beyond the purely regulatory arena, we expect some terms to generate robust discussion, particularly how people define ‘security’ and ‘availability’ (and interpret the relationship between them), and the difference between cybersecurity and information security. This will be especially true where they have potential implications for insurance and contractual clauses, which lawyers will already be poring over.

Conclusions and next steps for your organization

There will be plenty of more comprehensive deep dives and analysis to digest in the coming months. Here are a few immediate take-aways to consider:

  • Good third-party risk management needs to be part of your overall operational resilience. Review your processes, but acknowledge that you can’t prevent every possible risk
  • Develop a strong understanding of your business, using learnings from the event (your organization's own and what you can learn from observing others) to improve both your business continuity plans and the response capability of your people
  • To the extent needed, review how you classify incidents. Even if you had no need to report, was it considered, or did it just pass you by?


References

[1] https://www.law.cornell.edu/cfr/text/17/229.106

[2] https://www.fca.org.uk/publication/documents/cyber-security-infographic.pdf

[3] https://www.dora-info.eu/dora/article-17/

 

Image credit: Robert - stock.adobe.com

About the author

Michael is passionate about the field of risk management and related disciplines, with a focus on helping organisations succeed using a ‘decisions eyes wide open’ approach. His experience includes managing risk functions, assurance programs, policy management, corporate insurance, and compliance. He is a Certified Practicing Risk Manager whose curiosity drives his approach to challenge the status quo and look for innovative solutions.