
Operational Resilience Series #6: Identifying vulnerabilities and actions

In this series we’ve defined important business services, designed impact tolerances and mapped the processes and resources that support them. In our most recent blog, we ran through testing via scenarios to determine whether we could achieve defined impact tolerances.

But what can we learn from the scenario testing process, and how do we take follow-up action?

In this blog we will cover:

  • How lessons can be learned
  • Types of vulnerabilities that might be identified
  • Ways to improve ability to meet impact tolerance
  • Who needs to know?

How lessons can be learned

Let’s look at the main sources of learnings, which can include those outside the testing cycle.

Scenario testing

The most obvious source is the scenario testing that we described in our previous blog (otherwise why are we doing it?). If you failed to achieve your impact tolerance based on the scenario or scenarios that were tested, that is an immediate driver for action. Even if you met your impact tolerance, you may still identify ways to improve the resilience and recovery of your important business services.
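
As a rough illustration of that comparison, the Python sketch below classifies scenario test results against impact tolerances. The class and field names, the use of hours as the measure, the sample data and the 80% "near miss" threshold are all assumptions made for illustration; they are not drawn from any regulatory standard.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    service: str            # important business service tested
    scenario: str           # scenario that was run
    tolerance_hours: float  # impact tolerance, assumed here to be a maximum outage in hours
    recovery_hours: float   # time taken to restore the service during the test


def assess(result: ScenarioResult, near_miss_ratio: float = 0.8) -> str:
    """Classify a test result against its impact tolerance."""
    if result.recovery_hours > result.tolerance_hours:
        return "breached"          # immediate driver for action
    if result.recovery_hours >= near_miss_ratio * result.tolerance_hours:
        return "near miss"         # within tolerance, but with little headroom
    return "within tolerance"      # may still yield improvements


# Hypothetical test results
results = [
    ScenarioResult("Payments", "Data centre outage", 4, 5.5),
    ScenarioResult("Payments", "Key vendor failure", 4, 3.5),
    ScenarioResult("Customer onboarding", "Cyber attack", 24, 10),
]

outcomes = {(r.service, r.scenario): assess(r) for r in results}
for key, outcome in outcomes.items():
    print(key, outcome)
```

The "near miss" band is one way to surface results that passed but still deserve a lessons-learned discussion; the right threshold, if you use one at all, is a judgement for your organisation.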

Regulated entities may be required to conduct lessons-learned exercises following their scenario testing.

Horizon scanning

Horizon scanning, monitoring incidents in the news, or information shared directly from other organisations or industry bodies in your sector can also provide insights that can be adopted into your operational resilience program. Weaknesses or interdependencies identified in other organisations may shed light on similar weaknesses that you had not identified in your own.

Controls assurance, metrics and attestations

You may have identified controls that support ongoing availability of resources or ability to perform processes, metrics that monitor the health of resources, or attestations that support ongoing resilience.

Monitoring these activities may identify changes in the health of resources and provide some insight into vulnerabilities that need to be addressed.

Incidents

Incidents may not be welcomed with open arms, but they do provide real data about your ability to withstand and recover, whether or not your impact tolerances were breached. Minor incidents or near misses, which are usually more frequent, can also provide learnings. The most likely findings here are dependencies or complex interactions that may have been overlooked.

Regulated entities may also be required to conduct lessons-learned exercises following an incident.

Types of vulnerabilities that might be identified

While every organisation and learning opportunity is different, some types of weaknesses can be identified more readily than others. The list below can serve as a guide to the things you should look for, so that they are not overlooked. We recommend using a structured process or a checklist to capture these as part of your post-scenario testing learnings.

  • Identification of resource dependencies that were not previously identified, including the availability of data sources
  • Crisis and communication plans that were not sufficient to enable swift activation of business continuity plans or other contingencies
  • Vendors or third parties that did not respond in the time we anticipated, or did not act in a way that was conducive to recovery
  • Key person risks

These may be documented as vulnerabilities: areas where either your resources or responses weren’t sufficient to achieve desired outcomes. Remember that threats are external and are associated with scenarios; vulnerabilities are internal and focus on the quality of resources and capabilities.
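
One possible shape for such a checklist is sketched below in Python. The four categories mirror the list above; the record fields, the sample entries and the `unchecked_categories` helper are hypothetical and would need adapting to your own program.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    RESOURCE_DEPENDENCY = "Unidentified resource dependency"
    CRISIS_COMMS = "Insufficient crisis or communication plan"
    THIRD_PARTY = "Vendor or third-party response"
    KEY_PERSON = "Key person risk"


@dataclass
class Vulnerability:
    description: str
    category: Category
    service: str  # important business service affected
    source: str   # e.g. scenario test, incident, horizon scanning


def unchecked_categories(found: list[Vulnerability]) -> list[Category]:
    """Return checklist categories with no recorded finding, as a prompt
    to confirm each one was actually considered."""
    seen = {v.category for v in found}
    return [c for c in Category if c not in seen]


# Hypothetical findings from a post-test review
register = [
    Vulnerability("Payment file relies on an undocumented data feed",
                  Category.RESOURCE_DEPENDENCY, "Payments", "scenario test"),
    Vulnerability("Only one engineer can restart the settlement batch",
                  Category.KEY_PERSON, "Payments", "near miss"),
]

counts = Counter(v.category for v in register)
to_review = unchecked_categories(register)
for category in to_review:
    print("No findings recorded for:", category.value)
```

An empty category is not proof that no vulnerability exists; it is simply a reminder to check that the review covered it.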

Ways to improve ability to meet impact tolerance

Once you’ve identified and documented your vulnerabilities, you need to take action to address them. Again, this will differ by organisation and the vulnerabilities identified, but may include:

  • Changing business continuity plans or contingency plans
  • Diversification of resources – particularly where one resource supports multiple services and recovering them all at once within tolerance may not be achievable
  • Increasing redundancies and backups
  • Increasing the volume or capacity of one or more resources
  • Training on roles and responsibilities
  • Enhancing relationships or service level agreements with third party suppliers

The aim of any action is to ensure you can remain within your impact tolerance. When weighing up alternatives, remember to compare them against your existing scenarios; make sure you aren’t introducing new vulnerabilities!
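
To make the diversification point in the list above concrete, here is a minimal Python sketch that flags resources supporting more than one service. It assumes you already hold a mapping of services to resources from earlier in the series; the service and resource names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical mapping of important business services to supporting resources
service_resources = {
    "Payments": {"Core banking platform", "Payments team", "Vendor A gateway"},
    "Customer onboarding": {"Core banking platform", "ID verification vendor"},
    "Online banking": {"Core banking platform", "Vendor A gateway", "Web hosting"},
}


def shared_resources(mapping: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return resources that support more than one service,
    together with the services that depend on them."""
    by_resource = defaultdict(list)
    for service, resources in mapping.items():
        for resource in resources:
            by_resource[resource].append(service)
    return {r: sorted(s) for r, s in by_resource.items() if len(s) > 1}


concentrations = shared_resources(service_resources)
for resource, services in sorted(concentrations.items()):
    print(f"{resource}: supports {', '.join(services)}")
```

Re-running a check like this after a proposed change is one simple way to see whether the change removes a concentration or quietly creates a new one.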

Should scenario results inform changes to your impact tolerances?

Impact tolerances are based on an assessment of potential harm or impact when considering external factors. The time in which you were able to restore services during testing should not be a reason to change your impact tolerance. Impact tolerances should only be changed on the basis that you have learned more about the potential impact.

Who needs to know?

It’s obvious that once you’ve learned something, you need to make the necessary improvements. That means someone has to own them, and this will be informed by the governance of your operational resilience program and its integration with other frameworks or teams, such as your operational risk, cyber or vendor frameworks. We recommend having a centralised system so that actions can be tracked back to other elements of those programs.

The lessons aren’t really learned unless they are embedded, so it’s important to share these learnings with others and raise awareness.

And finally, management, executives and boards will need a summarised version.

About this series

We’ve covered a lot of the ‘doing’ of Operational Resilience so far in this series. Next, we will explore the reporting that management want to see, and how it gives them insight into both the operational resilience of their organisation and the performance of their operational resilience programs.

Next steps for your organisation

Protecht recently launched the Protecht.ERM Operational Resilience module, which
helps you identify and manage potential disruption so you can provide the critical
services your customers and community rely on.


Note on regulation and terminology

While this series primarily discusses regulated entities, the guidance can apply to any organisation seeking to improve their operational resilience by looking through an external stakeholder lens, whether they operate in financial services, critical infrastructure, healthcare or indeed any other industry.

We use the term ‘important business services’, which aligns with the UK’s Financial Conduct Authority/Prudential Regulation Authority terminology but can and should be adapted to different regions and sectors. For Australian financial service providers, we recommend replacing ‘important business services’ with ‘critical operations’, and ‘impact tolerance’ with ‘tolerance levels’, to align with APRA draft standard CPS 230 on Operational Risk.

We use the term ‘customer’ in this blog, which can include direct consumers, business to business relationships, patients in health care settings, or recipients of government services. The defining factor is that they are external recipients of the services you provide.

About the author

Michael is passionate about the field of risk management and related disciplines, with a focus on helping organisations succeed using a ‘decisions eyes wide open’ approach. His experience includes managing risk functions, assurance programs, policy management, corporate insurance, and compliance. He is a Certified Practicing Risk Manager whose curiosity drives his approach to challenge the status quo and look for innovative solutions.