
CPS 230 requires regulated entities to consider service disruption from a different perspective. Working backwards from a disruption scenario, entities must identify the harm that the disruption could cause to their customers or the broader financial system, then take active measures to prevent it (operational risk) and to recover from it (operational resilience).
Welcome to the third in our series of CPS 230 technical guides.
In the discussion paper that accompanied the release of draft CPS 230, APRA noted that one of its key objectives is to focus the Board on the importance of operational resilience by requiring it to set tolerance levels for disruptions to critical operations. Although the approach to setting these tolerances should build on the approach used for risk appetite tolerances, there is a fundamental difference: when considering operational resilience, the risk has already crystallised.
In this guide we set out an approach to assessing operational resilience. The approach leverages the methodology developed by Grant Thornton in the UK, where similar requirements have been in place for some time.
Why is operational resilience important?
A robust and resilient financial services sector is essential to preventing financial harm. CPS 230, with its focus on operational resilience, is consistent with prudential requirements in the UK and Europe. It forms part of a suite of APRA requirements related to limiting financial harm due to disruption, including identifying domestic systemically important banks (D-SIBs), CPS 232 Business Continuity Management, CPS 190 Recovery and Exit Planning and multiple capital adequacy and liquidity requirements.
Operational resilience refers to the collective steps an entity takes to minimise the disruption and impact caused by operational risk incidents. Business continuity and business resilience aim to keep the entity as a whole operating. Operational resilience is related but differs in that its focus is not on the entity as a whole, but on the key financial services it delivers.
Although APRA accepts that some degree of service disruption and outages will occur, it is important that regulated entities:
- Have the resilience to get critical operations back up and running without causing financial harm;
- Work within a pre-defined tolerance level that aligns with their broader risk appetite; and
- Conduct robust scenario testing, using severe but plausible scenarios, to assess whether it is possible to remain within the tolerances set.
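To make the idea of "remaining within tolerance" concrete, the following is a minimal sketch of how scenario test results might be compared against Board-approved tolerances. It assumes, for illustration only, that each tolerance is expressed as a maximum outage duration and a maximum data loss window; the `Tolerance`, `ScenarioResult` and `breaches` names, the operations and all figures are hypothetical and are not recommendations or values taken from CPS 230.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Hypothetical tolerance level for one critical operation."""
    max_outage_hours: float     # longest disruption the Board would accept
    max_data_loss_hours: float  # largest window of lost data the Board would accept


@dataclass(frozen=True)
class ScenarioResult:
    """Illustrative outcome of one severe but plausible scenario test."""
    operation: str
    scenario: str
    outage_hours: float
    data_loss_hours: float


def breaches(results, tolerances):
    """Return (operation, scenario, reasons) for every result outside tolerance."""
    out = []
    for r in results:
        tol = tolerances[r.operation]
        reasons = []
        if r.outage_hours > tol.max_outage_hours:
            reasons.append("outage")
        if r.data_loss_hours > tol.max_data_loss_hours:
            reasons.append("data loss")
        if reasons:
            out.append((r.operation, r.scenario, reasons))
    return out


# Hypothetical tolerances and test results, for illustration only.
tolerances = {
    "payments": Tolerance(max_outage_hours=4, max_data_loss_hours=0.25),
    "claims": Tolerance(max_outage_hours=48, max_data_loss_hours=4),
}
results = [
    ScenarioResult("payments", "data centre outage", 6, 0),
    ScenarioResult("payments", "cyber attack", 3, 1),
    ScenarioResult("claims", "provider failure", 24, 2),
]

for op, scen, why in breaches(results, tolerances):
    print(f"{op} / {scen}: outside tolerance ({', '.join(why)})")
```

A scenario that breaches any dimension of its tolerance is flagged, which is the trigger for the investment and remediation work described later in this guide.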
The Board is expected to oversee and approve all aspects of operational resilience. As such, risk reporting and Board Risk Committee Charters may need to be updated to include information necessary to facilitate this. Operational resilience will also need to be reflected in risk management declarations.
Identifying critical operations
CPS 230 defines critical operations as processes that:
“If disrupted beyond tolerance levels would have a material adverse impact on its depositors, policyholders, beneficiaries or other customers or its role in the financial system.”
CPS 230 also sets out the processes that, at a minimum, APRA expects entities to identify as critical operations.
At its core, CPS 230 requires regulated entities to prioritise critical services over their own operational objectives to prevent financial harm to consumers. This means, for example, that in the event of a major disruption, APRA expects priority to be given to restoring core banking operations over restoring other revenue-generating, non-regulated businesses.
Resilience planning
The following diagram sets out the steps necessary for effective resilience planning and key considerations:
The necessary steps can be summarised as:
| Activity | Detail |
| --- | --- |
| Identify | For each critical process, determine how much disruption could be tolerated and under what circumstances. This will require contingency and continuity planning, including identifying back-up or substitute systems, processes and service providers. |
| Map | Document the systems and workflows that support each critical process, including activities undertaken by related and non-related service providers. Interdependencies between systems and processes must be identified so that the total impact of any disruption can be assessed. |
| Assess | Determine how the failure of a system, workflow or service provider would impact a critical process. A concentration of critical service providers may increase the impact. Contingency plans must address the disruption and identify potential substitutes. |
| Test | Use severe but plausible scenarios and past experience (for example, the COVID-19 pandemic) to test whether each critical process would remain within tolerance should a disruption occur. Generating scenarios will require involvement from IT, the business, risk and third-party service providers. Testing plans should consider the type and frequency of testing. |
| Invest | Where a critical process cannot be kept within tolerance, the capacity to respond to and recover from disruptions must be enhanced. The focus of enhancements should be to reduce the overall recovery time. |
| Communicate | Identify all internal and external stakeholders, and determine what needs to be communicated, to whom and when. The overall objective of communications is to enable customers to make informed decisions in the event of an outage. |
The steps will need to be undertaken on a continuous basis to take account of emerging risks, the results of testing and any disruptions that may occur.
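The Map and Assess steps lend themselves to a simple dependency model. The sketch below is one possible approach, not Grant Thornton's methodology or an APRA requirement: it assumes the mapping exercise produces a list of direct dependencies for each operation and system, then walks that map in reverse to find every critical operation affected, directly or indirectly, by the failure of one component. Counting how many critical operations rely on each component gives a rough view of concentration risk. All component names, including `provider_x`, are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical output of the Map step: each node lists what it directly relies on.
DEPENDS_ON = {
    "payments": ["core_banking", "card_switch"],
    "deposits": ["core_banking"],
    "claims": ["claims_platform"],
    "core_banking": ["data_centre_a"],
    "card_switch": ["provider_x"],
    "claims_platform": ["provider_x"],
}
CRITICAL_OPERATIONS = {"payments", "deposits", "claims"}


def reverse_edges(depends_on):
    """Map each component to the nodes that rely on it directly."""
    users = defaultdict(set)
    for node, deps in depends_on.items():
        for dep in deps:
            users[dep].add(node)
    return users


def impacted_operations(failed, depends_on, critical):
    """Critical operations that depend, directly or indirectly, on `failed`."""
    users = reverse_edges(depends_on)
    seen, queue = {failed}, deque([failed])
    while queue:  # breadth-first walk up the dependency chain
        node = queue.popleft()
        for user in users[node]:
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen & critical


def concentration(depends_on, critical):
    """Number of critical operations exposed to the failure of each component."""
    components = {dep for deps in depends_on.values() for dep in deps}
    return {c: len(impacted_operations(c, depends_on, critical)) for c in components}


for component, count in sorted(concentration(DEPENDS_ON, CRITICAL_OPERATIONS).items()):
    print(f"{component}: supports {count} critical operation(s)")
```

In this illustrative map, a failure at `provider_x` would disrupt both payments and claims even though neither relies on it directly, which is the kind of hidden interdependency the Map step is intended to surface.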
