27/02/2015 Defining your RTOs
This week Charlie highlights to Business Continuity Managers the importance of Recovery Time Objectives (RTOs).
I believe that defining the RTOs of your activities is one of the most critical tasks the Business Continuity Manager will carry out. Get them wrong and the whole basis for your business continuity recovery is flawed. Often the RTO is driven by internal politics, with managers wanting their part of the organisation, and hence themselves, to be seen as important, rather than by an objective assessment.
The recovery time objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
For a long while, I have wondered if there was any “scientific” way, or even a rule of thumb, for defining your RTOs, but I have never come across one. A while ago I went to the LinkedIn Group BCMIX to ask them how they went about defining their RTOs. I got lots of explanations of the process for defining them but no set rule. Most people said defining RTOs was a combination of common sense, knowledge of the organisation and experience. These are all very good, but how is a beginner going to get that experience?
The first step in defining the activity RTOs is to define the Maximum Tolerable Period of Disruption (MTPD) for the activity. For those of you not familiar with MTPD, it is a term used in the BCI’s Good Practice Guidelines 2013, where it is defined as “the time it would take for adverse impacts, which might arise as a result of not providing a product/service or performing an activity, to become unacceptable”. There is an equivalent concept in ISO 22301, Section 8.2.2, which asks you to take into consideration “the time within which the impacts of not resuming them would become unacceptable”.
In setting the MTPD you are looking for the “… duration after which an organisation’s viability will be irrevocably threatened....”. I think people struggle with the concept of the MTPD and with defining it. For most organisations it is very difficult to define precisely. There are very few organisations that could survive, say, 30 days but not 31 days. On a slow news day, an organisation could be the number one item in the news and its reputation may be irretrievably damaged, while on another day the same incident might not even be picked up by the media. There is also the issue that some organisations, especially government organisations, may not be allowed to fail, and that when a small part of a multinational fails it is very unlikely to cause the demise of the whole organisation. So in defining your MTPD you are looking for a “ball park” time at which the organisation's viability is irretrievably threatened.
As we can see above, ‘irretrievably threatened’ means different things to different organisations. Before defining the MTPD it is important to agree what the criteria for the organisation would be. This could range from:
1. Reputation being irretrievably damaged and unable to sell to new customers
2. Running out of money and the organisation going bankrupt
3. The entire management team being replaced or the organisation being closed down or run by another part of the organisation
4. Failure of delivery of service to customers, existing customers leaving the organisation and being unable to attract new ones
5. A genuine chance of the organisation killing or injuring a member of the public
The importance of these criteria is that they have to be tailored to the organisation.
The MTPD of an activity is an estimation that should sit within a rough time frame, say 1-3 months or 2 days - 2 weeks. Depending on the type of organisation I normally define six or seven timescales for the MTPD to fall within. For example, for an “office-based organisation” which typically works 9am to 5pm five days a week, I would use the following timescales to see where the MTPD would fall:
- 0-24 hours
- 24 hours to 3 days
- 3 days to 1 week
- 1 week to 2 weeks
- 3 weeks to 1 month
- 1 month to 3 months
For an organisation which works 24/7, such as a hospital, I would change the timescales to perhaps:
- 0-1 hours
- 1 hour to 6 hours
- 6 hours to 24 hours
- 1 day to 3 days
- 1 week to 2 weeks
- 4 weeks to 1 month
- 1 month+
Even if the MTPD of an activity is a rough estimation, we can still use the MTPD to differentiate between activities. Some activities will have a longer MTPD than others, and this starts to give us an indication of which activities should be recovered first and which can wait a while. If you are going to use the MTPD to define your RTO, it is important that you apply the same criteria to every activity. If you use different scenarios for ascertaining the MTPD of different activities, it will very much skew your findings. I use the phrase “if the entire activity was to stop” and disregard any particular scenario, asking only when the failure of that activity would lead to the organisation being irretrievably damaged.
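As a rough illustration of how the MTPD starts to give a recovery order, the sketch below sorts activities by the lower bound of their MTPD timescale. The activity names and their MTPD values are hypothetical, made up purely for the example, and the function name `recovery_order` is my own.

```python
from datetime import timedelta

# Hypothetical activities, each with the lower bound of its MTPD timescale.
# These names and values are illustrative only.
mtpd_by_activity = {
    "Payroll": timedelta(weeks=3),
    "Customer helpdesk": timedelta(days=3),
    "Order processing": timedelta(hours=24),
}


def recovery_order(mtpds):
    """Return activity names sorted from shortest MTPD to longest."""
    return sorted(mtpds, key=mtpds.get)


for name in recovery_order(mtpd_by_activity):
    print(name, mtpd_by_activity[name])
```

The shortest MTPD comes first, so the list reads as a first indication of which activities to recover first. It is only a starting point and does not replace judgement.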
Once we have the MTPD of all our activities, we are in a position to start to estimate our RTO. I say estimate here, as the RTO may change slightly after the design (strategy) stage of the business continuity lifecycle. In looking for an appropriate strategy to recover the activity, you may have to adjust your RTO for practical or financial reasons. In deciding on the RTO we know that it must sit somewhere on a timeline between the start of the incident and the MTPD. See below:
If the RTO is too close to the incident then we might be recovering the activity too quickly and therefore wasting money. As an extreme example, an activity that has an MTPD of 1 month plus does not need an RTO of 2 hours. On the other hand, if the RTO is too close to the MTPD and the RTO is missed, there is no time left for recovery before the MTPD, and so the organisation may be irretrievably damaged.
I have come up with my rule of thirds for defining the RTO. The RTO of our activity sits in the middle third of the timeline, as shown in green in the diagram below.
By way of an example, as shown in Figure 3, the MTPD is in the time frame of 3 weeks to 1 month. We take the lowest time of the MTPD time frame (3 weeks) and then divide the timeline into 3 parts. The RTO sits somewhere between 1 week and 2 weeks. It is then up to your judgement whether it sits closer to 1 week (say 8 days), closer to 2 weeks (say 12 days), or in the middle.
If the MTPD of a different activity were within the time bracket of 6 to 12 hours then the RTO would be sometime between 2 and 4 hours.
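The arithmetic in these two examples can be sketched in a few lines of Python. This is only an illustration of the rule of thirds as described above. The function name `rto_window` and the use of `timedelta` are my own assumptions, not part of any standard.

```python
from datetime import timedelta
from typing import Tuple


def rto_window(mtpd_lower_bound: timedelta) -> Tuple[timedelta, timedelta]:
    """Return the middle third of the timeline from the incident (time zero)
    to the lower bound of the MTPD time frame."""
    if mtpd_lower_bound <= timedelta(0):
        raise ValueError("The MTPD lower bound must be greater than zero")
    one_third = mtpd_lower_bound / 3
    return one_third, 2 * one_third


# MTPD time frame of 3 weeks to 1 month: use the lowest time, 3 weeks.
earliest, latest = rto_window(timedelta(weeks=3))
print(earliest, "to", latest)  # between 1 week and 2 weeks

# MTPD time frame of 6 to 12 hours: use the lowest time, 6 hours.
earliest, latest = rto_window(timedelta(hours=6))
print(earliest, "to", latest)  # between 2 and 4 hours
```

The function only gives the window. Where the RTO sits inside that window is still a matter of judgement.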
Once we have decided on the RTO we should go forward to the “design” stage of the business continuity lifecycle. Within the design stage we look at, and then agree, the recovery strategy for that activity. It must be noted that the RTO may change in the design stage, as it may have to be adjusted to make a particular strategy work. Once the RTO has “gone firm”, I believe that all the RTOs of the activities across the organisation should be signed off by a senior manager. This is to check that they meet the requirements of the organisation and are acceptable to its senior managers.
I have taught the rule of thirds on a number of courses and I would be interested to hear any comments on it!