Business Continuity Awareness Week 2011 webcast by Charlie
Business Impact Analysis – A new methodology to define your critical activities, MTPD and RTOs
Many business continuity practitioners struggle with their BIA: making sure it is aligned to the needs of the business, that it is objective, and that it meets all the requirements of the relevant business continuity standards they are working to. Too often BIAs contain lots of information that doesn’t really help in the planning process. Parts of the organisation demand that they are recovered within 2-3 hours and want their IT applications NOW, when logic says they can wait for 24 hours or more, and nobody can agree on the list of critical activities across the organisation. Charlie believes that a robust list of critical activities, each with an agreed RTO (recovery time objective), is the cornerstone of developing business continuity within an organisation. During the webinar Charlie will explain the relevant terms such as RTO and will give a simple and robust methodology for developing consistency and relevance, and for ensuring that the results of the BIA are aligned to the requirements of the organisation. This methodology has been used for large and small organisations, and by a number of organisations as part of their BS25999 certification. Link here
It’s ok we have a standby generator……

Many organisations protect themselves against power cuts by use of a standby generator. For their data centres they will use a UPS (uninterruptible power supply) combined with a standby generator. Often these provide false comfort as they don’t actually work on the day, for a variety of reasons. These range from the generator being unable to support the whole building ‘on load’, because the testing was done at the weekend, to the UPS and the generator not being configured properly and both failing simultaneously.
I have heard a statistic, which may or may not be true, that 40% of standby generators fail to work when required. The lesson is that if you rely on a standby generator or UPS you need to do lots of testing to make sure it will actually work on the day.
Sarah Armstrong-Smith sent me an excellent example of this…
“This happened about 7 years ago. There was a power cut affecting the local region where one of our primary Data Centres was situated. The UPS and generator kicked in as they should have. However, the generator started kicking out plumes of thick black smoke.
It wasn’t long before the Police & Environmental Health were at the door ‘ordering’ us to turn the generator off as the smoke was going over a motorway and causing danger to vehicles. We begged and pleaded and told them this was our Data Centre but this fell on deaf ears…so we agreed to start turning off non-essential services to try and reduce the load…off went all pre-production and test environments. It made a little bit of difference but not enough.
We thought about invoking DR for some of the systems, but this is back in the day of tape based recovery so it would have taken the best part of a day+ to recover services, so we closed down a few more services until we were left with the really critical ones!
Meanwhile we were frantically trying to contact the Electricity company to find out what had happened and when power would be restored, to which the response was ‘we have Engineers working on it’….not a very helpful response!
At the point where we had no choice but to shut the generator down and close the Data Centre…. the power came back up! Can you believe it! So we thought we had a lucky escape until lots of complaints started flooding through from local residents…the black smoke had covered their houses in soot, gone through the windows on to furniture etc. Not to mention the noise and smell. So that started a flood of compensation claims…
As you can imagine, the media picked up the story and were not very sympathetic!
The key lessons learned here are to really test your generators properly and on load. Most companies just switch it over for a few minutes to check it fires up and then turn it off again….which we were guilty of, but we thought we were smug in the knowledge that we had a generator and tested it!
Also, given the fact that we were in the middle of a residential area, I suspect that the noise alone coming from the generator would have been enough to force us to turn it off at night time…meaning that we probably would have no choice but to either invoke DR or stop running a 24*7 operation…. the cost of which would have been disastrous!
After the event, the strategy was reviewed, including relocation of the Data Centre, but this proved too costly, so our over-reliance on generators had to be reviewed instead, including improving DR plans and the ability to move services to other Data Centres. I think this is what kick-started our use of data replication to speed up recovery!”
Disaster Recovery – What a Business Continuity Manager should know about their organisation’s IT systems

I am just about to go down and see a client’s IT department and was preparing a list of questions to ask them. I thought I would share it with the readers of this blog.
These are the things I think all Business Continuity Managers (BCMs) should know about their own IT systems. I believe you don’t need to know the finer details of how the technology works, but you need a good understanding of the following points.
Data centres and IT hardware
- Where your main data centre or data centres are physically located.
- Is there anywhere else data is stored such as local servers (team and individual drives and e-mail servers) collocated with its users or servers which serve all the people in one building.
- If you have a data centre and a backup data centre, do they have the same capacity, or what is the ratio of live to back-up
- If two (or more) data centres are mirrored or employ virtualisation over the two sites, how good is the network between the two and how much data could be lost if one data centre was lost
- Are there any known risks to the data centres or are they located in a risky area
- What has been done to protect them against power failure
- Are they manned 24 hours or do they have alarms on them to warn staff of a burst pipe or the centre overheating
- If VOIP telephony is used, where are the servers located and what capacity could be lost under different disaster scenarios
- If cloud computing is used, where is the location of the cloud data centre(s), which companies are involved in the running of the data centres and what are the backup plans and data loss if a data centre is lost
- Are there third party contracts for disaster recovery and what do they cover. Is there regular testing of the provision
Network
- Ask for a network diagram and look at single points of failure
- Is the network in a loop enabling data to feed both ways or is the network a single strand
- Look for locations which house nodes on the network which if lost would cause the network to be lost at other connected locations as well
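Once you have the network diagram, the check for nodes whose loss would take other locations down with them can be sketched in a few lines. This is only an illustration: the site names and links below are made up, and the function names are my own, not from any tool. It treats the diagram as a simple map of which location connects to which, removes each location in turn, and lists the sites that can no longer reach the data centre.

```python
from collections import deque


def reachable(links, start, lost):
    """Return every location that can still be reached from `start`
    when the location `lost` is out of action."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links[node]:
            if neighbour != lost and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(links, hub):
    """For each location other than `hub`, list the other locations that
    would be cut off from `hub` if that location were lost."""
    result = {}
    for lost in sorted(links):
        if lost == hub:
            continue
        still_connected = reachable(links, hub, lost)
        cut_off = sorted(set(links) - still_connected - {lost})
        if cut_off:
            result[lost] = cut_off
    return result


# Hypothetical network: a loop through three offices, plus a depot
# hanging off office_c on a single strand.
example_links = {
    "data_centre": ["office_a", "office_c"],
    "office_a": ["data_centre", "office_b"],
    "office_b": ["office_a", "office_c"],
    "office_c": ["office_b", "data_centre", "depot_d"],
    "depot_d": ["office_c"],
}

print(single_points_of_failure(example_links, "data_centre"))
```

In this made-up example the offices on the loop survive the loss of any one neighbour, but the depot on the single strand is cut off if office_c is lost.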
Backup and restoring
- As part of the ‘understanding the organisation’ process, the critical systems for the organisation should be established
- For each of the systems the backup regime should be known by the BCM
- The present Recovery Point Objective (RPO) should be known for each system. This is the amount of data which could be lost if there is a catastrophic failure and the system has to be restored from the backup. This can vary from days or weeks if you don’t back up regularly, to 24 hours if your backup is a nightly tape backup, to no data loss if mirroring or similar technologies are used.
- The time taken to restore systems under catastrophic failure / worst case scenario should be known and the order given of system recovery. This is looking at the loss of a data centre rather than the loss of individual systems.
- Recovery of individual systems should be known if they are critical to the organisation or they underpin activities with short RTOs (Recovery Time Objectives).
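The answers to these questions can be set against the RTOs from the BIA to show where the gaps are. The sketch below is a rough illustration under some assumptions. All the system names and figures are invented. The worst-case data loss is taken to be the interval between backups (24 hours for a nightly backup, 0 for mirroring). The "tolerable data loss" figure is something you would have to agree with the business, and is not a term from the list above.

```python
from dataclasses import dataclass


@dataclass
class SystemRecovery:
    name: str
    backup_interval_hours: float   # 0 means mirrored, no data loss expected
    restore_hours: float           # IT's estimate after loss of a data centre
    rto_hours: float               # RTO agreed in the BIA
    tolerable_loss_hours: float    # assumed figure agreed with the business


def recovery_gaps(systems):
    """List (system, problem) pairs, most urgent RTO first."""
    gaps = []
    for s in sorted(systems, key=lambda s: s.rto_hours):
        if s.restore_hours > s.rto_hours:
            gaps.append((s.name, "restore time exceeds RTO"))
        if s.backup_interval_hours > s.tolerable_loss_hours:
            gaps.append((s.name, "possible data loss exceeds tolerance"))
    return gaps


# Hypothetical figures for illustration only.
example_systems = [
    SystemRecovery("payroll", 24, 48, 72, 24),
    SystemRecovery("order_entry", 24, 30, 4, 1),
    SystemRecovery("email", 0, 2, 4, 0),
]

for name, problem in recovery_gaps(example_systems):
    print(f"{name}: {problem}")
```

With these invented numbers only order_entry is flagged, on both counts: a nightly backup and a 30 hour restore cannot support an activity with a 4 hour RTO.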
IT department’s plans
- Does the IT department have disaster recovery plans in place and what do they cover
- Are their plans purely technical or do they cover incident management and decision making
- How often are the plans tested and to what level do they test them
If you can think of any other questions I am happy to add them to the list.


