Recent disasters, including the 8.9 earthquake and subsequent tsunami that caused widespread disruption in Japan, the floods in Queensland that disrupted business operations, the floods in regional Victoria, and the bushfires at Kinglake and surrounding suburbs, have left many companies rethinking their business continuity strategy beyond traditional IT data centres and co-location.
Many companies see business continuity as spreading IT systems across two locations within the same region, so that operations can continue from the office. However, over the past 18 months we have seen that relying on the office is flawed. Many people could not get into the office during the disaster periods, and many businesses suffered as a result.
Protecting your organisation from unplanned downtime depends largely on building redundancy and diversity directly into your disaster recovery and business continuity systems. Business systems need to be able to run on a number of different infrastructures and to fail over between them quickly and efficiently when necessary.
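To make the idea of failing over between infrastructures concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Site` type, the site and region names, and the `is_healthy` check are all hypothetical, and a real deployment would plug in its own health checks and traffic switching.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Site:
    """One place the business system can run (hypothetical model)."""
    name: str
    region: str


def choose_active_site(
    sites: Sequence[Site],
    is_healthy: Callable[[Site], bool],
) -> Optional[Site]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Illustrative priority list: names and regions are placeholders.
SITES = [
    Site(name="primary", region="region-a"),
    Site(name="standby", region="region-b"),
]

if __name__ == "__main__":
    # Simulate the primary being unreachable during a disaster.
    unreachable = {"primary"}
    active = choose_active_site(SITES, lambda s: s.name not in unreachable)
    print(active.name if active else "no healthy site")
```

Keeping the sites in different regions is what matters here; the selection logic itself can stay this simple.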
When moving to the cloud, we often see organisations make the same mistake they made with traditional data centres: they implement business continuity and disaster recovery in a single region. This leads to the same issues that are apparent in the flawed traditional architecture. The cloud offers far more than just hosting. It allows systems to be distributed across multiple regions all over the world, reducing the impact a disaster has on the organisation. A good example is the recent failure of an EBS system in Amazon's Eastern US region, which left many companies, including Foursquare and Reddit, without systems.
Our customers at base2Services did not feel the impact. Why? Because their systems are either distributed across multiple regions or ready to launch automatically in other regions. This significantly reduced the impact on our customers and kept their operations running without disruption.
The key is to design your infrastructure for the possibility of failure. Any system can be designed for protection against failure, no matter how complex. There is no magic bullet, but a general approach does exist whereby automation in the cloud can be leveraged. Start by examining each layer of your applications and infrastructure individually, and identify which components can withstand failure and which can respond automatically to it. Next, build failure recovery into the solution. Once this is done, make sure that you continue to maintain the failover components as you would the actively operating components. Never let them fall behind, otherwise you will not be able to recover easily.
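The approach above can be sketched as a simple inventory, shown here in Python. This is an illustration rather than a prescribed method: the `Component` fields, the layer and component names, and the version strings are all hypothetical. The first helper lists components that neither withstand failure nor recover on their own, and so still need recovery designed in. The second lists standby components that have fallen behind their active counterparts.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Component:
    """One component in one layer of the stack (hypothetical model)."""
    layer: str
    name: str
    tolerates_failure: bool  # keeps serving if one instance is lost
    auto_recovers: bool      # recovers without manual intervention
    active_version: str      # version running in the active environment
    standby_version: str     # version deployed to the failover environment


def needs_recovery_design(components: Sequence[Component]) -> List[Component]:
    """Components that neither withstand failure nor recover automatically."""
    return [c for c in components
            if not (c.tolerates_failure or c.auto_recovers)]


def stale_standbys(components: Sequence[Component]) -> List[Component]:
    """Components whose failover copy has fallen behind the active one."""
    return [c for c in components if c.standby_version != c.active_version]


# Illustrative inventory: every name and version below is a placeholder.
INVENTORY = [
    Component("web", "load-balanced web tier", True, True, "2.4", "2.4"),
    Component("app", "application server", False, True, "5.1", "5.0"),
    Component("data", "single database server", False, False, "9.0", "9.0"),
]

if __name__ == "__main__":
    for c in needs_recovery_design(INVENTORY):
        print(f"design recovery for: {c.layer}/{c.name}")
    for c in stale_standbys(INVENTORY):
        print(f"standby out of date: {c.layer}/{c.name}")
```

Running a check like this on a schedule is one way to notice when the failover side starts to drift.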