AWS Government, Education, & Nonprofits Blog

Rapidly Recover Mission-Critical Systems in a Disaster

Due to common hardware and software failures, human errors, and natural phenomena, disasters are inevitable, but IT infrastructure loss shouldn’t be.  With the AWS cloud, you can rapidly recover mission-critical systems while optimizing your Disaster Recovery (DR) budget.

Thousands of public sector customers, like St Luke’s Anglican School in Australia and the City of Asheville in North Carolina, rely on AWS to enable faster recovery of their on-premises IT systems without unnecessary hardware, power, bandwidth, cooling, space, and administration costs associated with managing duplicate data centers for DR.

The AWS cloud lets you back up, store, and recover IT systems in seconds by supporting popular DR approaches from simple backups to hot standby solutions that failover at a moment’s notice. And with 12 regions (and 5 more coming this year!) and multiple AWS Availability Zones (AZs), you can recover from disasters anywhere, any time. The following figure shows a spectrum for the four scenarios, arranged by how quickly a system can be available to users after a DR event.

These four scenarios include:

  1. Backup and Restore – This simple and low cost DR approach backs up your data and applications from anywhere to the AWS cloud for use during recovery from a disaster. Unlike conventional backup methods, data is not backed up to tape. Amazon Elastic Compute Cloud (Amazon EC2) computing instances are only used as needed for testing. With Amazon Simple Storage Service (Amazon S3), storage costs are as low as $0.015/GB stored for infrequent access.
  2. Pilot Light – The idea of the pilot light is an analogy that comes from gas heating. In that scenario, a small flame that’s always on can quickly ignite the entire furnace to heat up a house. In this DR approach, you simply replicate part of your IT structure for a limited set of core services so that the AWS cloud environment seamlessly takes over in the event of a disaster. A small part of your infrastructure is always running simultaneously syncing mutable data (as databases or documents), while other parts of your infrastructure are switched off and used only during testing. Unlike a backup and recovery approach, you must ensure that your most critical core elements are already configured and running in AWS (the pilot light). When the time comes for recovery, you can rapidly provision a full-scale production environment around the critical core.
  3. Warm Standby – The term warm standby is used to describe a DR scenario in which a scaled-down version of a fully functional environment is always running in the cloud. A warm standby solution extends the pilot light elements and preparation. It further decreases the recovery time because some services are always running. By identifying your business-critical systems, you can fully duplicate these systems on AWS and have them always on.
  4. Multi-Site – A multi-site solution runs on AWS as well as on your existing on-site infrastructure in an active- active configuration. The data replication method that you employ will be determined by the recovery point that you choose, either Recovery Time Objective (the maximum allowable downtime before degraded operations are restored) or Recovery Point Objective (the maximum allowable time window whereby you will accept the loss of transactions during the DR process).

Learn more about using AWS for DR in this white paper. And also continue to learn about backup and restore architectures, both using partner products and solutions, that assist in backup, recovery, DR, and continuity of operations (COOP) at the AWS Public Sector Summit in Washington, DC on June 20-21, 2016. Learn more about the complimentary event and register here.