AWS Public Sector Blog

Rapidly recover mission-critical systems in a disaster

Updated June 2021 with additional resources.

Due to common hardware and software failures, human errors, and natural phenomena, disasters are inevitable, but IT infrastructure loss shouldn’t be. With the AWS Cloud, you can rapidly recover mission-critical systems while optimizing your disaster recovery (DR) budget.

Thousands of public sector customers, like St Luke’s Anglican School in Australia and the City of Asheville in North Carolina, rely on AWS to enable faster recovery of their on-premises IT systems without unnecessary hardware, power, bandwidth, cooling, space, and administration costs associated with managing duplicate data centers for DR.

The AWS Cloud lets you back up, store, and recover IT systems in seconds by supporting popular DR approaches, from simple backups to hot standby solutions that fail over at a moment’s notice. With multiple AWS Regions and Availability Zones (AZs), you can recover from disasters anywhere, any time. The following figure shows a spectrum of four DR scenarios, arranged by how quickly a system can be available to users after a DR event.

These four scenarios include:

  1. Backup and restore. This simple, low-cost DR approach backs up your data and applications from anywhere to the AWS Cloud for use during recovery from a disaster. Unlike conventional backup methods, data is not backed up to tape. Amazon Elastic Compute Cloud (Amazon EC2) instances are used only as needed for testing. With Amazon Simple Storage Service (Amazon S3), storage costs are as low as $0.015/GB stored for infrequent access. Learn more about backup and restore.
  2. Pilot light. The idea of the pilot light is an analogy that comes from gas heating. In that scenario, a small flame that’s always on can quickly ignite the entire furnace to heat up a house. In this DR approach, you replicate part of your IT infrastructure for a limited set of core services so that the AWS Cloud environment can take over in the event of a disaster. A small part of your infrastructure is always running, simultaneously syncing mutable data (such as databases or documents), while other parts of your infrastructure are switched off and used only during testing. Unlike the backup and restore approach, you must ensure that your most critical core elements are already configured and running in AWS (the pilot light). When the time comes for recovery, you can rapidly provision a full-scale production environment around the critical core. Learn more about pilot light.
  3. Warm standby. The term warm standby is used to describe a DR scenario in which a scaled-down version of a fully functional environment is always running in the cloud. A warm standby solution extends the pilot light elements and preparation. It further decreases the recovery time because some services are always running. By identifying your business-critical systems, you can fully duplicate these systems on AWS and have them always on. Learn more about warm standby.
  4. Multi-site. A multi-site solution runs on AWS as well as on your existing on-site infrastructure in an active-active configuration. The data replication method that you employ will be determined by the recovery objectives that you choose: the Recovery Time Objective, or RTO (the maximum allowable downtime before degraded operations are restored), and the Recovery Point Objective, or RPO (the maximum allowable time window whereby you will accept the loss of transactions during the DR process).
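
To make the trade-off between these four scenarios concrete, here is a minimal Python sketch that picks the first scenario in the list whose recovery characteristics meet a given RTO and RPO. It assumes the scenarios are ordered from slowest to fastest recovery, as in the list above. The `Strategy` class, the `choose_strategy` function, and every number in `STRATEGIES` are hypothetical placeholders for illustration, not AWS figures or guidance; substitute values measured from your own DR tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_minutes: float  # assumed achievable downtime (illustrative only)
    rpo_minutes: float  # assumed achievable data-loss window (illustrative only)


# Ordered from slowest to fastest recovery, matching the list above.
# All numbers are made-up placeholders, not measurements or AWS guidance.
STRATEGIES = [
    Strategy("backup and restore", rto_minutes=24 * 60, rpo_minutes=24 * 60),
    Strategy("pilot light", rto_minutes=60, rpo_minutes=15),
    Strategy("warm standby", rto_minutes=10, rpo_minutes=5),
    Strategy("multi-site", rto_minutes=1, rpo_minutes=1),
]


def choose_strategy(rto_minutes: float, rpo_minutes: float) -> Optional[Strategy]:
    """Return the first (slowest) strategy that meets both targets, or None."""
    for strategy in STRATEGIES:
        meets_rto = strategy.rto_minutes <= rto_minutes
        meets_rpo = strategy.rpo_minutes <= rpo_minutes
        if meets_rto and meets_rpo:
            return strategy
    return None  # no listed strategy satisfies the targets


if __name__ == "__main__":
    choice = choose_strategy(rto_minutes=120, rpo_minutes=30)
    print(choice.name if choice else "no strategy meets these targets")
```

Scanning from the slowest scenario first reflects the idea that you should not run more always-on infrastructure than your objectives require; that cost ordering is an assumption of this sketch rather than something the list above states for every scenario.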

To learn more about disaster backup and recovery approaches, read the whitepaper “Disaster Recovery of Workloads on AWS: Recovery in the Cloud” and the eBook “Maintain business continuity during unexpected events,” watch our webinar on disaster recovery, or contact us.

Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.