AWS Public Sector Blog
5 best practices for resiliency planning using AWS
Organizations face a host of threats to business continuity, from extreme weather events to cyber-attacks to human error. Many turn to Amazon Web Services (AWS) to house their workloads in an environment that can withstand disruptions of any type or scale.
IT resilience hinges on strong architecture, technology, and operational management. Cloud environments still have to be assembled, run, and maintained. Here are five best practices for organizations to build IT resilience:
1. Define your business continuity objectives
When migrating data and systems to AWS, providing continuous service and avoiding a disruption is likely your main goal. You’ll need to get specific about your objectives to define what continuity of service means for your organization.
Before designing and managing applications in AWS, decide what’s most important for your organization in an outage. Ask these fundamental questions:
- What problems are you trying to solve by moving workloads to AWS?
- Which workloads are mission critical?
- Which workloads would have a lower impact on business if they were unavailable for a longer period of time?
- What specific aspects of an application require specific levels of availability?
Once you’ve answered these questions, you can be strategic in the decisions you make about application availability, testing, and backups.
2. House workloads across multiple Availability Zones
One key tenet of IT resilience is geographically distributed infrastructure. When there's a physical outage in one location but your workloads also run in a second, physically separate location, end users won't experience a service disruption.
To do this, you’ll need to set up that geographic distribution yourself. The default location when you move something to AWS is a single Availability Zone (AZ). You must designate that you want another version of the application to run in a different AZ, and then you can route traffic to both.
To help manage workloads across multiple AZs, use an Application Load Balancer (ALB). An ALB distributes incoming application traffic across targets in multiple AZs so that you can scale applications while shoring up resilience to physical damage.
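As a rough illustration of that idea, the sketch below spreads requests across copies of an application in two AZs and skips any copy marked unhealthy. It is a simplified model written for this article, not how an ALB is implemented or configured; the `Target` class, the `route_requests` function, and the instance names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Target:
    """One copy of the application running in a given Availability Zone."""
    instance_id: str      # hypothetical identifier, not a real instance ID
    az: str               # for example "us-east-1a"
    healthy: bool = True


def route_requests(targets, request_count):
    """Spread requests round-robin over healthy targets only.

    This imitates the idea behind a load balancer; it is an assumption
    for illustration, not a description of ALB internals.
    """
    healthy = [t for t in targets if t.healthy]
    if not healthy:
        raise RuntimeError("no healthy targets in any Availability Zone")
    picker = cycle(healthy)
    return [next(picker).instance_id for _ in range(request_count)]


targets = [
    Target("app-1", "us-east-1a"),
    Target("app-2", "us-east-1b"),
]

# Both AZs healthy: traffic alternates between the two copies.
normal = route_requests(targets, 4)

# Simulate an outage in us-east-1a: all traffic shifts to us-east-1b.
targets[0].healthy = False
during_outage = route_requests(targets, 4)
```

The point of the model is that the second copy only helps if it already exists in another AZ before the outage starts.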
3. Support Region routing using Amazon Route 53
Another component of geographic distribution within AWS is its Regions. Each Region, such as us-east-1 or us-west-1, is made up of multiple AZs. If your organization operates across Regions and you want to increase your availability, you need to route geographically dispersed end users to the appropriate IP address.
Domain Name System (DNS) services like Amazon Route 53 help with Region routing. Route 53 can route users to various AWS services and to non-AWS infrastructure, and it can check the health of those applications and their endpoints.
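To make this more concrete, here is a minimal sketch that builds a pair of failover records in the shape Route 53 accepts for record changes. Failover routing is only one of several Route 53 routing policies and is chosen here purely as an example. The domain name, IP addresses (taken from documentation-reserved ranges), health check ID, and helper function names are all assumptions for illustration.

```python
import json


def failover_record(name, ip_address, role, health_check_id=None, ttl=60):
    """Build one failover record set as a plain dictionary."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip_address}],
    }
    # Attach a health check only when one is supplied.
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id):
    """Build a change batch with a health-checked primary and a standby."""
    return {
        "Comment": "Illustrative failover pair (hypothetical values)",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": failover_record(
                    name, primary_ip, "PRIMARY", primary_health_check_id
                ),
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": failover_record(
                    name, secondary_ip, "SECONDARY"
                ),
            },
        ],
    }


batch = failover_change_batch(
    "app.example.com.",              # hypothetical domain
    "192.0.2.10",                    # hypothetical primary endpoint
    "198.51.100.20",                 # hypothetical secondary endpoint
    "hypothetical-health-check-id",  # placeholder, not a real ID
)
print(json.dumps(batch, indent=2))
```

With the AWS SDK for Python, a dictionary like this could be passed as the `ChangeBatch` argument to the Route 53 client's `change_resource_record_sets` call, along with your hosted zone ID. The sketch stops short of that call so it can run without AWS credentials.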
4. Establish an incident management process
Building resilience means planning for the inevitability of an outage. Your organization needs to have operational processes prepared for any type of outage. Your IT team should practice recovery protocols in a controlled setting, which will require you to do the following three things before you test:
- Define team roles and responsibilities of the designated incident owners.
- Clarify the order of operations for the incident owners following the start of the outage.
- Document your procedure for future reference.
Running routine outage tests can help you strengthen your incident management process. The best plan won't help you recover any faster if no one is familiar with it when there's a real outage.
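One way to keep roles, order of operations, and documentation together is to record the procedure as data that the team can review and rehearse. The sketch below is a hypothetical structure rather than an AWS feature; the `Runbook` class, role names, people, and steps are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    """A documented incident procedure: who owns what, and in what order."""
    name: str
    roles: dict                                 # role -> designated owner
    steps: list = field(default_factory=list)   # ordered (role, action) pairs

    def add_step(self, role, action):
        # Refuse steps whose role has no designated owner.
        if role not in self.roles:
            raise ValueError(f"no owner assigned for role {role!r}")
        self.steps.append((role, action))

    def drill(self):
        """Return the checklist lines for a practice run, in order."""
        return [
            f"{number}. {self.roles[role]} ({role}): {action}"
            for number, (role, action) in enumerate(self.steps, start=1)
        ]


runbook = Runbook(
    name="regional-outage",
    roles={"incident lead": "Avery", "communications": "Jordan"},
)
runbook.add_step("incident lead", "Confirm the outage and declare an incident")
runbook.add_step("communications", "Notify affected users")

checklist = runbook.drill()
for line in checklist:
    print(line)
```

Rejecting a step without an owner is a deliberate choice here: it surfaces gaps in responsibility while you are writing the plan instead of during an outage.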
5. Back up your data
Many people think of backups as only necessary in the event of failed infrastructure, but backups are also valuable for outages involving human error or cybersecurity breaches. If an employee opens a malicious email that leads to a ransomware attack, having a backup of the compromised data helps you avoid paying a hefty ransom.
There are two great AWS services that can help your organization manage data backups:
- Amazon EBS snapshots: Point-in-time backups of Amazon Elastic Block Store (Amazon EBS) volumes.
- Amazon Simple Storage Service (Amazon S3): A native AWS service that provides object storage through a web service interface.
To better protect your backups, consider making them read-only. Employees can access these backup copies, but only business leaders or third-party backup providers can edit them, which greatly reduces the risk that staff accidentally delete or tamper with data.
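For backups kept in Amazon S3, one possible way to approximate this is a bucket policy that denies writes and deletes to everyone except a designated administrator role. The sketch below only builds the policy document. The bucket name, account ID, role ARN, and function name are hypothetical, and read access would still need to be granted separately.

```python
import json


def read_only_backup_policy(bucket_name, admin_role_arn):
    """Build a bucket policy that blocks changes by anyone but one role.

    A Deny statement does not grant read access by itself; it only
    prevents uploads, overwrites, and deletes by other principals.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyBackupChangesExceptAdmin",
                "Effect": "Deny",
                "Principal": "*",
                "Action": [
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": admin_role_arn}
                },
            }
        ],
    }


policy = read_only_backup_policy(
    "example-backup-bucket",                       # hypothetical bucket
    "arn:aws:iam::111122223333:role/BackupAdmin",  # hypothetical role
)
print(json.dumps(policy, indent=2))
```

Treat this as a starting point to adapt and test, not a complete security control. S3 features such as versioning and Object Lock are also worth evaluating when backups must resist deletion.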
AWS is a resilience tool, not a resilience solution
AWS services are designed with operational resiliency in mind—but you are responsible for the architecture and configuration. While operational resiliency can be complicated, AWS can make detailed processes easier to manage. Armed with the right AWS services, organizations can achieve optimal IT resilience and availability with thorough planning. Learn more by visiting the AWS Organizational Resiliency & Continuity Help Center.