AWS Cloud Operations Blog
Automate RDS Aurora Snapshots for disaster recovery
It is important to have a well-defined, proactive disaster recovery strategy to ensure an efficient and uninterrupted flow of data across an organization. This applies to every component of your application architecture, including the database layer. Although Amazon Aurora database clusters are fault-tolerant and highly available by design, for disaster recovery use cases many customers prefer to keep a snapshot of their Aurora database clusters in an AWS Region different from the primary Region.
In this blog post, we demonstrate how you can use AWS Systems Manager to create encrypted snapshots of Amazon RDS Aurora (MySQL or PostgreSQL) clusters and then copy those snapshots to a different AWS Region for disaster recovery purposes.
Walkthrough
The solution uses the AWS Systems Manager Automation feature to build a three-step automation workflow, as shown in the following diagram:
- Create an Aurora database cluster snapshot by using Automation's aws:executeAwsApi action to invoke the CreateDBClusterSnapshot API.
- Wait for the snapshot to complete by using Automation's aws:waitForAwsResourceProperty action, which polls the DescribeDBClusterSnapshots API until the snapshot status is available.
- Initiate a copy of the snapshot to the selected target Region by using the aws:executeScript action, which runs a Python script that calls the CopyDBClusterSnapshot API.
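The three steps above can also be sketched as plain Python, which may help when reasoning about or testing the workflow locally. This is a minimal sketch under stated assumptions, not the Automation document itself: it assumes you pass in boto3 RDS clients (for example, `boto3.client("rds", region_name="us-east-1")` for the source Region and a second client for the target Region), and the function name `run_snapshot_workflow`, the polling interval, and the attempt limit are hypothetical.

```python
import time
from typing import Any, Callable


def run_snapshot_workflow(
    source_rds: Any,
    target_rds: Any,
    cluster_id: str,
    execution_id: str,
    source_region: str,
    kms_target_key: str,
    poll_seconds: float = 30.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical helper: create, wait for, and copy an Aurora cluster snapshot.

    source_rds / target_rds are expected to be boto3 RDS clients for the
    source and target Regions; any object with the same methods works.
    Returns the status reported by the copy request.
    """
    snapshot_id = f"{cluster_id}-db-snapshot-{execution_id}"

    # Step 1: CreateDBClusterSnapshot in the source Region.
    created = source_rds.create_db_cluster_snapshot(
        DBClusterSnapshotIdentifier=snapshot_id,
        DBClusterIdentifier=cluster_id,
    )
    snapshot_arn = created["DBClusterSnapshot"]["DBClusterSnapshotArn"]

    # Step 2: poll DescribeDBClusterSnapshots until Status is "available",
    # the same property the aws:waitForAwsResourceProperty step checks.
    for _ in range(max_attempts):
        described = source_rds.describe_db_cluster_snapshots(
            DBClusterSnapshotIdentifier=snapshot_id,
            DBClusterIdentifier=cluster_id,
        )
        if described["DBClusterSnapshots"][0]["Status"] == "available":
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"snapshot {snapshot_id} did not become available")

    # Step 3: CopyDBClusterSnapshot, sent from a client in the target Region.
    # With boto3, passing SourceRegion lets the SDK build the presigned URL.
    copied = target_rds.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=snapshot_arn,
        TargetDBClusterSnapshotIdentifier=snapshot_id,
        KmsKeyId=kms_target_key,
        CopyTags=True,
        SourceRegion=source_region,
    )
    return copied["DBClusterSnapshot"]["Status"]
```

With real boto3 clients, the manual loop in step 2 could be replaced by the built-in waiter returned by `source_rds.get_waiter("db_cluster_snapshot_available")`; the explicit loop is kept here so the polling logic is visible and the `sleep` function can be swapped out in tests.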
We have provided an AWS CloudFormation template that deploys this solution in your AWS account. The template takes four parameters: DBClusterIdentifier (the source DB cluster ID), KMSTargetKey (the KMS key ID in the target Region), SourceRegion (the Region where the DB cluster is located), and TargetRegion (the destination Region for the snapshot copy). Specify the SourceRegion and TargetRegion values in lower case (for example, us-east-1 and us-west-2), and deploy the template in the Region where your Aurora database cluster resides. The template includes the Aurora database cluster identifier in the names of the resources it creates, so you can deploy it once for each of your Aurora clusters. See Creating a Stack on the AWS CloudFormation Console for more information on creating an AWS CloudFormation stack.
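Because the template expects lower-case Region codes and must be deployed in the cluster's own Region, a small pre-flight check of the four parameters can catch mistakes before the stack is created. This is a hypothetical sketch: the function `validate_stack_parameters`, its optional `stack_region` argument, and the Region regular expression (an approximation that accepts codes such as `us-east-1` or `us-gov-west-1`, not an exhaustive AWS definition) are assumptions for illustration.

```python
import re
from typing import Dict, List, Optional

# Assumed approximation of lower-case AWS Region codes (us-east-1, eu-west-2,
# us-gov-west-1, ...); not an authoritative list of valid Regions.
REGION_PATTERN = re.compile(r"[a-z]{2}(-gov)?-[a-z]+-\d+")

# The four parameters the CloudFormation template takes.
REQUIRED_PARAMETERS = ("DBClusterIdentifier", "KMSTargetKey", "SourceRegion", "TargetRegion")


def validate_stack_parameters(params: Dict[str, str], stack_region: Optional[str] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors: List[str] = []
    for name in REQUIRED_PARAMETERS:
        if not params.get(name):
            errors.append(f"{name} is required")

    # Region parameters must be lower-case codes such as us-east-1.
    for name in ("SourceRegion", "TargetRegion"):
        value = params.get(name)
        if value and not REGION_PATTERN.fullmatch(value):
            errors.append(f"{name} must be a lower-case Region code such as us-east-1, got {value!r}")

    # A copy into the same Region does not provide cross-Region recovery.
    source = params.get("SourceRegion")
    if source and source == params.get("TargetRegion"):
        errors.append("TargetRegion should differ from SourceRegion")

    # The stack itself should be deployed where the Aurora cluster lives.
    if stack_region and source and stack_region != source:
        errors.append(f"deploy the stack in {source}, not {stack_region}")
    return errors
```

For example, passing `SourceRegion="US-EAST-1"` returns a single error naming SourceRegion, which is cheaper to catch here than after `aws cloudformation create-stack` has started.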
The AWS CloudFormation template deploys the following AWS Systems Manager Automation document:
```yaml
description: Aurora RDS Cluster Snapshot and Copy Automation Document
schemaVersion: '0.3'
assumeRole: '<AssumedRoleARN>'
mainSteps:
  - name: CreateSnapshot
    action: 'aws:executeAwsApi'
    inputs:
      Service: rds
      Api: CreateDBClusterSnapshot
      DBClusterSnapshotIdentifier: '<CLUSTER IDENTIFIER>-db-snapshot-{{automation:EXECUTION_ID}}'
      DBClusterIdentifier: <CLUSTER IDENTIFIER>
    outputs:
      - Name: SnapShotId
        Selector: $.DBClusterSnapshot.DBClusterSnapshotIdentifier
        Type: String
      - Name: DBClusterId
        Selector: $.DBClusterSnapshot.DBClusterIdentifier
        Type: String
      - Name: DBClusterSnapshotArn
        Selector: $.DBClusterSnapshot.DBClusterSnapshotArn
        Type: String
  - name: waitForSnapshotCompletion
    action: 'aws:waitForAwsResourceProperty'
    inputs:
      Service: rds
      Api: DescribeDBClusterSnapshots
      DBClusterSnapshotIdentifier: '<CLUSTER IDENTIFIER>-db-snapshot-{{automation:EXECUTION_ID}}'
      DBClusterIdentifier: <CLUSTER IDENTIFIER>
      PropertySelector: '$.DBClusterSnapshots[0].Status'
      DesiredValues:
        - available
  - name: ExecuteCode
    action: 'aws:executeScript'
    inputs:
      Runtime: python3.7
      Handler: script_handler
      InputPayload:
        snapshotid: '{{CreateSnapshot.SnapShotId}}'
        snapshotarn: '{{CreateSnapshot.DBClusterSnapshotArn}}'
        dbclusterid: '{{CreateSnapshot.DBClusterId}}'
        automationid: '{{automation:EXECUTION_ID}}'
        sourceregion: !Ref SourceRegion
        targetregion: !Ref TargetRegion
        kmstargetkey: !Ref KMSTargetKey
      Script: |-
        def script_handler(event, context):
            import boto3

            # Input parameters are provided by the SSM document's InputPayload
            snapshotid = event.get("snapshotid")
            snapshotarn = event.get("snapshotarn")
            dbclusterid = event.get("dbclusterid")
            sourceregion = event.get("sourceregion")
            targetregion = event.get("targetregion")
            kmstargetkey = event.get("kmstargetkey")

            # CopyDBClusterSnapshot must be called in the target Region, so the
            # RDS client is created with region_name set to the target Region.
            client = boto3.client('rds', region_name=targetregion)
            response = client.copy_db_cluster_snapshot(
                SourceDBClusterSnapshotIdentifier=snapshotarn,
                TargetDBClusterSnapshotIdentifier=snapshotid,
                KmsKeyId=kmstargetkey,  # KMS key ID in the target Region
                CopyTags=True,
                SourceRegion=sourceregion  # boto3 uses this to generate the presigned URL automatically
            )
            print(response)
            copystatus = response.get("DBClusterSnapshot").get("Status")
            print("Status of Copying of Snapshot: " + str(copystatus))
```
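The CreateSnapshot and waitForSnapshotCompletion steps build the snapshot name from the cluster identifier and the Automation execution ID. If you adapt this naming scheme, it may be worth checking that the result still satisfies the DB cluster snapshot identifier rules. As we understand the RDS documentation, an identifier must contain 1 to 255 letters, digits, or hyphens, start with a letter, and must not end with a hyphen or contain two consecutive hyphens; please confirm these rules against the current CreateDBClusterSnapshot API reference. The functions `is_valid_snapshot_identifier` and `snapshot_identifier` below are hypothetical helpers, not part of the solution.

```python
import re

# Assumed encoding of the DB cluster snapshot identifier rules:
# 1-255 characters, letters/digits/hyphens only, first character a letter.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,254}")


def is_valid_snapshot_identifier(identifier: str) -> bool:
    """Check the assumed rules, including no trailing hyphen and no '--'."""
    return (
        _IDENTIFIER.fullmatch(identifier) is not None
        and not identifier.endswith("-")
        and "--" not in identifier
    )


def snapshot_identifier(cluster_id: str, execution_id: str) -> str:
    """Mirror '<CLUSTER IDENTIFIER>-db-snapshot-{{automation:EXECUTION_ID}}'."""
    identifier = f"{cluster_id}-db-snapshot-{execution_id}"
    if not is_valid_snapshot_identifier(identifier):
        raise ValueError(f"invalid DB cluster snapshot identifier: {identifier!r}")
    return identifier
```

Because the execution ID is unique per run, the same naming pattern also keeps each snapshot, and its cross-Region copy, distinct between executions.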
Copying the snapshot to the target Region with the CopyDBClusterSnapshot API requires generating a presigned URL (the PreSignedUrl parameter). The aws:executeScript action runs a Python script that invokes this API. The script uses the AWS SDK for Python (Boto3), which generates the PreSignedUrl automatically once you provide the SourceRegion parameter. The Amazon RDS client in the script is initialized in the snapshot's target Region, even though the script itself executes in the source Region where the Aurora database cluster exists.
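To make the Region handling explicit, the handler logic can be restated with the RDS client factory injected, so it is clear which Region the client is created in and that only SourceRegion (not PreSignedUrl) is passed. This is a hypothetical variant for illustration, not the script deployed by the template: `copy_to_target_region` and `arn_region` are assumed names, and `client_factory` is expected to behave like `lambda r: boto3.client("rds", region_name=r)`.

```python
from typing import Any, Callable, Dict


def arn_region(arn: str) -> str:
    """Return the Region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def copy_to_target_region(event: Dict[str, str], client_factory: Callable[[str], Any]) -> str:
    """Hypothetical restatement of script_handler with the client factory injected."""
    source_region = event["sourceregion"]
    target_region = event["targetregion"]

    # Sanity check: the source snapshot ARN should belong to the source Region.
    if arn_region(event["snapshotarn"]) != source_region:
        raise ValueError("snapshotarn is not in sourceregion")

    # The copy request is sent to the target Region...
    client = client_factory(target_region)
    response = client.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=event["snapshotarn"],
        TargetDBClusterSnapshotIdentifier=event["snapshotid"],
        KmsKeyId=event["kmstargetkey"],
        CopyTags=True,
        # ...and SourceRegion lets boto3 generate PreSignedUrl on our behalf.
        SourceRegion=source_region,
    )
    return response["DBClusterSnapshot"]["Status"]
```

Injecting the factory keeps the Region decision testable without AWS credentials; in the Automation document the equivalent call is simply `boto3.client('rds', region_name=targetregion)`.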
Conclusion
This blog post presents a solution for implementing disaster recovery for Aurora database clusters by automating the creation of cluster snapshots and copying them to a different AWS Region. Depending on your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements, you can trigger this process using either AWS Systems Manager Maintenance Windows or an Amazon CloudWatch Events rule (now part of Amazon EventBridge) that uses the Automation document as a target.
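As a rough illustration of how an RPO might translate into a trigger frequency: in the worst case, the newest usable copy is one full interval old plus however long the snapshot and cross-Region copy take to finish, so the interval should be at most the RPO minus that expected duration. The sketch below turns that into a CloudWatch Events (Amazon EventBridge) `rate()` schedule expression. It is a hypothetical helper under these assumptions; the function name and the `expected_copy_minutes` estimate are illustrative, and real copy times depend on snapshot size and Regions.

```python
def schedule_expression_for_rpo(rpo_minutes: int, expected_copy_minutes: int = 0) -> str:
    """Return a rate() expression whose interval keeps data loss within the RPO.

    Worst-case loss is roughly one interval plus the snapshot-and-copy time,
    so the interval is the RPO minus the expected snapshot-and-copy duration.
    """
    interval = rpo_minutes - expected_copy_minutes
    if interval < 1:
        raise ValueError("RPO is too short for the expected snapshot and copy time")

    # Use the largest unit that divides the interval evenly; rate() uses the
    # singular unit name when the value is 1 (for example, "rate(1 hour)").
    for unit_minutes, unit in ((1440, "day"), (60, "hour"), (1, "minute")):
        if interval % unit_minutes == 0:
            value = interval // unit_minutes
            return f"rate({value} {unit}{'' if value == 1 else 's'})"
    raise AssertionError("unreachable: every interval is divisible by 1 minute")
```

For example, a 4-hour RPO with an expected one hour of snapshot and copy time yields `rate(3 hours)`. Frequent snapshots also add storage and cross-Region transfer cost, so the interval is a trade-off rather than something to minimize blindly.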
About the Authors
Kapil Shardha is an AWS Solutions Architect and supports enterprise customers with their AWS adoption. He has a background in infrastructure automation and DevOps.
William Torrealba is an AWS Serverless Specialist Solutions Architect supporting customers with their AWS adoption, especially in the use of serverless technologies. He has a background in application development, serverless technologies, highly available distributed systems, automation, and DevOps.