Build a notification mechanism to manage Amazon RDS manual snapshots
It’s no secret that data is an essential part of running a business, no matter how large or small a business may be. Many companies host their business data using relational databases. As a result, backup and recovery are important aspects of keeping the business running. Amazon RDS customers use a mixture of strategies to back up their data with both automated and manual snapshots. You might use Amazon RDS manual snapshots for longevity since automated snapshots are deleted when a database is deleted. You might also use manual snapshots for disaster recovery with the capabilities for cross-account and cross-region sharing. Amazon RDS caters to a wide variety of backup needs with the ability to recover an entire database from a single snapshot.
In this post, I show how to build a serverless notification mechanism to manage Amazon RDS manual snapshots for both RDS instances and Aurora clusters. The key activities performed are:
- Given the list of RDS instances and Aurora clusters, manual snapshots are created at the defined backup interval
- Based on the backup retention period, older manual snapshots are deleted
- Each time, at the end of this activity, subscribers are notified with a list of newly created manual snapshots and older deleted snapshots, if any.
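To make the first activity concrete, here is a minimal Python sketch of how a scheduled run might decide whether a database is due for a new manual snapshot. It is not the code shipped with this solution: the function names, the identifier format, and the rule that a snapshot is due once the full interval has elapsed are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_backup_due(last_snapshot: Optional[datetime], interval_hours: int, now: datetime) -> bool:
    """Return True when no snapshot exists yet or the backup interval has elapsed."""
    if last_snapshot is None:
        return True
    return now - last_snapshot >= timedelta(hours=interval_hours)


def snapshot_identifier(db_name: str, now: datetime) -> str:
    """Build a hypothetical snapshot identifier from the database name and a timestamp."""
    return f"{db_name}-manual-{now:%Y-%m-%d-%H-%M}"


now = datetime(2018, 1, 31, 1, 0, tzinfo=timezone.utc)
last = now - timedelta(hours=25)
print(is_backup_due(last, 24, now))          # True
print(snapshot_identifier("blogtest", now))  # blogtest-manual-2018-01-31-01-00
```

A real implementation would probably allow a little slack around the interval, so that a run starting a few seconds early does not skip a backup.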
Serverless solution architecture
This serverless solution is packed into an AWS CloudFormation script. The script takes various inputs such as user-specified snapshot-backup interval, backup retention period, list of RDS instance names, and email addresses for notification. An Amazon CloudWatch Events rule triggers the AWS Step Functions state machine on a schedule. The state machine creates the manual snapshot, deletes old snapshots, and finally sends a notification to Amazon SNS. SNS sends email to the user-provided email address.
The solution to manage Amazon RDS manual snapshots (RDS instances and Aurora clusters) uses the following AWS services:
- AWS Step Functions: Orchestrates and makes it easy to coordinate individual components that each perform a discrete function.
- AWS Lambda: Provides functions to implement the task states. The Lambda functions are implemented in Python.
- Amazon DynamoDB: Stores snapshot information for management of snapshots and notifications.
- Amazon SNS: Provides a flexible, fully managed pub/sub messaging service for coordinating the delivery of snapshot notifications to subscribing endpoints.
- Amazon CloudWatch Events: Used to trigger Step Functions on an automated schedule.
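As one illustration of how the Lambda and SNS pieces might fit together, the following sketch composes a summary message and publishes it to a topic. The message layout, the subject line, and the function names are assumptions. The only AWS call used is the `publish` operation of the boto3 SNS client, which is passed in so that the example stays self-contained.

```python
from typing import Dict, List


def build_notification(created: List[str], deleted: List[str]) -> Dict[str, str]:
    """Compose the subject and body of a snapshot summary email."""
    lines = ["Snapshots created in this run:"]
    lines += [f"  {name}" for name in created] or ["  (none)"]
    lines.append("Old snapshots deleted in this run:")
    lines += [f"  {name}" for name in deleted] or ["  (none)"]
    return {"Subject": "RDS manual snapshot report", "Message": "\n".join(lines)}


def publish_notification(sns_client, topic_arn: str,
                         created: List[str], deleted: List[str]) -> None:
    """Publish the summary; sns_client is expected to be boto3.client('sns')."""
    sns_client.publish(TopicArn=topic_arn, **build_notification(created, deleted))


print(build_notification(["blogtest-manual-2018-01-31-01-00"], [])["Message"])
```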
This solution has a few limitations:
- The solution depends on AWS Step Functions, so it is available only in AWS Regions where AWS Step Functions is supported.
- The solution is not built to work with encrypted manual snapshots.
The following diagram illustrates the architecture. Here are the key activities shown:
- A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
- The state machine invokes the first Lambda function to create snapshots.
- The create snapshots Lambda function initiates the creation of a manual snapshot for every RDS instance and Aurora cluster provided in the list. The same Lambda function populates a DynamoDB table with snapshot information.
- The state machine then runs two Lambda functions in parallel: delete old snapshots and check snapshots.
- The delete old Lambda function retrieves information from DynamoDB and deletes snapshots based on the backup retention period.
- Within the Step Functions state machine, the check snapshots Lambda function checks whether the snapshots have completed. If the snapshot of any RDS instance is still in progress, the Lambda function raises an exception. The state machine catches this exception and retries the check after a certain period.
- The fourth Lambda function, email notification, reads the status of the snapshots from DynamoDB. This Lambda function publishes information to two SNS topics:
- RDSBackupInfo topic informs about snapshots newly created and old snapshots deleted in the scheduled run.
- Backups_Failed_RDS topic informs of any failed backups in the scheduled run.
- Finally, both SNS topics send the notifications to subscribers at the email address provided.
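The check-and-retry behavior described above can be sketched as a small Python function that raises an exception while any snapshot is still being created. The exception name and the shape of the input are assumptions. The one fact relied on is that RDS reports a completed snapshot with the status `available`.

```python
from typing import Dict, List


class SnapshotsPendingError(Exception):
    """Raised so that the state machine retries the check later (hypothetical name)."""


def check_snapshots(statuses: Dict[str, str]) -> List[str]:
    """Return the completed snapshot IDs, or raise if any snapshot is not yet available.

    statuses maps a snapshot identifier to its RDS status string.
    """
    pending = sorted(sid for sid, status in statuses.items() if status != "available")
    if pending:
        raise SnapshotsPendingError("Snapshots not yet available: " + ", ".join(pending))
    return sorted(statuses)


print(check_snapshots({"blogtest-manual-1": "available"}))  # ['blogtest-manual-1']
```

This sketch treats every status other than `available` as pending. A fuller version would likely separate failed snapshots so that they can be reported instead of retried.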
To understand the manual snapshot processing workflow, let’s take a closer look at the actions of the Step Functions state machine.
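As a rough picture of that workflow, the following Python snippet builds an Amazon States Language definition with the same shape: a create step, a parallel step that deletes old snapshots while checking new ones with a retry, and a final notification step. The state names, Lambda ARNs, error name, and retry timings are placeholders chosen for illustration, not values taken from the actual CloudFormation template.

```python
import json

# Placeholder ARN prefix; the real function ARNs come from the deployed stack.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:"

definition = {
    "Comment": "Sketch of the RDS manual snapshot workflow",
    "StartAt": "CreateSnapshots",
    "States": {
        "CreateSnapshots": {
            "Type": "Task",
            "Resource": LAMBDA_ARN + "create-snapshots",
            "Next": "DeleteAndCheck",
        },
        "DeleteAndCheck": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": "DeleteOldSnapshots",
                    "States": {
                        "DeleteOldSnapshots": {
                            "Type": "Task",
                            "Resource": LAMBDA_ARN + "delete-old-snapshots",
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "CheckSnapshots",
                    "States": {
                        "CheckSnapshots": {
                            "Type": "Task",
                            "Resource": LAMBDA_ARN + "check-snapshots",
                            # Retry while the function reports pending snapshots.
                            "Retry": [{
                                "ErrorEquals": ["SnapshotsPendingError"],
                                "IntervalSeconds": 300,
                                "MaxAttempts": 12,
                                "BackoffRate": 1.0,
                            }],
                            "End": True,
                        }
                    },
                },
            ],
            "Next": "EmailNotification",
        },
        "EmailNotification": {
            "Type": "Task",
            "Resource": LAMBDA_ARN + "email-notification",
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

A Catch clause that routes exhausted retries to the notification step is omitted here to keep the sketch short.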
Prerequisites before implementation
- Amazon RDS: An easy-to-use scalable relational database in the cloud. You need at least one RDS instance in the same region where the AWS CloudFormation stack is launched.
- A newline-delimited text file named rds_backup_list.txt that lists the RDS instance names, one per line. For example, for an instance named testinstance, the file contains the line testinstance, as shown in the following screenshot.
- If the rds_backup_list.txt file is missing, the state machine fails and throws an exception.
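For reference, a file in this format can be read with a few lines of Python. The function below is a sketch under stated assumptions: skipping blank lines, dropping duplicate names, and raising an error for an empty file are my choices, not documented behavior of the solution's Lambda functions.

```python
from typing import List


def parse_backup_list(text: str) -> List[str]:
    """Return the database names listed one per line, skipping blanks and duplicates."""
    names: List[str] = []
    for line in text.splitlines():
        name = line.strip()
        if name and name not in names:
            names.append(name)
    if not names:
        raise ValueError("rds_backup_list.txt contains no database names")
    return names


print(parse_backup_list("testinstance\nblogtest\n"))  # ['testinstance', 'blogtest']
```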
Implement Amazon RDS manual snapshots with notifications
The procedure for deploying this architecture on AWS consists of the following steps.
Step 1 – Download CloudFormation script (YAML file) and Lambda functions (listed below)
Step 2 – Upload Lambda functions into an S3 Bucket (CodeBucket in Step 4)
- Upload Lambda functions (zip files) into an S3 bucket of your choice as shown in the following screenshot.
- Lambda functions in an S3 bucket should be in the same Region where the CloudFormation script is executed.
Step 3 – Execute the CloudFormation script
- Upload the AWS CloudFormation template in the Region of your choice.
- Provide a unique stack name, as shown in the following screenshot.
Step 4 – Launch the stack
- Launch the AWS CloudFormation template in your AWS account.
- Enter Input parameter values for the stack as shown in the following screenshot.
- Backup Interval – Interval for backups in hours. The default is 24 hours.
- Backup Schedule – To be provided in CloudWatch Events cron format. The schedule should run at least once in every backup interval. The default value runs once every day at 1:00 AM UTC. For more information, refer to CloudWatch schedule events expression – AWS documentation.
- CodeBucket – Name of the bucket that contains the Lambda functions that were uploaded in Step 2, to deploy.
- DeleteOldSnapshots – Choose either TRUE or FALSE. Set to TRUE to enable deletion of snapshots based on RetentionDays. Set to FALSE to disable deletion, so that historical information remains available in DynamoDB.
- LogLevel – Log level for Lambda functions. Valid values are one of DEBUG, INFO, WARN, ERROR, CRITICAL.
- NotifyEmail – Email address to which notifications are sent. Confirm the subscription email that SNS sends so that you can receive email notifications.
- Retention Days – Number of days to keep snapshots before deleting them. The default is seven days.
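To show how the DeleteOldSnapshots and Retention Days parameters could interact, here is a short Python sketch that selects the snapshots eligible for deletion. The function name and the input shape (a mapping of snapshot identifier to creation time) are hypothetical, and the actual Lambda function reads this information from DynamoDB.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expired_snapshots(created_at: Dict[str, datetime], retention_days: int,
                      delete_old_snapshots: bool, now: datetime) -> List[str]:
    """Return IDs of snapshots older than the retention period.

    Returns an empty list when deletion is disabled (DeleteOldSnapshots = FALSE).
    """
    if not delete_old_snapshots:
        return []
    cutoff = now - timedelta(days=retention_days)
    return sorted(sid for sid, created in created_at.items() if created < cutoff)


now = datetime(2018, 2, 10, 1, 0, tzinfo=timezone.utc)
snapshots = {
    "blogtest-old": now - timedelta(days=8),
    "blogtest-new": now - timedelta(days=1),
}
print(expired_snapshots(snapshots, 7, True, now))   # ['blogtest-old']
print(expired_snapshots(snapshots, 7, False, now))  # []
```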
Step 5 – Options and review for CloudFormation
- Choose Next for Options screen on CloudFormation.
- On the Review page, let AWS handle the creation of an IAM role based on the components created.
- Check the box for I acknowledge that AWS CloudFormation might create IAM resources with custom names.
- Choose Next to execute the CloudFormation script.
Step 6 – Run the CloudFormation script
- Wait until the stack creation has status CREATE_COMPLETE.
- Check the Outputs tab, shown in the following screenshot, for:
- S3SourceListBucketOutput – A new S3 bucket is created where the rds_backup_list.txt file can be uploaded.
- BackupFailedTopic – An SNS topic to receive alerts of failed backups.
- EmailNotificationTopic – An SNS topic to receive notifications of newly created snapshots and deleted snapshots.
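If you prefer to read these outputs programmatically, the response of the CloudFormation `describe_stacks` call can be flattened into a dictionary. The helper below is a sketch: it works on a response dictionary that you pass in, and the sample values are invented for illustration.

```python
from typing import Dict


def stack_outputs(describe_stacks_response: dict) -> Dict[str, str]:
    """Flatten the Outputs of the first stack in a describe_stacks response."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Sample response with made-up values; a real one would come from
# boto3.client("cloudformation").describe_stacks(StackName="your-stack-name").
sample = {
    "Stacks": [{
        "Outputs": [
            {"OutputKey": "S3SourceListBucketOutput", "OutputValue": "example-source-list-bucket"},
            {"OutputKey": "BackupFailedTopic", "OutputValue": "example-failed-topic"},
        ]
    }]
}
print(stack_outputs(sample)["S3SourceListBucketOutput"])  # example-source-list-bucket
```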
Step 7 – Upload rds_backup_list.txt
- The rds_backup_list.txt file contains the list of RDS DB instance or Aurora cluster names to be backed up. For example, the blogtest RDS instance is shown in the following screenshot.
- Each DB instance name appears on a separate line of the text file. A screenshot of an example .txt file follows.
- Upload this file to the newly created S3 bucket, whose name appears in the CloudFormation stack outputs as S3SourceListBucketOutput. The result is shown in the following screenshot.
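The upload can also be scripted. The following sketch writes the list with the `put_object` operation of the boto3 S3 client, which is passed in as an argument. Placing the object at the root of the bucket under the key rds_backup_list.txt is an assumption based on the file name used in this post.

```python
from typing import Iterable


def upload_backup_list(s3_client, bucket: str, db_names: Iterable[str]) -> None:
    """Write one database name per line to rds_backup_list.txt in the given bucket.

    s3_client is expected to be boto3.client('s3').
    """
    body = "\n".join(db_names) + "\n"
    s3_client.put_object(
        Bucket=bucket,
        Key="rds_backup_list.txt",  # assumed key: bucket root, name from this post
        Body=body.encode("utf-8"),
    )
```

For example, `upload_backup_list(boto3.client("s3"), bucket_name, ["blogtest"])` would upload a one-line file, where `bucket_name` is the value of S3SourceListBucketOutput.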
Step 8 – Test the tool
- Go to the Step Functions console and choose the state machine created by the CloudFormation script. An example is shown in the following screenshot.
- Choose New Execution to test the tool.
- Provide an execution name, for example BlogTest1, and choose Start Execution to perform a test.
- See the visual workflow where various functions of the state machine are triggered. An example of this is shown in the following screenshot.
- You can see the first set of snapshots are created for the databases listed in the rds_backup_list.txt file.
- Optionally, check the CloudWatch Rules to see the Backup Schedule created as defined in Step 4. You can also review the email notification that SNS sends when the steps are completed. An example email is shown here.
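The same test can be started from code instead of the console. This sketch wraps the `start_execution` operation of the boto3 Step Functions client. The helper name is hypothetical, the client is passed in, and an empty JSON object is assumed to be an acceptable input for the state machine.

```python
import json


def start_test_execution(sfn_client, state_machine_arn: str, name: str) -> str:
    """Start a named execution and return its ARN.

    sfn_client is expected to be boto3.client('stepfunctions').
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=name,
        input=json.dumps({}),  # assumption: the workflow needs no input
    )
    return response["executionArn"]
```

Calling `start_test_execution(boto3.client("stepfunctions"), arn, "BlogTest1")` mirrors the console steps above, where `arn` is the ARN of the state machine created by the stack.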
Looking back at our serverless snapshot solution with notifications
Losing business data can be catastrophic in terms of both time and money. Apart from having a database backup strategy, having a monitoring system is a step in the right direction toward preserving critical business data. Building this serverless RDS manual snapshot solution means that you’ll have one less item to worry about regarding your database backup strategy.
About the Author
Suman Koduri is a senior technical account manager at Amazon Web Services. He works with Enterprise Support customers and provides technical guidance and assistance to help them make the best use of the AWS platform. In his spare time, he loves running half marathons and riding his motorcycle.