AWS Storage Blog

Automating application-consistent Amazon EBS Snapshots for MySQL and PostgreSQL

MySQL and PostgreSQL are popular relational database management systems that many organizations use to power web applications, dynamic websites, and embedded systems. Customers who self-host MySQL and PostgreSQL on AWS can use their choice of tools to manage the operating system, database software, patches, data replication, backup, and restoration. When customers back up these workloads through a custom workflow to meet data protection requirements, they often spend significant time and manual effort orchestrating the backups. This exposes them to human errors that can lead to missed snapshots and increased storage costs.

If you have Amazon Elastic Compute Cloud (Amazon EC2) instances running self-managed MySQL and PostgreSQL databases, consider creating Amazon Elastic Block Store (Amazon EBS) Snapshots in addition to any database-level logical or physical backups. Unlike database backups, EBS Snapshots are point-in-time copies of your volumes, which you can use to recover an entire database or to migrate databases across AWS Regions and accounts. With Amazon Data Lifecycle Manager, a policy-based lifecycle management solution for EBS Snapshots, you can automate the creation, retention, and deletion of EBS Snapshots for your database workloads at regular intervals. Now that Amazon Data Lifecycle Manager supports pre-script and post-script automation, you can streamline backup operations by building custom scripts into your policies. These scripts automate database actions, such as freezing and thawing I/O, before (pre-script) and after (post-script) EBS Snapshots are initiated. We have also created templates with pre-script and post-script commands for standard versions of MySQL and PostgreSQL, which you can use with Data Lifecycle Manager to simplify the automation of application-consistent snapshots.

In this post, we walk through the use of Amazon Data Lifecycle Manager and AWS Systems Manager to automate the creation of application-consistent EBS Snapshots for self-managed MySQL and PostgreSQL databases. The solution empowers you to create application-consistent snapshots of your databases with confidence. These snapshots serve as reliable backups that you can depend on for disaster recovery, data migration, or other critical operational needs.

Solution overview

Previously, we outlined how to create application-consistent snapshots using Amazon Data Lifecycle Manager and custom scripts. That post included the steps to create Amazon Data Lifecycle Manager policies that use AWS Systems Manager Agent (SSM Agent) to run custom scripts on your EC2 instances before and after EBS Snapshots are initiated. In this post, we build on those instructions to create application-consistent snapshots for MySQL and PostgreSQL databases. We show how to automate pre-scripts that pause I/O and flush buffers to disk, and post-scripts that thaw I/O, as shown in the following figure.

Architecture diagram of this feature with MySQL/PostgreSQL.

Prerequisites

You must install SSM Agent on every instance for which you want to create application-consistent EBS Snapshots, and make sure the agent is running. SSM Agent comes preinstalled on some Amazon Machine Images (AMIs) provided by AWS, such as Amazon Linux 2 and Amazon Linux 2023. You must set up all EC2 instances with the permissions Systems Manager needs to run the Systems Manager document on them, for example through an instance profile that includes the AmazonSSMManagedInstanceCore managed policy. You must also make sure that the AWS Identity and Access Management (IAM) service role used by your Data Lifecycle Manager policy has permission to run the Systems Manager documents on the targeted EC2 instances. The easiest way to do this is to attach the AWSDataLifecycleManagerSSMFullAccess managed policy to the IAM role. If you use this managed policy, then you must add the DLMScriptsAccess:true tag to any custom Systems Manager documents that you want to use with this feature.
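If you prefer to check the agent prerequisite from a shell, the following is a minimal sketch. It assumes AWS CLI v2 is installed and configured, and that the agent runs as a systemd unit named amazon-ssm-agent (snap-based installs on Ubuntu use snap.amazon-ssm-agent.amazon-ssm-agent.service instead). The function names and instance ID are hypothetical; setting AWS_CLI='echo aws' prints the AWS CLI command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: check SSM Agent prerequisites for Data Lifecycle Manager scripts.
# Assumptions: AWS CLI v2 configured; systemd unit named amazon-ssm-agent.
AWS_CLI="${AWS_CLI:-aws}"   # e.g. AWS_CLI='echo aws' to print commands (dry run)

# Run on the instance: succeeds when the agent's systemd unit is active.
agent_running_locally() {
  systemctl is-active --quiet amazon-ssm-agent
}

# Run from an admin workstation: prints the instance's PingStatus as seen by
# Systems Manager ("Online" means the agent is checking in; "None" means the
# instance is not registered as a managed node).
agent_ping_status() {
  local instance_id="$1"
  # $AWS_CLI is intentionally unquoted so it can hold a command prefix.
  $AWS_CLI ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=${instance_id}" \
    --query "InstanceInformationList[0].PingStatus" \
    --output text
}

# Example (hypothetical instance ID):
#   agent_ping_status i-0123456789abcdef0
```

Checking PingStatus from outside the instance confirms both halves of the prerequisite at once: the agent is running, and the instance's permissions let it register with Systems Manager.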

Next, you need code that freezes the database and flushes data to disk (pre-script), and then thaws the database (post-script) once the snapshots have been initiated. We provide template code for these steps for standard versions of MySQL and PostgreSQL, which you can build on for your specific database. It is your responsibility to make sure that the code performs the necessary actions on your database. If the code is invalid, the resulting snapshots will not be application consistent.

Walkthrough

To enable custom pre-script and post-script automation, complete the following steps:

1. Create a Systems Manager document (or use an existing SSM template for your application) that freezes I/O, flushes buffered data to disk, and then thaws I/O. The document must contain the required fields so that Data Lifecycle Manager can trigger the pre-script and post-script actions.

2. Create an Amazon Data Lifecycle Manager policy. The policy coordinates running the Systems Manager document, initiates the snapshots, marks them as application-consistent, and manages their retention and other actions.

3. Validate that the snapshots created are application consistent.

Step 1: Create an AWS Systems Manager document

1. Navigate to the Systems Manager console and select Documents in the navigation pane.

Screenshot showing AWS Systems Manager console

2. Select the Create document drop-down box, followed by Command or Session.

Screenshot showing the selection of Command or Session to create SSM document

3. Fill in the Document details and make sure that Document type is set to Command. In this example, the document name is MySQL-EBS-Snapshots. Note this name, because you must enter the same document name later when you create the policy.

Screenshot showing Document details.

4. Paste your pre-script and post-script code for your database into the Content section under YAML.

Screenshot showing Content section with code pasted.

When adding code, you must make sure that the required fields are present. Amazon Data Lifecycle Manager relies on these fields to start the pre-script and post-script correctly. Without them, Data Lifecycle Manager cannot create application-consistent EBS Snapshots.
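Data Lifecycle Manager tells the document which phase to run through a document parameter. The sketch below assumes the convention we understand the AWS-provided templates to use: a parameter named command whose allowed values are pre-script, post-script, and dry-run, which Systems Manager substitutes into the script wherever {{ command }} appears. Confirm the exact required fields against the Data Lifecycle Manager documentation. The dispatch function name is hypothetical; execute_pre_script and execute_post_script are the functions from the template excerpts in this post.

```shell
#!/usr/bin/env bash
# Sketch: route the phase requested by Data Lifecycle Manager to a function.
# Inside the SSM document's runCommand script, call it as:
#   dispatch '{{ command }}'
# Systems Manager replaces {{ command }} before the script runs.
dispatch() {
  case "$1" in
    pre-script)
      execute_pre_script    # flush/lock the DB, then freeze filesystems
      ;;
    post-script)
      execute_post_script   # thaw filesystems, then unlock the DB
      ;;
    dry-run)
      echo "INFO: dry-run requested; no freeze or thaw performed"
      ;;
    *)
      echo "ERROR: unsupported command '$1'" >&2
      return 1
      ;;
  esac
}
```

Rejecting unknown values with a non-zero status makes a misconfigured parameter show up as a failed script, instead of silently producing a snapshot that was never quiesced.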

We recommend that you start by modifying the provided Systems Manager Command document template for MySQL or PostgreSQL, rather than creating your own documents from scratch. Excerpts of the pre-script and post-script portions of the MySQL template follow.

Again, it is your responsibility to make sure that the code performs the necessary actions on your database. If it does not, the resulting EBS Snapshots will not be application-consistent.

If you plan to exclude the root volume, non-root volumes, or both from the set of application-consistent snapshots, make sure that your code performs the necessary steps on the appropriate set of EBS volumes.
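One way to restrict freezing to the volumes you actually snapshot is to map each mounted device back to its EBS volume ID. The sketch below assumes a Nitro-based instance, where EBS volumes appear as NVMe devices whose serial number is the volume ID without the hyphen (for example vol0123456789abcdef0). It also assumes data volumes are formatted without a partition table, so the filesystem is mounted from the whole disk. The function names and the exclusion list are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: choose mountpoints to freeze based on EBS volume IDs.
# Assumptions: Nitro instance (NVMe serial = volume ID without the hyphen),
# filesystems created directly on whole disks (no partitions).

# Convert an NVMe serial such as vol0123456789abcdef0 to vol-0123456789abcdef0.
serial_to_volume_id() {
  local serial="$1"
  case "$serial" in
    vol-*) echo "$serial" ;;
    vol*)  echo "vol-${serial#vol}" ;;
    *)     return 1 ;;              # not an EBS NVMe device
  esac
}

# Read "SERIAL MOUNTPOINT" pairs on stdin and print the mountpoints whose
# volume is NOT in the space-separated exclusion list passed as $1.
mountpoints_to_freeze() {
  local excluded=" $1 " serial mnt vol
  while read -r serial mnt; do
    [ -n "$mnt" ] || continue       # device is not mounted
    vol=$(serial_to_volume_id "$serial") || continue
    case "$excluded" in *" $vol "*) continue ;; esac
    echo "$mnt"
  done
}

# Example on an instance (hypothetical excluded volume):
#   lsblk -dnlo SERIAL,MOUNTPOINT | mountpoints_to_freeze "vol-0123456789abcdef0"
```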

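The following is a sample pre-script that freezes I/O. It also schedules an auto-thaw (set to 60 seconds in the template) as a fail-safe that unfreezes the database if the post-script does not. The script skips the root and boot filesystems, so we recommend that you keep your database on data (non-root) volumes. The excerpt calls helpers that are defined elsewhere in the full template and not shown here: check_fs_freeze, snap_db, thaw_db, execute_schedule_auto_thaw, and the FS_ALREADY_FROZEN_ERROR variable.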

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# Auto thaw is a fail-safe mechanism to automatically unfreeze the application after the
# duration specified in the global variable below. Choose the duration based on your
# database application's tolerance to freeze.
export AUTO_THAW_DURATION_SECS="60"

# Add all pre-script actions to be performed within the function below
execute_pre_script() {
    echo "INFO: Start execution of pre-script"
    # Check if filesystem is already frozen. No error code indicates that filesystem
    # is not currently frozen and that the pre-script can proceed with freezing the filesystem.
    check_fs_freeze
    # Execute the DB commands to flush the DB in preparation for snapshot
    snap_db
    # Freeze the filesystem. No error code indicates that filesystem was successfully frozen
    freeze_fs

    echo "INFO: Schedule Auto Thaw to execute in ${AUTO_THAW_DURATION_SECS} seconds."
    # execute_schedule_auto_thaw must be exported (export -f) so that bash -c can find it.
    $(nohup bash -c execute_schedule_auto_thaw >/dev/null 2>&1 &)
}

# Iterate over all the mountpoints and freeze the filesystem.
freeze_fs() {
    for target in $(lsblk -nlo MOUNTPOINTS)
    do
        # Freeze of the root and boot filesystems is dangerous. Hence, skip filesystem freeze
        # operations for root and boot mountpoints.
        if [[ "$target" == "/" ]]; then continue; fi
        if [[ "$target" == *"/boot"* ]]; then continue; fi
        echo "INFO: Freezing $target"
        error_message=$(sudo fsfreeze -f "$target" 2>&1)
        if [ $? -ne 0 ]; then
            # If the filesystem is already frozen, return error code 204
            if [[ "$error_message" == *"$FS_ALREADY_FROZEN_ERROR"* ]]; then
                echo "ERROR: Filesystem ${target} already frozen. Return Error Code: 204"
                sudo mysql -e 'UNLOCK TABLES;'
                exit 204
            fi
            # If the filesystem freeze failed for any reason other than the filesystem
            # already being frozen, return 201
            echo "ERROR: Failed to freeze mountpoint ${target} due to error - ${error_message}"
            thaw_db
            exit 201
        fi
        echo "INFO: Freezing complete on $target"
    done
}
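The freeze_fs loop above word-splits the output of lsblk, so a mountpoint containing a space would be split into pieces, and active swap devices, which lsblk reports as [SWAP], would be passed to fsfreeze and make the pre-script exit with 201. The following hypothetical helper, freezable_mountpoints, is one way to build the target list more defensively. It keeps the template's rules (skip / and anything matching /boot) and, as our own additions rather than part of the AWS template, also skips [SWAP], empty entries, and duplicates. Note that the MOUNTPOINTS column the template uses may not exist in older util-linux releases, which provide MOUNTPOINT instead.

```shell
#!/usr/bin/env bash
# Sketch: read mountpoints (one per line) on stdin and print those that are
# candidates for fsfreeze. Mirrors the template's skip rules for / and /boot,
# and additionally skips swap, unmounted (empty) entries, and duplicates.
freezable_mountpoints() {
  local target
  while IFS= read -r target; do
    [ -n "$target" ] || continue                # unmounted device
    [ "$target" = "/" ] && continue             # root filesystem
    [[ "$target" == *"/boot"* ]] && continue    # /boot, /boot/efi, ...
    [ "$target" = "[SWAP]" ] && continue        # swap has no filesystem to freeze
    echo "$target"
  done | sort -u
}

# Usage on an instance; reading line by line keeps spaces in mountpoints intact:
#   while IFS= read -r mp; do sudo fsfreeze -f "$mp"; done \
#     < <(lsblk -nlo MOUNTPOINTS | freezable_mountpoints)
```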

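The following is a sample post-script that unfreezes I/O. The full template's post-script also disables the scheduled auto-thaw; that logic, the thaw_db function, and the FS_ALREADY_THAWED_ERROR variable are not shown in this excerpt.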

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# Add all post-script actions to be performed within the function below
execute_post_script() {
    echo "INFO: Start execution of post-script"
    # Unfreeze the filesystem. No error code indicates that filesystem was successfully unfrozen.
    unfreeze_fs
    thaw_db
}

# Iterate over all the mountpoints and unfreeze the filesystem.
unfreeze_fs() {
    for target in $(lsblk -nlo MOUNTPOINTS)
    do
        # Freeze of the root and boot filesystems is dangerous and pre-script does not freeze these filesystems.
        # Hence, skip the root and boot mountpoints during unfreeze as well.
        if [[ "$target" == "/" ]]; then continue; fi
        if [[ "$target" == *"/boot"* ]]; then continue; fi
        echo "INFO: Thawing $target"
        error_message=$(sudo fsfreeze -u "$target" 2>&1)
        # Check if filesystem is already unfrozen (thawed). Return error code 205 if filesystem is already unfrozen.
        if [ $? -ne 0 ]; then
            if [[ "$error_message" == *"$FS_ALREADY_THAWED_ERROR"* ]]; then
                echo "ERROR: Filesystem ${target} is already in thaw state. Return Error Code: 205"
                exit 205
            fi
            # If the filesystem unfreeze failed for any reason other than the filesystem
            # already being unfrozen, return 202
            echo "ERROR: Failed to unfreeze mountpoint ${target} due to error - ${error_message}"
            exit 202
        fi
        echo "INFO: Thaw complete on $target"
    done
}
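The post-script excerpt above thaws the filesystems and database but does not show how the scheduled auto-thaw is cancelled; the full template handles that step. As a hypothetical sketch of the general idea, not the template's implementation, the pre-script can record the PID of a delayed background thaw, and the post-script can kill it so the fail-safe never fires after a normal thaw. The PID file path and function names are assumptions. The delayed command runs in a new process, so it must be an executable or script path rather than a shell function.

```shell
#!/usr/bin/env bash
# Hypothetical auto-thaw helpers; not the AWS template's implementation.
AUTO_THAW_PID_FILE="${AUTO_THAW_PID_FILE:-/tmp/dlm-auto-thaw.pid}"

# Run "$@" after $1 seconds in a detached background process and record its
# PID so the post-script can cancel it.
schedule_auto_thaw() {
  local delay="$1"; shift
  # Arguments are passed positionally to avoid re-parsing them as shell code.
  nohup bash -c 'sleep "$1"; shift; "$@"' _ "$delay" "$@" >/dev/null 2>&1 &
  echo "$!" > "$AUTO_THAW_PID_FILE"
}

# Cancel a pending auto-thaw, if one is still scheduled.
cancel_auto_thaw() {
  local pid
  [ -f "$AUTO_THAW_PID_FILE" ] || return 0
  pid=$(cat "$AUTO_THAW_PID_FILE")
  kill "$pid" 2>/dev/null || true   # already fired or exited: nothing to do
  rm -f "$AUTO_THAW_PID_FILE"
}

# Example (hypothetical thaw script path):
#   pre-script:  schedule_auto_thaw "$AUTO_THAW_DURATION_SECS" /usr/local/bin/dlm-thaw.sh
#   post-script: cancel_auto_thaw
```

Killing the waiting bash process stops the chain before the thaw command starts; the orphaned sleep simply exits on its own.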

5. Add the tag (Key = DLMScriptsAccess, Value = true) to this document so that the policy can run it through SSM Agent when it uses the default IAM role. Add any other tags you need to the Systems Manager document, and then select Create document.

Screenshot showing Tags added to SSM document.
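If you manage documents with the AWS CLI instead of the console, the following sketch shows equivalent calls. The first creates the Command document from a local YAML file and applies the DLMScriptsAccess tag at creation. The second tags a document that already exists. The YAML file name is a hypothetical placeholder, and setting AWS_CLI='echo aws' prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: create and/or tag the SSM Command document used by the DLM policy.
# Assumptions: AWS CLI v2 configured; document content saved as local YAML.
AWS_CLI="${AWS_CLI:-aws}"   # e.g. AWS_CLI='echo aws' to print commands (dry run)
DOC_NAME="${DOC_NAME:-MySQL-EBS-Snapshots}"

# Create a new Command document and tag it in one call.
create_dlm_document() {
  local yaml_file="$1"
  # $AWS_CLI is intentionally unquoted so it can hold a command prefix.
  $AWS_CLI ssm create-document \
    --name "$DOC_NAME" \
    --document-type Command \
    --document-format YAML \
    --content "file://${yaml_file}" \
    --tags Key=DLMScriptsAccess,Value=true
}

# Add the tag to a document that already exists.
tag_dlm_document() {
  $AWS_CLI ssm add-tags-to-resource \
    --resource-type Document \
    --resource-id "$DOC_NAME" \
    --tags Key=DLMScriptsAccess,Value=true
}

# Example: create_dlm_document MySQL-EBS-Snapshots.yaml
```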

Step 2: Create an Amazon Data Lifecycle Manager policy

Next, create an Amazon Data Lifecycle Manager policy to automate the creation and management of EBS Snapshots that are initiated between the pre-script and the post-script. The following steps use the Amazon EC2 console; you can also create the policy with the API, the AWS CLI, or AWS CloudFormation.

If you already have policies that create crash-consistent snapshots, you can modify them to enable the Pre/Post script feature. As long as all the other prerequisites are met, those policies start creating application-consistent EBS Snapshots the next time they run.
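For an existing policy, one way to make this change from the AWS CLI is to export the policy's current details, add a Scripts entry under the schedule's CreateRule, and apply the edited file. This is a sketch: the policy ID and file name are hypothetical, the edit itself is left manual, and you should confirm that the exported structure is accepted unchanged by update-lifecycle-policy for your policy type.

```shell
#!/usr/bin/env bash
# Sketch: enable pre/post scripts on an existing DLM policy via the AWS CLI.
AWS_CLI="${AWS_CLI:-aws}"   # e.g. AWS_CLI='echo aws' to print commands (dry run)

# 1) Save the policy's current PolicyDetails to a file for editing.
export_policy_details() {
  local policy_id="$1" file="$2"
  # $AWS_CLI is intentionally unquoted so it can hold a command prefix.
  $AWS_CLI dlm get-lifecycle-policy --policy-id "$policy_id" \
    --query "Policy.PolicyDetails" --output json > "$file"
}

# 2) After adding "Scripts" under Schedules[].CreateRule, apply the file.
apply_policy_details() {
  local policy_id="$1" file="$2"
  $AWS_CLI dlm update-lifecycle-policy --policy-id "$policy_id" \
    --policy-details "file://${file}"
}

# Example (hypothetical policy ID):
#   export_policy_details policy-0123456789abcdef0 details.json
#   # ...edit details.json...
#   apply_policy_details policy-0123456789abcdef0 details.json
```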

1. To get started, open the EC2 console, then select Lifecycle Manager under Elastic Block Store in the left navigation pane. Under Schedule-based policy, select EBS snapshot policy.

2. In Target resource types, select Instance and then supply tags for all instances that you want to target. In this example, we will target all instances with the tag (MySQL:true). Add a description for the policy.

Screenshot showing Specify settings for creating lifecycle policy

3. For IAM role, most customers should select Default role, because it has all the permissions required for the policy actions. When you create or modify policies through the console, the AWSDataLifecycleManagerSSMFullAccess IAM policy (which has all the permissions for this feature) is automatically attached to the default role. If you use the API or CLI to create or modify policies for this feature, you must attach that IAM policy to the default role manually. If you choose a custom IAM role, make sure it has all the permissions required to run SSM documents on the targeted instances.

Screenshot showing Specify settings for creating lifecycle policy

4. On the next page, set up your policy creation schedule. In this example, we are creating snapshots every 24 hours at 11:00 UTC and retaining them for 7 days.

Screenshot showing the setup of policy creation schedule.

5. Under Advanced Settings, make sure you check the box to Enable pre and post scripts for this schedule. Next, select the tile labeled Custom SSM document and the radio button for Pre and post scripts under Automation option.

Screenshot showing Custom SSM document tile selected.

6. Under Systems Manager document, enter the name of the Systems Manager Command document that you created in Step 1 (“MySQL-EBS-Snapshots”). You can also set additional parameters here, such as Script timeout period and Retry script if it fails.

The Script timeout period is the amount of time that Amazon Data Lifecycle Manager waits for successful completion of the script. If the time is exceeded and Data Lifecycle Manager has not received confirmation of successful completion, then your policy treats the script as having failed.

You can set Retry script if it fails to automatically retry initiating the failed script. You should consider this if you want a higher likelihood of your script completing successfully and if your database can withstand being quiesced repeatedly in a short amount of time.

We recommend that you also enable Default to crash-consistent snapshots if script fails. If enabled, then Amazon Data Lifecycle Manager attempts to create crash-consistent snapshots if it cannot successfully run your pre-script. You can use the tags applied to the snapshots as well as Amazon EventBridge to later determine if the EBS Snapshots were created as part of successful executions of the pre-script and post-script in your Systems Manager document.

Screenshot showing additional SSM document parameters

7. Under Advanced Settings, you can also set the policy to automate other actions, such as Cross-Region copy and Cross-account sharing. In this example, we set the policy to enable Fast Snapshot Restore in us-east-1a on the most recent set of application-consistent EBS Snapshots for each EC2 instance. As a result, volumes created from those snapshots in that Availability Zone deliver all of their provisioned performance immediately.

Screenshot showing Fast Snapshot Restore enabled.
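The console steps above can also be scripted. The following sketch first attaches the managed policy to the default role, which step 3 notes is required when you work through the API or CLI. It then creates a policy matching this example: instances tagged MySQL:true, a snapshot every 24 hours at 11:00 UTC, 7-day retention, the MySQL-EBS-Snapshots document as pre-script and post-script, and Fast Snapshot Restore on the most recent snapshot in us-east-1a. The Scripts and FastRestoreRule field names reflect our reading of the Data Lifecycle Manager API reference. The role name, timeout, retry count, role ARN, and file name are assumptions, so verify them before use. Setting AWS_CLI='echo aws' prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: CLI equivalent of Step 2. Values below are assumptions for this example.
AWS_CLI="${AWS_CLI:-aws}"   # e.g. AWS_CLI='echo aws' to print commands (dry run)
ROLE_NAME="${ROLE_NAME:-AWSDataLifecycleManagerDefaultRole}"
DOC_NAME="${DOC_NAME:-MySQL-EBS-Snapshots}"

# Look up the managed policy ARN by name instead of hard-coding its path.
ssm_policy_arn() {
  $AWS_CLI iam list-policies --scope AWS \
    --query "Policies[?PolicyName=='AWSDataLifecycleManagerSSMFullAccess'].Arn" \
    --output text
}

# Attach the managed policy to the DLM role (the console does this for you).
attach_ssm_policy() {
  local arn="${1:-$(ssm_policy_arn)}"
  $AWS_CLI iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$arn"
}

# PolicyDetails: tag MySQL=true, every 24 h at 11:00 UTC, keep 7 days,
# pre/post scripts from $DOC_NAME, FSR on the latest snapshot in us-east-1a.
policy_details() {
  cat <<EOF
{
  "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
  "ResourceTypes": ["INSTANCE"],
  "TargetTags": [{"Key": "MySQL", "Value": "true"}],
  "Schedules": [{
    "Name": "daily-app-consistent",
    "CreateRule": {
      "Interval": 24,
      "IntervalUnit": "HOURS",
      "Times": ["11:00"],
      "Scripts": [{
        "Stages": ["PRE", "POST"],
        "ExecutionHandlerService": "AWS_SYSTEMS_MANAGER",
        "ExecutionHandler": "${DOC_NAME}",
        "ExecutionTimeout": 60,
        "MaximumRetryCount": 0,
        "ExecuteOperationOnScriptFailure": true
      }]
    },
    "RetainRule": {"Interval": 7, "IntervalUnit": "DAYS"},
    "FastRestoreRule": {"Count": 1, "AvailabilityZones": ["us-east-1a"]}
  }]
}
EOF
}

# Create the policy. $1: execution role ARN; $2: where to write the details file.
create_policy() {
  local role_arn="$1" details_file="${2:-policy-details.json}"
  policy_details > "$details_file"
  $AWS_CLI dlm create-lifecycle-policy \
    --description "Application-consistent MySQL snapshots" \
    --state ENABLED \
    --execution-role-arn "$role_arn" \
    --policy-details "file://${details_file}"
}

# Example (hypothetical account ID):
#   attach_ssm_policy
#   create_policy arn:aws:iam::111122223333:role/AWSDataLifecycleManagerDefaultRole
```

In this sketch, ExecuteOperationOnScriptFailure corresponds to Default to crash-consistent snapshots if script fails, ExecutionTimeout to Script timeout period, and MaximumRetryCount to Retry script if it fails.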

Step 3: Validate that the snapshots created are application consistent

Once your Amazon Data Lifecycle Manager policy has created an EBS Snapshot, you can check if it is application-consistent.

1. Navigate to the Amazon EC2 console and select Snapshots.

Screenshot showing the selection of Snapshots on EC2 console

2. Select the snapshot and select Tags in the bottom panel. If you see the tag ‘aws:dlm:pre-script: SUCCESS’, then the snapshot was created after the pre-script ran successfully. If you also see ‘aws:dlm:post-script: SUCCESS’, then the post-script completed successfully as well. If both tags show SUCCESS and your Systems Manager document has the correct instructions to quiesce I/O, flush data to disk, and thaw I/O, then the snapshots you created are application consistent.

Screenshot showing snapshots with Tags.
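To check many snapshots at once, you can read the same tags with the AWS CLI. In this sketch, list_snapshot_script_tags prints each snapshot ID with its aws:dlm:pre-script and aws:dlm:post-script tag values (None if a tag is absent), and consistency_of applies the rule from step 2 to those two values. Both function names are hypothetical, and a SUCCESS result still depends on the document's freeze and thaw logic being correct.

```shell
#!/usr/bin/env bash
# Sketch: classify DLM snapshots from their pre/post script tags.
AWS_CLI="${AWS_CLI:-aws}"   # e.g. AWS_CLI='echo aws' to print commands (dry run)

# Apply the rule from step 2 to the two tag values.
consistency_of() {
  local pre="$1" post="$2"
  if [ "$pre" = "SUCCESS" ] && [ "$post" = "SUCCESS" ]; then
    echo "application-consistent"
  elif [ "$pre" = "SUCCESS" ]; then
    echo "pre-script succeeded; review post-script"
  else
    echo "not application-consistent"
  fi
}

# Print "SnapshotId PreScriptTag PostScriptTag" for snapshots you own that
# carry the pre-script tag.
list_snapshot_script_tags() {
  $AWS_CLI ec2 describe-snapshots --owner-ids self \
    --filters "Name=tag-key,Values=aws:dlm:pre-script" \
    --query "Snapshots[].[SnapshotId, Tags[?Key=='aws:dlm:pre-script']|[0].Value, Tags[?Key=='aws:dlm:post-script']|[0].Value]" \
    --output text
}

# Usage:
#   list_snapshot_script_tags | while read -r id pre post; do
#     echo "$id: $(consistency_of "$pre" "$post")"
#   done
```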

Cleaning up

Clean up the snapshots created during the previous steps to make sure you do not incur storage charges. You can do so by navigating to the Snapshots screen, searching for all snapshots created by the policy, selecting all the snapshots, and then selecting Actions followed by Delete snapshot.

Similarly, delete the Data Lifecycle Manager policy to ensure no future snapshots are created by the policy. You can do so by navigating to the Lifecycle Manager screen, selecting the policy, and then selecting Actions followed by Delete lifecycle policy.
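Both cleanup steps can also be scripted. The following sketch assumes that the snapshots carry the aws:dlm:lifecycle-policy-id tag that Data Lifecycle Manager applies to the snapshots it creates; the policy ID is a hypothetical placeholder. It lists the snapshot IDs first, deletes the policy so that it cannot create new snapshots, and then deletes the listed snapshots. Deletion is irreversible, so review the output of policy_snapshot_ids before running cleanup_policy.

```shell
#!/usr/bin/env bash
# Sketch: delete a DLM policy and the snapshots it created.
AWS_CLI="${AWS_CLI:-aws}"   # e.g. AWS_CLI='echo aws' to print commands (dry run)

# List snapshot IDs created by the given policy (whitespace-separated).
policy_snapshot_ids() {
  local policy_id="$1"
  $AWS_CLI ec2 describe-snapshots --owner-ids self \
    --filters "Name=tag:aws:dlm:lifecycle-policy-id,Values=${policy_id}" \
    --query "Snapshots[].SnapshotId" --output text
}

# Delete each snapshot ID passed as an argument.
delete_snapshots() {
  local snap
  for snap in "$@"; do
    $AWS_CLI ec2 delete-snapshot --snapshot-id "$snap"
  done
}

cleanup_policy() {
  local policy_id="$1" ids
  ids=$(policy_snapshot_ids "$policy_id") || return 1
  $AWS_CLI dlm delete-lifecycle-policy --policy-id "$policy_id"
  # IDs are intentionally unquoted: the CLI returns them whitespace-separated.
  delete_snapshots $ids
}

# Example (hypothetical policy ID):
#   policy_snapshot_ids policy-0123456789abcdef0   # review first
#   cleanup_policy policy-0123456789abcdef0
```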

Conclusion

In this post, we went through how to automate the creation and retention of application-consistent EBS Snapshots. We hope this reduces the amount of time and effort required to enhance the data protection of your self-managed databases running on EC2 instances.

With Amazon Data Lifecycle Manager, you can exclude the root/boot volume when creating a set of application-consistent snapshots. You can also exclude non-boot (data) volumes, which is useful if you want to save on costs by not creating backups of volumes that are only used to store log or temporary data. You can also set your policy to manage Fast Snapshot Restore on the most recent set of snapshots so that you can create new EBS volumes that deliver maximum performance without needing to be initialized. Furthermore, you can automatically share the snapshots with different accounts and copy snapshots to different AWS Regions. Moreover, Amazon Data Lifecycle Manager policies are free to create, which saves you from having to use third-party tools or develop and maintain complex custom scripts.

As a final takeaway, we encourage you to try this in your own environment. You can also learn more about this feature by reading our technical documentation and exploring different use cases for using pre- and post-scripts with Amazon Data Lifecycle Manager.

We welcome your feedback. If you have questions or suggestions, leave them in the comments section.

Chakrapani Ramasundaram

Chakrapani Ramasundaram is a Software Development Engineer for Amazon Elastic Block Store (Amazon EBS). He is a problem solver at heart and loves to identify and resolve customer pain points. Chakrapani has over 10 years of experience designing and building large scale systems from ideation to commercialization.

Arnab Saha

Arnab Saha is a Senior Database Specialist Solutions Architect at Amazon Web Services. Arnab specializes in Amazon RDS, Amazon Aurora, AWS DMS, and Amazon Elastic Block Store. He provides guidance and technical assistance to customers building scalable, highly available, and secure solutions in the AWS Cloud.

Vivek Singh

Vivek Singh is a Principal Database Specialist Technical Account Manager with AWS focusing on RDS/Aurora PostgreSQL engines. He works with enterprise customers by providing technical assistance on PostgreSQL operational performance and sharing database best practices. He has over 17 years of experience in open-source database solutions, and enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.

Tom McDonald

Tom McDonald is a Senior Workload Storage Specialist at AWS. Starting with an Atari 400 and re-programming tapes, Tom began a long interest in increasing performance on any storage service. With 20 years of experience in the Upstream Energy domain, file systems and High-Performance Computing, Tom is passionate about enabling others through community and guidance.