AWS Storage Blog

Automating application-consistent Amazon EBS Snapshots for InterSystems IRIS databases

InterSystems, an AWS Partner Network (APN) partner, provides a cloud-based data platform optimized for high-throughput applications that must simultaneously process transactions and a range of analytics, including analytic SQL, business rules, and machine learning. Users rely on the InterSystems IRIS Data Platform to rapidly develop and deploy critical applications. InterSystems recommends that users consider several backup methods, and identifies external backups as the best practice for backing up the entire database. Choosing a backup method is an essential part of creating a solid backup strategy for applications running on the platform.

Users host higher-level InterSystems services that run on top of InterSystems IRIS for Health on AWS, such as InterSystems Health Connect, InterSystems FHIR Repository, InterSystems HealthShare Unified Care Record, InterSystems Patient Index, InterSystems Provider Directory, or InterSystems IRIS, on Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Block Store (Amazon EBS). When users back up their IRIS databases through a custom workflow to meet data protection needs, they often spend significant time and manual effort orchestrating that workflow, which exposes them to human errors that can lead to missed snapshots and increased storage costs.

In this post, we outline how users can create external backups of IRIS running on EC2 instances in the form of application-consistent Amazon EBS Snapshots. These snapshots are initialized after the database has been frozen and writes to the database have been paused. You learn how to automate the creation, retention, and management of snapshots by creating simple Amazon Data Lifecycle Manager policies and making use of its support for pre-script and post-script automation. We have also created templates that include pre-script and post-script commands for standard versions of InterSystems IRIS databases, which you can use with Amazon Data Lifecycle Manager to simplify automation of application-consistent snapshots.

Solution overview

Previously, we outlined how to create application-consistent snapshots using Amazon Data Lifecycle Manager and custom scripts, including the necessary steps to create Amazon Data Lifecycle Manager policies that use AWS Systems Manager Agent to run custom scripts on your EC2 instances before and after EBS Snapshots are initialized. In this post, we build on those instructions to create application-consistent snapshots for InterSystems IRIS databases. We outline how to automate pre-scripts that pause I/O and flush buffers to disk and post-scripts that thaw I/O, as shown in the following figure:

Architecture diagram for automating application-consistent EBS Snapshots for InterSystems IRIS databases.

Prerequisites

You must install Systems Manager Agent on all instances for which you want to create application-consistent EBS Snapshots and make sure the agent is running. If you are using an Amazon Machine Image (AMI) provided by AWS that ships with Systems Manager Agent, then the agent is already preinstalled. You must set up all EC2 instances with the relevant permissions so that AWS Systems Manager can execute the Systems Manager document. You must also make sure the AWS Identity and Access Management (IAM) service role that is used for your Amazon Data Lifecycle Manager policy has the appropriate permissions to run the Systems Manager documents on the targeted EC2 instances. The easiest way to do this is to attach the AWSDataLifecycleManagerSSMFullAccess IAM policy to the IAM role. If you are using this policy, then you must add the DLMScriptsAccess:true tag to any custom Systems Manager documents that you want to use with this feature.
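These prerequisites can also be checked from a shell. The following is a minimal sketch rather than an official procedure. It assumes the AWS CLI is installed and configured, that the policy uses the default role name AWSDataLifecycleManagerDefaultRole, and that the instance is a systemd-based Linux host where the agent unit is called amazon-ssm-agent (the unit name differs on some distributions). The run helper, the DRY_RUN variable, and the POLICY_ARN placeholder are illustrative names that do not come from the template. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: verify the agent and attach the IAM policy. Names are assumptions.
DRY_RUN="${DRY_RUN:-1}"
ROLE_NAME="${ROLE_NAME:-AWSDataLifecycleManagerDefaultRole}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run on the EC2 instance: reports whether the agent service is active.
check_agent() {
  run systemctl is-active amazon-ssm-agent
}

# Lists the managed policies currently attached to the role.
show_attached_policies() {
  run aws iam list-attached-role-policies --role-name "$ROLE_NAME"
}

# Attaches a managed policy; pass the ARN of AWSDataLifecycleManagerSSMFullAccess
# as shown in the IAM console (it is deliberately not hard-coded here).
attach_policy() {
  run aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$1"
}

check_agent
show_attached_policies
attach_policy "${POLICY_ARN:-REPLACE_WITH_POLICY_ARN}"
```

Set DRY_RUN=0 only after reviewing the printed commands against your own account and role names.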

Next, you must have code that freezes the IRIS database (by using IRIS’ ##Class(Backup.General).ExternalFreeze() API class method) and flushes data to disk (pre-script), and then thaws the database (by using ##Class(Backup.General).ExternalThaw() API class method) once snapshots have been initialized. You can either use the provided templates, or create custom AWS Systems Manager documents (SSM documents) that meet the requirements. Note that it is your responsibility to make sure the code can perform the necessary actions on your IRIS database. If the code is invalid, then the created snapshots are not application-consistent.
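The freeze and thaw calls report their outcome through the exit status of the session command. In the sample scripts later in this post, a status of 5 is treated as success (or, for the IsWDSuspendedExt() check, as "frozen") and 3 as failure. The small helper below is a sketch of that mapping; the function name describe_iris_status is hypothetical and is not part of the template.

```shell
#!/usr/bin/env bash
# Sketch: translate the exit status of an irissession Backup.General call
# into a readable word. Status values follow the sample scripts in this post.
describe_iris_status() {
  case "$1" in
    5) echo "success" ;;   # call succeeded / instance is frozen
    3) echo "failure" ;;   # call failed / instance is not frozen
    *) echo "unknown" ;;   # anything else is unexpected
  esac
}

describe_iris_status 5
describe_iris_status 3
```

Treating every status other than 5 as a problem, as the sample scripts do, keeps a failed freeze from being mistaken for a consistent snapshot.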

Walkthrough

To enable custom pre-script and post-script automation, complete the following steps:

1. Create an AWS Systems Manager document that freezes I/O, flushes memory to disk, and then thaws I/O for your IRIS database. The document needs to have the necessary fields in order for Amazon Data Lifecycle Manager to trigger actions.

2. Create an Amazon Data Lifecycle Manager policy. The policy is responsible for coordinating the execution of the Systems Manager document, initiating the snapshots, marking the snapshots as application consistent, and managing their retention as well as other actions.

3. Validate that the snapshots created are application consistent.

Step 1: Create an AWS Systems Manager document

1. Navigate to the Systems Manager console and select Documents in the navigation pane.

Screenshot showing Systems Manager console

2. Select the Create document drop down box followed by Command or Session.

Screenshot showing Create SSM document interface

3. Fill in the Document details and make sure that Document type is set to Command. In this example, the document name is InterSystems_IRIS_Snapshots. Remember this, as you must use the same document name later when creating the policy.

Screenshot for setting SSM document parameters

4. Paste your pre-script and post-script code for your database into the Content section under YAML. You can use either the provided template for InterSystems IRIS, or add custom code.

Screenshot showing the creation of SSM document

When adding code, you must make sure that the necessary fields are present. Amazon Data Lifecycle Manager relies on these fields to correctly initialize the pre-script and post-script. Without them, Amazon Data Lifecycle Manager cannot create application-consistent EBS Snapshots.

We recommend that you start by modifying the provided Systems Manager Command document template for InterSystems IRIS, rather than creating your own documents from scratch. The pre-script and post-script portions of the template for InterSystems IRIS are further outlined in the following section.

Note that it is your responsibility to make sure that the code can perform the necessary actions on your database. If the code is not valid, then the resulting EBS Snapshots are not application consistent.

If you are planning to exclude the root volume and/or non-root volumes when creating the set of application-consistent snapshots, then make sure the code you provide performs the necessary steps on the appropriate set of EBS volumes.

The following is a sample pre-script to freeze I/O for InterSystems IRIS, which you can find in the template. As part of the ExternalFreeze command, we have set the ExternalFreezeTimeOut to 600 seconds (10 minutes) and the WDSuspendLimit to 300 seconds (5 minutes).

###===============================================================================###
# MIT License
# 
# Copyright (c) 2024 InterSystems
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
###===============================================================================###
# Add all pre-script actions to be performed within the function below.
# DOCKER_NAME, DOCKER_EXEC, LOGDIR, and EXIT_CODE are not set in this excerpt
# and are assumed to be defined elsewhere in the template.
execute_pre_script() {
    echo "INFO: Start execution of pre-script"

    # find all InterSystems IRIS instances running on an EC2 Instance
    iris_instances=$(docker exec $DOCKER_NAME iris qall 2>/dev/null | tail -n +3 | grep '^up' | cut -c5- | awk '{print $1}')
    echo "`date`: Running iris instances $iris_instances"

    # only for running InterSystems IRIS instances
    for INST in $iris_instances; do

        echo "`date`: Attempting to freeze $INST"

        # Detailed instances specific log
        LOGFILE=$LOGDIR/$INST-pre_post.log

        # check Freeze status before starting (exit status 5 means already frozen)
        docker exec $DOCKER_NAME irissession $INST -U '%SYS' "##Class(Backup.General).IsWDSuspendedExt()"
        freeze_status=$?
        if [ $freeze_status -eq 5 ]; then
            echo "`date`: ERROR: $INST IS already FROZEN"
            EXIT_CODE=204
        else
            echo "`date`: $INST is not frozen"
            # Freeze
            # Docs: https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=Backup.General#ExternalFreeze
            $DOCKER_EXEC irissession $INST -U '%SYS' "##Class(Backup.General).ExternalFreeze(\"$LOGFILE\",,,,,,600,,,300)"
            status=$?

            case $status in
                5) echo "`date`: $INST IS FROZEN"
                    ;;
                3) echo "`date`: $INST FREEZE FAILED"
                    EXIT_CODE=201
                    ;;
                *) echo "`date`: ERROR: Unknown status code: $status"
                    EXIT_CODE=201
                    ;;
            esac
            echo "`date`: Completed freeze of $INST"
        fi
    done
    echo "`date`: Pre freeze script finished"
}
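The two timeouts are passed to ExternalFreeze by position, which is why the call contains runs of empty arguments. As a sketch of how the call string is assembled, the hypothetical helper below (build_freeze_call is not part of the template) places the log file first, ExternalFreezeTimeOut seventh, and WDSuspendLimit tenth, matching the call in the pre-script. The log path in the usage line is an example value.

```shell
#!/usr/bin/env bash
# Sketch: build the ExternalFreeze call with the timeouts in the same
# argument positions as the sample pre-script.
build_freeze_call() {
  local logfile="$1"
  local freeze_timeout="${2:-600}"    # ExternalFreezeTimeOut, in seconds
  local wd_suspend_limit="${3:-300}"  # WDSuspendLimit, in seconds
  printf '##Class(Backup.General).ExternalFreeze("%s",,,,,,%s,,,%s)\n' \
    "$logfile" "$freeze_timeout" "$wd_suspend_limit"
}

build_freeze_call "/tmp/IRIS-pre_post.log" 600 300
```

Keeping the values in named variables makes it harder to miscount the commas when you adjust a timeout for your own workload.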

The following is a sample post-script to unfreeze I/O and disable auto-thaw for InterSystems IRIS, which you can find in the template:

###===============================================================================###
# MIT License
# 
# Copyright (c) 2024 InterSystems
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
###===============================================================================###
# Add all post-script actions to be performed within the function below.
# DOCKER_NAME, DOCKER_EXEC, LOGDIR, and EXIT_CODE are not set in this excerpt
# and are assumed to be defined elsewhere in the template.
execute_post_script() {
    echo "INFO: Start execution of post-script"

    # find all InterSystems IRIS instances running on an EC2 Instance
    iris_instances=$(docker exec $DOCKER_NAME iris qall 2>/dev/null | tail -n +3 | grep '^up' | cut -c5- | awk '{print $1}')
    echo "`date`: Running iris instances $iris_instances"

    # only for running InterSystems IRIS instances
    for INST in $iris_instances; do

        echo "`date`: Attempting to thaw $INST"

        # Detailed instances specific log
        LOGFILE=$LOGDIR/$INST-pre_post.log

        # check Freeze status before starting (exit status 5 means frozen)
        $DOCKER_EXEC irissession $INST -U '%SYS' "##Class(Backup.General).IsWDSuspendedExt()"
        freeze_status=$?
        if [ $freeze_status -eq 5 ]; then
            echo "`date`: $INST is in frozen state"
            # Thaw
            # Docs: https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?LIBRARY=%25SYS&CLASSNAME=Backup.General#ExternalFreeze
            $DOCKER_EXEC irissession $INST -U%SYS "##Class(Backup.General).ExternalThaw(\"$LOGFILE\")"
            status=$?

            case $status in
                5) echo "`date`: $INST IS THAWED"
                    # Record the backup in the instance's backup history
                    $DOCKER_EXEC irissession $INST -U%SYS "##Class(Backup.General).ExternalSetHistory(\"$LOGFILE\")"
                    ;;
                3) echo "`date`: $INST THAW FAILED"
                    EXIT_CODE=202
                    ;;
                *) echo "`date`: ERROR: Unknown status code: $status"
                    EXIT_CODE=202
                    ;;
            esac
            echo "`date`: Completed thaw of $INST"

        else
            echo "`date`: ERROR: $INST IS already THAWED"
            EXIT_CODE=205
        fi
    done
    echo "`date`: Post thaw script finished"
}
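The two functions above only define the actions; something still has to decide which one to call on each run. The following is a minimal sketch of that dispatch, assuming the document exposes a command parameter whose values are pre-script, post-script, and dry-run, and that its value is made available to the shell in a variable we call OPERATION here. Both the variable name and the stand-in function bodies are illustrative. Check the provided template for the exact fields it uses.

```shell
#!/usr/bin/env bash
# Stand-ins so the sketch runs on its own; in the real document these are
# the execute_pre_script and execute_post_script functions shown above.
execute_pre_script()  { echo "pre-script would run here"; }
execute_post_script() { echo "post-script would run here"; }

# Select the action for this invocation.
dispatch() {
  case "$1" in
    pre-script)  execute_pre_script ;;
    post-script) execute_post_script ;;
    dry-run)     echo "INFO: dry-run requested, taking no action" ;;
    *)           echo "ERROR: unknown command: $1" >&2
                 return 1 ;;
  esac
}

# OPERATION is assumed to carry the value of the document's command parameter.
dispatch "${OPERATION:-dry-run}"
```

Defaulting to dry-run means that running the script by hand without a parameter never freezes the database by accident.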

5. Then, add the tag (Key = DLMScriptsAccess, Value = true) to this document in order for the policy to be able to run it through Systems Manager Agent (using the default IAM role). Add other tags to the Systems Manager document as needed, and then select Create document.
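If you prefer the AWS CLI to the console, the document can be created and tagged in one call, or an existing document can be tagged afterwards. This is a sketch under assumptions: the document content is saved locally in a file we call iris_snapshots.yaml (a hypothetical name), the AWS CLI is configured, and the run helper and DRY_RUN variable are illustrative. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: create or tag the Systems Manager document from the AWS CLI.
DRY_RUN="${DRY_RUN:-1}"
DOC_NAME="${DOC_NAME:-InterSystems_IRIS_Snapshots}"
DOC_FILE="${DOC_FILE:-iris_snapshots.yaml}"   # hypothetical local file name

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the Command document and tag it in the same call.
create_document() {
  run aws ssm create-document \
    --name "$DOC_NAME" \
    --document-type Command \
    --document-format YAML \
    --content "file://$DOC_FILE" \
    --tags Key=DLMScriptsAccess,Value=true
}

# Add the tag to a document that already exists.
tag_existing_document() {
  run aws ssm add-tags-to-resource \
    --resource-type Document \
    --resource-id "$DOC_NAME" \
    --tags Key=DLMScriptsAccess,Value=true
}

create_document
```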

Step 2: Create an Amazon Data Lifecycle Manager policy

Now we create Amazon Data Lifecycle Manager policies to automate the creation and management of EBS Snapshots that are initiated in between the pre-scripts and post-scripts. The following steps outline how to create the policy through the Amazon EC2 console. However, you can also create the policy by using API/CLI and AWS CloudFormation.

If you already have policies creating crash-consistent snapshots, then you can modify those policies and enable the Pre/Post script feature. As long as all the other prerequisites have been met, your policies should start creating application-consistent EBS Snapshots the next time they run.

1. To get started, launch the Amazon EC2 console, then select Lifecycle Manager under Elastic Block Store in the left side navigation panel. Under Schedule-based policy, select EBS snapshot policy.

2. In Target resource types, select Instance and then input tags of all the instances that you want to target. In this example, we target all instances with the tag (InterSystems: true). Add a description for the policy.

Screenshot showing the creation of DLM policy

3. For the IAM role, most users should select Default role, as this has all the permissions needed for the policy actions. When creating/modifying policies through the console, the AWSDataLifecycleManagerSSMFullAccess IAM policy (which has all the permissions for this feature) is automatically attached to the Default role. If you are using API/CLI to create/modify policies for this feature, then you must manually attach the IAM policy to the Default role. If you choose to use a Custom IAM role, then you must make sure the IAM role has all the needed permissions to run SSM Documents on targeted instances.

Screenshot showing IAM role settings

4. On the next page, set up your policy creation schedule. In this example, we are creating snapshots every 24 hours at 11:00 UTC and retaining them for 7 days.

Screenshot for policy schedule settings

5. Under Advanced Settings, make sure you check the box to Enable pre and post scripts for this schedule. Next, select the tile labeled Custom SSM document and the radio button for Pre and post scripts under Automation option.

Screenshot for the selection of Custom SSM document tile

6. Under Systems Manager document, type in the name of the Systems Manager Command document that you created in Step 1 (“InterSystems_IRIS_Snapshots”). You can also set additional parameters here such as the Script timeout period and enable Retry script if it fails.

The Script timeout period is the amount of time that Amazon Data Lifecycle Manager waits for successful completion of the script. If the time is exceeded and Amazon Data Lifecycle Manager has not received confirmation of successful completion, then your policy treats the script as having failed.

You can set Retry script if it fails to automatically retry the failed script. You should consider this if you want a higher likelihood of your script completing successfully, and your database can withstand being quiesced repeatedly in a short amount of time.

We recommend that you also enable Default to crash-consistent snapshots if script fails. If enabled, then Amazon Data Lifecycle Manager attempts to create crash-consistent snapshots if it cannot successfully run your pre-script. You can use the tags applied on the snapshots as well as Amazon EventBridge to later determine if the EBS Snapshots were created as part of successful executions of the pre-script and post-script in your Systems Manager document.

Screenshot showing the selection of InterSystems SSM document

7. Under Advanced Settings, you can also set the policy to automate other actions, such as Cross-region copy and Cross-account sharing. In this example, we are setting the policy to make sure the most recent set of application-consistent EBS Snapshots for each EC2 instance has Fast Snapshot Restore enabled in us-east-1a. As a result, volumes created from those snapshots instantly deliver all of their provisioned performance.

Screenshot shows enabling Fast Snapshot Restore
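The console steps above can also be approximated with the AWS CLI. The sketch below generates a policy details document that mirrors the example (instances tagged InterSystems: true, a snapshot every 24 hours at 11:00 UTC, 7-day retention, pre and post scripts from the InterSystems_IRIS_Snapshots document, and a fallback to crash-consistent snapshots). The timeout and retry values, the schedule name, the ROLE_ARN placeholder, the output file path, and the helper names are all assumptions for illustration, and Fast Snapshot Restore is left out for brevity. Compare the field names with the current Amazon Data Lifecycle Manager API reference before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: build policy details and print the create command (dry run by default).
DRY_RUN="${DRY_RUN:-1}"
DOC_NAME="${DOC_NAME:-InterSystems_IRIS_Snapshots}"
ROLE_ARN="${ROLE_ARN:-REPLACE_WITH_ROLE_ARN}"   # placeholder, supply your own
SCRIPT_TIMEOUT="${SCRIPT_TIMEOUT:-60}"          # seconds, illustrative value
RETRY_COUNT="${RETRY_COUNT:-1}"                 # illustrative value

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Emit the policy details JSON on standard output.
write_policy_details() {
  cat <<EOF
{
  "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
  "ResourceTypes": ["INSTANCE"],
  "TargetTags": [{"Key": "InterSystems", "Value": "true"}],
  "Schedules": [{
    "Name": "Daily application-consistent snapshots",
    "CreateRule": {
      "Interval": 24,
      "IntervalUnit": "HOURS",
      "Times": ["11:00"],
      "Scripts": [{
        "ExecutionHandler": "${DOC_NAME}",
        "ExecutionHandlerService": "AWS_SYSTEMS_MANAGER",
        "Stages": ["PRE", "POST"],
        "ExecutionTimeout": ${SCRIPT_TIMEOUT},
        "MaximumRetryCount": ${RETRY_COUNT},
        "ExecuteOperationOnScriptFailure": true
      }]
    },
    "RetainRule": {"Interval": 7, "IntervalUnit": "DAYS"}
  }]
}
EOF
}

# Write the JSON to the given file and create the policy from it.
create_policy() {
  local details_file="$1"
  write_policy_details > "$details_file"
  run aws dlm create-lifecycle-policy \
    --description "Application-consistent snapshots for InterSystems IRIS" \
    --state ENABLED \
    --execution-role-arn "$ROLE_ARN" \
    --policy-details "file://$details_file"
}

create_policy "${TMPDIR:-/tmp}/iris-dlm-policy.json"
```

Keeping the policy details in a file makes the schedule easy to review and to keep under version control alongside the Systems Manager document.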

Step 3: Validate that the snapshots created are application consistent

Once your Amazon Data Lifecycle Manager policy has created an EBS Snapshot, you can check if it is application consistent.

1. Navigate to the Amazon EC2 console and select Snapshots.

Screenshot showing the selection of snapshots in the EC2 console

2. Select the snapshot and select Tags in the bottom panel. If you see the tag key ‘aws:dlm:lifecycle-policy-id’, then the snapshot was created (and is managed) by Amazon Data Lifecycle Manager. If you see a tag for ‘aws:dlm:pre-script: SUCCESS’, then the snapshot was created following successful execution of the pre-script. If you see a tag for ‘aws:dlm:post-script: SUCCESS’, then the post-script also successfully completed. If you see “SUCCESS” for both tags and your Systems Manager document had the correct instructions to freeze I/O, flush data to disk, and then thaw I/O, then the snapshots you have created are application consistent.

Screenshot showing snapshots with Tags.
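The same check can be scripted. The sketch below applies the rule described above (both script tags must read SUCCESS) and prints an AWS CLI query that filters on those tags. The function names, the run helper, and the DRY_RUN variable are illustrative. The query assumes the AWS CLI is configured for the account and Region that own the snapshots.

```shell
#!/usr/bin/env bash
# Sketch: decide from the two DLM tags whether a snapshot is application consistent.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = value of aws:dlm:pre-script, $2 = value of aws:dlm:post-script
classify_snapshot() {
  if [ "$1" = "SUCCESS" ] && [ "$2" = "SUCCESS" ]; then
    echo "application-consistent"
  else
    echo "not-verified"
  fi
}

# List the IDs of snapshots where both scripts reported success.
list_consistent_snapshots() {
  run aws ec2 describe-snapshots --owner-ids self \
    --filters "Name=tag:aws:dlm:pre-script,Values=SUCCESS" \
              "Name=tag:aws:dlm:post-script,Values=SUCCESS" \
    --query "Snapshots[].SnapshotId" --output text
}

classify_snapshot "SUCCESS" "SUCCESS"
list_consistent_snapshots
```

A snapshot that is missing either tag is reported as not-verified rather than as a failure, because the tags alone cannot tell you why a script did not complete.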

Cleaning up

Clean up the snapshots created during the previous steps to make sure you do not incur storage charges. You can do so by navigating to the Snapshots screen, searching for all snapshots created by the policy, selecting all the snapshots, and then selecting Actions followed by Delete snapshot.

Similarly, you should delete the Amazon Data Lifecycle Manager policy to make sure no future snapshots are created by the policy. You can do so by navigating to the Lifecycle Manager screen, selecting the policy, and then selecting Actions followed by Delete lifecycle policy.
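For a scripted cleanup, the following sketch prints the equivalent AWS CLI calls. The policy ID and snapshot IDs in the usage lines are made-up examples, and the helper names and the DRY_RUN variable are illustrative. Because deletions cannot be undone, the commands are only printed unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: find and delete the snapshots of one policy, then delete the policy.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List snapshot IDs carrying the lifecycle policy tag for the given policy ID.
find_policy_snapshots() {
  run aws ec2 describe-snapshots --owner-ids self \
    --filters "Name=tag:aws:dlm:lifecycle-policy-id,Values=$1" \
    --query "Snapshots[].SnapshotId" --output text
}

# Delete each snapshot ID passed as an argument.
delete_snapshots() {
  local snap
  for snap in "$@"; do
    run aws ec2 delete-snapshot --snapshot-id "$snap"
  done
}

# Delete the lifecycle policy itself.
delete_policy() {
  run aws dlm delete-lifecycle-policy --policy-id "$1"
}

find_policy_snapshots "policy-0123456789abcdef0"   # example ID
delete_snapshots "snap-0aaaaaaaaaaaaaaaa" "snap-0bbbbbbbbbbbbbbbb"   # example IDs
delete_policy "policy-0123456789abcdef0"
```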

Conclusion

In this post, we walked through the automation of application-consistent EBS Snapshots creation for the InterSystems IRIS Data Platform by using Amazon Data Lifecycle Manager and AWS Systems Manager.

With Amazon Data Lifecycle Manager, you have the ability to exclude the root/boot volume when creating a set of application-consistent snapshots. You can also exclude non-boot volumes (also known as data volumes), which is useful if you want to save on costs by not creating backups of volumes that are only used to store log or temporary data. You can also set your policy to manage Fast Snapshot Restore on the most recent set of snapshots so that you can create new EBS volumes that deliver maximum performance without needing to be initialized.

Amazon Data Lifecycle Manager policies are free to create, which saves you from having to use third-party tools or develop/maintain complex custom scripts. As a final takeaway, we encourage you to try this on your own environments. You can also learn more about this feature by reading our technical documentation and exploring different use cases for using pre and post scripts with Amazon Data Lifecycle Manager.

Thank you for reading this post. If you have any questions or comments, leave them in the comments section.

Dimitri Restaino

Dimitri Restaino is a Brooklyn-based AWS Solutions Architect focused on designing innovative and efficient solutions for healthcare companies in the North East. As a former software developer, he appreciates the limitless possibilities opened up by serverless technology. Off the clock, he can be found spending time in nature or setting fastest laps in his racing sim.

Behzad Dastur

Behzad Dastur is a Software Engineering Manager with Amazon Elastic Block Store at AWS. He has over 10 years of experience designing and building distributed systems and enterprise software at scale. He is passionate about learning new technology and solving customer challenges with innovative solutions.

Eduard Lebedyuk

Eduard Lebedyuk is a Senior Cloud Architect at InterSystems, helping customers in their cloud migration journey to modernize and run InterSystems-based workloads securely and efficiently. A blogger at heart, he loves community-driven learning and sharing of technology. His main topics are CI/CD, containers, AI/ML, and cloud.

Denton He

Denton He is a Senior Product Manager for Amazon Elastic Block Store (Amazon EBS) and leads the product for automation of EBS features. He is committed to helping users automate and simplify their workload processes running on EC2/EBS including self-managed databases, streaming workloads, and AI/ML.

Regilo Souza

Regilo Souza is a dental surgeon from Brazil who went rogue, turning his passion for technology into a new career path. With 20 years of experience in Health IT, he has worked for several UN agencies, including the World Health Organization. He currently leads InterSystems' Cloud Delivery Team.