AWS Storage Blog

Implementing restore testing for recovery validation using AWS Backup

Mission-critical applications power everything from e-commerce to healthcare, making a solid backup strategy not just a best practice but a vital necessity. As threats like ransomware grow more sophisticated, having backups isn't enough for AWS users. Organizations need confidence that these safeguards will perform when disaster strikes. Manual testing, while essential, drains IT resources. Through automated restore testing on set schedules, organizations gain more than efficiency: they establish a system that verifies backup integrity, streamlines compliance reporting, and frees up valuable personnel, all while building certainty that their protection strategy will deliver when needed.

In our earlier post, Validating recovery readiness with AWS Backup restore testing, we explored why AWS Backup restore testing is critical for meeting internal disaster recovery (DR) policies and regulatory mandates. In a digital landscape shaped by regulations such as the European Union’s Digital Operational Resilience Act (DORA) and the New York Department of Financial Services (NYDFS) Cybersecurity Regulation, resilience isn’t just a goal but a requirement. AWS Backup restore testing can provide the proof these regulations require, while its testing and auditing features offer a wide breadth of possibilities to help organizations validate and report on their resiliency efforts.

In this post, you will learn how to configure AWS Backup restore testing, along with best practices to consider when creating your own plans. You will also walk through an example to see how end-to-end restore testing works in practice.

How AWS Backup restore testing works

AWS Backup restore testing enables users to test data restoration on a predefined schedule and validate the restored data. The ability to set a schedule and create automated processes to check data restoration reduces manual effort and helps meet compliance requirements. Without automated restore testing, personnel might have to choose systems and recovery points, complete manual restores, and have application teams validate the restored data. This takes multiple teams’ time and resources that could be better spent improving applications. By automating this process, you can build data validation pipelines that check restores regularly.

An AWS Backup restore testing plan is built in two phases. First, you create the restore testing plan. Second, you create protected resource selections to be restored by the restore testing plan. When AWS Backup completes restoration, you can build restore testing validation with AWS Lambda functions triggered by Amazon EventBridge. Lambda functions can perform a variety of validation activities, including checking for connectivity, retrieving objects from Amazon S3, or getting encryption key status for actual data validation, and then report back to AWS Backup whether the validation was a success or failure. After restore testing is complete, you can use AWS Backup Audit Manager reports to show compliance as needed.

Building a restore testing plan

The first phase of implementing restore testing is building the restore testing plan. As there are many considerations that go into the frequency of testing and what to test, we will focus on best practices and recommendations. There are three parts to a restore testing plan:

  • Test frequency
  • Start within time
  • Recovery point selection criteria

With Test frequency, you want to test critical resources daily or weekly. Test resources within their retention period: if you keep recovery points for 14 days, testing should run more frequently than every 14 days. Start within time depends on how many recovery points you will be testing and how long each restore takes to finish. Each service has a maximum number of concurrent restores allowed, and you need to space out your plans so that you don’t exceed it.
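To make these settings concrete, here is a minimal sketch of creating a restore testing plan with the AWS SDK for Python (boto3). The plan name, schedule, and window values are illustrative assumptions, not values from this post; the payload structure follows the `create_restore_testing_plan` API.

```python
# Hedged sketch: building a restore testing plan payload for boto3's
# backup.create_restore_testing_plan. All names and numbers are examples.

def build_restore_testing_plan(name, schedule, start_window_hours, selection_window_days):
    """Build the RestoreTestingPlan payload for create_restore_testing_plan."""
    return {
        "RestoreTestingPlanName": name,
        # Weekly cron; keep this more frequent than your retention period.
        "ScheduleExpression": schedule,
        # All resource selections must start within this window.
        "StartWindowHours": start_window_hours,
        "RecoveryPointSelection": {
            # Pick a random recovery point from the eligible window.
            "Algorithm": "RANDOM_WITHIN_WINDOW",
            "IncludeVaults": ["*"],  # or specific backup vault ARNs
            "RecoveryPointTypes": ["SNAPSHOT"],
            "SelectionWindowDays": selection_window_days,
        },
    }

plan = build_restore_testing_plan("WeeklyCriticalPlan", "cron(0 1 ? * SUN *)", 4, 14)

# Uncomment to create the plan in your account:
# import boto3
# boto3.client("backup").create_restore_testing_plan(RestoreTestingPlan=plan)
```

Separating payload construction from the API call keeps the selection criteria easy to review and reuse across accounts.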

Figure 1: Example restore testing plan configuration

Recovery point selection can include all or specific vaults, timeframe for eligible recovery points, and whether to include point-in-time recovery (PITR) resource types. You might have one vault per account, or multiple vaults based on application types or tiers. If you’re replicating to a central backup account, an optimal design would be to create a logically air-gapped (LAG) vault per source account.

Figure 2: Sample AWS Backup recovery point selection criteria

After creating the restore testing plan, you move on to phase 2: creating protected resource selections. With each resource selection, you must choose a single resource type, such as Amazon S3 or Amazon Relational Database Service (Amazon RDS). After selecting a resource type, restore testing enables you to further customize which specific resources are selected. Each restore testing plan allows up to 30 protected resource selections. When creating the resource assignment, you can choose the default AWS Backup IAM role; if the default role does not exist, it is created with the proper permissions.

Tip: If you use Lambda validation, make the retention period longer than the time needed for data validation to complete.

Figure 3: Sample resource assignment for S3

You can filter resources by individual selection or tag, which allows you to select specific resources to test based on your requirements. In Figure 4, we choose S3 as the resource type, then choose to filter by tag to select specific buckets.

Each service has its own set of restore metadata that provides the default values needed to successfully perform restore testing, and AWS Backup infers a minimal set of this metadata for you. There is also overridable metadata that you can change to replace the defaults. You can read about inferred and overridable metadata in the AWS Backup documentation.

Figure 4: Example protected resources selection filtered using tags
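The tag-filtered selection in Figure 4 can also be expressed programmatically. The following is a minimal sketch of a selection payload for `create_restore_testing_selection`; the selection name, plan name, role ARN, and tag key/value are illustrative assumptions.

```python
# Hedged sketch: an S3 protected resource selection that filters buckets by
# tag, for boto3's backup.create_restore_testing_selection. All names are
# examples, not values from a real account.

def build_s3_selection(name, iam_role_arn, tag_key, tag_value):
    """Build a RestoreTestingSelection payload that selects S3 buckets by tag."""
    return {
        "RestoreTestingSelectionName": name,
        "ProtectedResourceType": "S3",
        "IamRoleArn": iam_role_arn,
        # Select only buckets carrying the given tag.
        "ProtectedResourceConditions": {
            "StringEquals": [
                {"Key": f"aws:ResourceTag/{tag_key}", "Value": tag_value}
            ]
        },
        # Keep the restored bucket long enough for Lambda validation to run.
        "ValidationWindowHours": 4,
    }

selection = build_s3_selection(
    "S3TagSelection",
    "arn:aws:iam::123456789012:role/AWSBackupDefaultServiceRole",
    "restore-test", "true",
)

# Uncomment to attach the selection to an existing plan:
# import boto3
# boto3.client("backup").create_restore_testing_selection(
#     RestoreTestingPlanName="my-restore-testing-plan",
#     RestoreTestingSelection=selection,
# )
```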

After defining the overall plan and resource selections, we have a fully operational plan as seen in Figure 5.

Figure 5: Example completed configuration of AWS Backup restore testing plan

Implementing restore validation

Configuring a restore testing plan is only half the battle; you also must verify that your restored data is usable. AWS Backup sends Amazon EventBridge events for restore job status changes. You can use these events to trigger an AWS Lambda function when a restore testing job changes to the completed state, which allows application teams to write code to test their data. Test code depends on the service you’re protecting, but could include retrieving objects from an S3 bucket or querying an Amazon DynamoDB table. Once your Lambda function runs, it can report success or failure back to AWS Backup. Figure 6 shows this example restore validation workflow.

Figure 6: Workflow of AWS Backup restore testing validation

If you have multiple restore testing plans, then you can tailor the EventBridge rule to send certain events to certain functions (as shown in Figure 7) by including the restore testing plan Amazon Resource Name (ARN). Using the restore testing plan ARN also allows you to filter out manual restores.

Figure 7: Example of EventBridge event pattern for restore job
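A pattern like the one in Figure 7 can be created programmatically. Below is a hedged sketch that builds the pattern as a Python dictionary and (commented out) registers it with EventBridge; the plan ARN, rule name, and function ARN are illustrative assumptions, so check the detail field names against the events your account actually emits.

```python
# Hedged sketch: an EventBridge pattern matching only completed restore jobs
# started by one restore testing plan. All ARNs and names are examples.
import json

PLAN_ARN = "arn:aws:backup:us-east-1:123456789012:restore-testing-plan:MyPlan"

event_pattern = {
    "source": ["aws.backup"],
    "detail-type": ["Restore Job State Change"],
    "detail": {
        "status": ["COMPLETED"],
        # Filtering on the plan ARN also excludes manual restores,
        # which carry no restore testing plan ARN.
        "restoreTestingPlanArn": [PLAN_ARN],
    },
}

# Uncomment to create the rule and point it at the coordinator Lambda:
# import boto3
# events = boto3.client("events")
# events.put_rule(Name="restore-testing-completed",
#                 EventPattern=json.dumps(event_pattern))
# events.put_targets(Rule="restore-testing-completed", Targets=[
#     {"Id": "coordinator",
#      "Arn": "arn:aws:lambda:us-east-1:123456789012:function:RestoreCoordinator"},
# ])
print(json.dumps(event_pattern, indent=2))
```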

After creating the event pattern, you select a target to send the event to. If you have multiple resource types which require separate testing criteria, a Lambda coordinator can help send events for different resource types to the correct validation. The Lambda coordinator checks the resource type and routes the event to the correct data validation Lambda. If you have multiple protected resource types, you would choose this Lambda coordinator as the target for the EventBridge events, as seen in Figure 8.

Figure 8: Sample target selection for EventBridge rule

The following is the sample restore Lambda coordinator code:

import json
import boto3
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
lambda_client = boto3.client('lambda')
def handler(event, context):
    logger.info("Handling event: %s", json.dumps(event))
    resource_type = event.get('detail', {}).get('resourceType', '')
    function_name = None
    try:
        if resource_type == "RDS":
            function_name = "RDSRestoreValidation"
            logger.info("Resource is an RDS instance. Invoking Lambda function: %s", function_name)
        elif resource_type == "S3":
            function_name = "S3RestoreValidation"
            logger.info("Resource is an S3 bucket. Invoking Lambda function: %s", function_name)
        else:
            raise ValueError(f"Unsupported resource type: {resource_type}")
        # Invoke the appropriate Lambda function
        response = lambda_client.invoke(
            FunctionName=function_name,
            Payload=json.dumps(event),
            InvocationType="RequestResponse"
        )
        logger.info("Lambda invoke response: %s", response)
    except Exception as e:
        logger.error("Error during Lambda invocation: %s", str(e))
        raise e
    logger.info("Finished processing event for resource type: %s", resource_type)

In the restore Lambda coordinator code, you can see that if the resource type matches S3, it forwards the entire event to another Lambda function called S3RestoreValidation. The S3RestoreValidation function that follows conducts restore validation on an S3 resource and reports the success or failure of validation back to AWS Backup.

import json
import boto3
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3_client = boto3.client('s3')
backup_client = boto3.client('backup')
def handler(event, context):
    logger.info("Handling event: %s", json.dumps(event))
    restore_job_id = event.get('detail', {}).get('restoreJobId', '')
    resource_type = event.get('detail', {}).get('resourceType', '')
    created_resource_arn = event.get('detail', {}).get('createdResourceArn', '')
    validation_status = "SUCCESSFUL"
    validation_status_message = "Restore validation completed successfully"
    try:
        if resource_type == "S3":
            bucket_name = get_bucket_name_from_arn(created_resource_arn)
            # List objects in the bucket
            response = s3_client.list_objects_v2(Bucket=bucket_name)
            # Check if the bucket contains more than 1 object
            object_count = response.get('KeyCount', 0)
            if object_count > 1:
                logger.info(f"Bucket {bucket_name} contains more than 1 object. Validation successful.")
            else:
                logger.info(f"Bucket {bucket_name} contains 1 or fewer objects. Validation failed.")
                validation_status = "FAILED"
                validation_status_message = f"Bucket {bucket_name} contains only {object_count} object(s)."
        else:
            validation_status = "FAILED"
            validation_status_message = f"Unsupported resource type: {resource_type}"
        # Report validation result to AWS Backup
        backup_client.put_restore_validation_result(
            RestoreJobId=restore_job_id,
            ValidationStatus=validation_status,
            ValidationStatusMessage=validation_status_message
        )
        logger.info("Restore validation result sent successfully")
    except Exception as e:
        logger.error("Error during restore validation: %s", str(e))
        validation_status = "FAILED"
        validation_status_message = f"Restore validation encountered an error: {str(e)}"
        # Report failure result to AWS Backup
        backup_client.put_restore_validation_result(
            RestoreJobId=restore_job_id,
            ValidationStatus=validation_status,
            ValidationStatusMessage=validation_status_message
        )
    logger.info("Finished processing restore validation for job ID: %s", restore_job_id)
def get_bucket_name_from_arn(arn):
    arn_parts = arn.split(":")
    resource_parts = arn_parts[-1].split("/")
    return resource_parts[-1]

The S3RestoreValidation code validates an S3 restore by verifying that the bucket has more than one object in it. After checking, it reports back to AWS Backup whether the restore completed successfully or not. A completely successful restore and validation will yield a summary like Figure 9. The status of the job should say Completed, and the validation status should be Successful. When setting the validation status in your Lambda code, you can optionally include a validation message that will appear in the AWS Backup console and APIs. You can read more on restore validation and samples in the documentation.

Figure 9: Example of AWS Backup restore testing completion

AWS Backup automatically starts the deletion process of the restored resource when either the validation is sent or the cleanup period expires. The deletion time can vary based on resource type. Most resources are deleted quickly, but some can take longer. For example, the deletion of an S3 bucket is a two-step process that involves first adding lifecycle rules to delete objects and then deleting the bucket when empty. These lifecycle rules can take a couple of days to execute.

AWS Backup restore testing considerations

Now that you understand best practices around creating a restore testing plan and validation, there are a few other implementation details you should consider.

Cost optimization

Cost optimization matters across the backup lifecycle, including restore testing. Here’s how to manage costs effectively:

  • Choose resources narrowly: Use tags or selections to test only critical resources, avoiding non-production ones unless compliance necessitates it.
  • Schedule tests by criticality: Test critical resources daily or weekly and others quarterly or semi-annually, aligning with policies and retention periods (for example, test within 14 days if retention is 14 days).
  • Optimize retention period: Minimize restored data duration to reduce costs. Set deletion times based on automated tests.

Pricing for restore testing can be found on the AWS Backup pricing page.

AWS Backup auditing and reporting

AWS Backup Audit Manager helps you ensure your backup policies and resources comply with internal or regulatory standards. It tracks whether resources are backed up, backup frequency, whether vaults are logically air-gapped, and whether restore times meet targets. AWS Backup Audit Manager frameworks enable this by offering built-in or custom controls to align resources with policies.

Audit reports provide compliance evidence for sharing. Two types of reports exist: jobs reports, which show completed and active jobs from the last 24 hours (for example, the restore job report for recent restores), and compliance reports, which monitor resource status or framework controls. Management accounts gain multi-account visibility for organization-wide reports. See the AWS Backup documentation for report creation steps and details on using frameworks.

Restore testing plan quotas

When creating restore testing plans, make sure that your plans meet your testing requirements and complete on time. Each resource type has a limit on concurrent restore jobs from testing plans (not on-demand restores). The Start within window from phase 1 is key: all resource selections for a restore testing plan must start within this window, and you need to be careful not to exceed the concurrent limit. For example, Amazon S3 allows 30 concurrent restores, so choosing 90 buckets with a one-hour window risks delays. To plan effectively, use a longer start within window or create multiple plans with staggered starts, especially when frequent (daily/weekly) and periodic (monthly/quarterly) tests run together. Check adjustable limits in the documentation and request increases if needed.
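The arithmetic behind that example can be sketched as follows. This is a simplified illustration under the assumption that restores run in waves bounded by the concurrent limit; real restore durations vary by resource size and type.

```python
# Hedged sketch of the scheduling arithmetic: with a per-service concurrent
# restore limit, restores run in waves, and every restore must *start* inside
# the start within window. Numbers here are illustrative.
import math

def min_start_window_hours(resource_count, concurrent_limit, avg_restore_hours):
    """Hours needed so the last restore can still start within the window."""
    waves = math.ceil(resource_count / concurrent_limit)
    # The last wave only needs to start inside the window, so count the
    # waiting time while the preceding waves finish.
    return (waves - 1) * avg_restore_hours

# 90 buckets against S3's 30 concurrent restores = 3 waves; if each restore
# averages 1 hour, a 1-hour window is too small and at least 2 hours is needed.
print(min_start_window_hours(90, 30, 1))  # → 2
```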

Visualizing end-to-end restore testing

To see how end-to-end restore testing works and how testing plans are deployed and integrated, we’ve included a sample restore testing plan. This sample plan helps you visualize each step of the process and see how restore and validation interact.

This is a pre-configured AWS CloudFormation stack that runs automatically on a daily schedule.

Prerequisites

The following prerequisites are necessary to complete this solution:

  • AWS Backup configured in your account
  • Recovery points of Amazon S3 and/or Amazon RDS
    • For Amazon RDS restores, you may enter an Amazon RDS subnet group name, or one is chosen for you. It isn’t recommended to do restore testing in a production VPC; choose a test or isolated VPC instead.

Launch AWS CloudFormation stack

This AWS CloudFormation template deploys everything necessary to have automated restore testing of both Amazon S3 and Amazon RDS.

Launch stack

Running the restore testing plan

After deployment, there is no manual intervention needed to run the plan. The restore testing plan runs once daily on all resources that are chosen by that plan. As noted in Figure 6, AWS Backup finishes the restoration, which then runs Lambda functions to validate the restores.

When validation completes successfully, the validation should appear as in Figure 10.

Figure 10: Example of AWS Backup restore testing completion

Cleaning up

All restores that were completed by the restore testing plan are automatically deleted after four hours. If you’re using restore testing for Amazon S3 resources, S3 buckets containing data take longer to remove, because the lifecycle rules that empty them can take a couple of days to run. To avoid incurring further charges, delete the CloudFormation stack, which deletes the restore testing plans and stops further testing. For instructions, refer to Deleting a stack on the CloudFormation console.

Conclusion

AWS Backup restore testing is a flexible and extensible feature that enables you to tailor a solution to the needs of your organization. Start by understanding your organization’s policies, then explore AWS Backup restore testing capabilities in the AWS Management Console and learn how to integrate automated testing into your DR and cyber resilience strategies. To implement AWS Backup restore testing in your environment, visit the AWS Backup documentation. You can also engage with AWS Solutions Architects to design a comprehensive backup validation strategy tailored to your organization’s needs.

The message is clear: automated backup validation is no longer optional; it’s a fundamental requirement for modern business continuity. Regular testing helps meet internal policies, regulatory mandates, and cyber resilience requirements, while AWS Backup restore testing provides a scalable, efficient solution for ensuring recovery readiness.

Gabe Contreras

Gabe Contreras is a Senior Storage Specialist Solutions Architect for Strategic Accounts. He is always eager to dive deep with customers to find the best solution for their needs. He has always enjoyed figuring out how things work and solving complex problems.

Sabith Venkitachalapathy

Sabith Venkitachalapathy is an expert in designing AWS recovery resilience solutions, ensuring disaster recovery and high availability for critical workloads. Focused on Financial Services (FSI) and Healthcare and Life Sciences (HCLS), Sabith leverages AWS to tackle industry challenges and drive innovation. He shares practical insights to help organizations build resilient, secure cloud architectures.