Reduce recovery time and optimize storage costs with faster restores from Amazon S3 Glacier storage classes and Commvault

Data is the lifeblood of any modern business. Organizations are storing more copies of their application data than ever before to recover from data loss, repair data corruption or ransomware damage, respond to compliance requests, and become more data driven. Storing more data at reduced cost enables businesses to extract more value and insights to better serve their customers. As protected data environments (development, test, pre-production, and production) proliferate, IT teams are being tasked with getting control of growing storage and data retrieval costs. Additionally, heavily regulated industries like financial services, healthcare, and public sector often have minimum retention requirements, adding to cost pressures. At the same time, the business expectations for recovery time continue to decrease as more organizations plan to put cold storage to work for their business.

Commvault, an industry leader in cloud-native data management and active cyber security defense, hears a common theme from Amazon Simple Storage Service (Amazon S3) customers: “How can I store my cold and archival data at the lowest cost, while achieving a restore time in hours?” Since its launch in 2006, Amazon S3 has continued to reduce the cost of storing backups while providing industry-leading scalability, data availability, security, and performance. Many organizations choose to adopt Amazon S3 Intelligent-Tiering, which delivers automatic storage cost savings when data access patterns change, and the Amazon S3 Glacier storage classes for long-term cold and archival data to reduce storage costs and meet compliance needs while maintaining critical data copies for long-term preservation and business insights.

One of the most important priorities for Commvault customers is restoring backups quickly and efficiently. Behind the scenes, Commvault leverages Amazon S3 Batch Operations to maximize the restore throughput available to customers. Commvault Backup and Recovery automates end-to-end S3 Batch Operations to accelerate and simplify the restores of AWS workloads stored in S3 Glacier storage classes. Commvault sees everything from financial and medical records to critical engineering intellectual property being securely stored in the S3 Glacier storage classes. Most importantly, the S3 Glacier storage classes, in combination with the Commvault Combined Storage Tier, enable a cost-optimized, data-driven business, where data is easily recalled at the click of a button.

In this post, we walk through the process of writing infrequently accessed Amazon EC2 and Amazon EKS compute backups to the S3 Glacier storage classes, which are purpose-built for data archiving, providing you with the highest performance, most retrieval flexibility, and lowest cost archive storage in the cloud. Then, we demonstrate how to enable S3 Batch Operations for a reduction in restore time of up to 85% (compared to restores using individual Restore Object API calls). We then perform a simple browse and restore for protected instances. You can reduce the operational complexity, time to restore, and cost to restore long-term backups and compliance copies by using the S3 Glacier storage classes and Commvault.

Solution overview

In this example, we show how long-term backups and compliance copies of Amazon EC2 compute and Amazon EKS applications may be stored in the S3 Glacier storage classes to achieve reduced-cost backups. Commvault protects and recovers a broad selection of AWS compute, container, database, and storage workloads, SaaS, and traditional hybrid workloads, many of which are stored in S3 Glacier.

Figure 1: Commvault Backup and Recovery with faster restores for Amazon S3 Glacier

After our initial backup, we configure Commvault Backup and Recovery to use Amazon S3 Batch Operations for automated faster restores from an S3 Glacier-based Commvault Cloud Storage Library. There are several ways to configure S3 Glacier storage classes within Commvault. We will use Commvault Combined Storage Tiers, which pin Commvault metadata and indexes to instant access storage classes for optimal restore time.

Solution walkthrough: Optimized restore from Amazon S3 Glacier asynchronous storage classes

At a high level, the process of restoring operational backups from Amazon S3 Glacier Flexible Retrieval and Amazon S3 Glacier Deep Archive storage classes using S3 Batch Operations and Commvault includes the following steps (in this example, we demonstrate using S3 Glacier Flexible Retrieval):

1. Create a Commvault Cloud Storage location (bucket) using Amazon S3 Glacier Flexible Retrieval.
2. Configure a backup policy (Data Protection Server Plan) to write backups to the new S3 bucket. Commvault writes backup data in deduplicated, compressed, and encrypted format to further reduce storage and replication costs of backup data stored in Amazon S3.
3. Take one or more backup copies of tagged Amazon EC2 instances and/or Amazon EKS applications to the new S3 bucket.
4. Enable Amazon S3 Batch Operations restore mode within Commvault Backup and Recovery (via FeatureGates, also referred to as CommCell Settings).
5. After setting up the proper AWS IAM permissions, perform one or more restores, which results in Commvault calling s3:CreateJob to initiate a multi-object restore (S3 Initiate Restore Object).
6. Complete the restore of the compute instances or containerized applications back to the AWS Region specified in the restore options.

Figure 2: Amazon S3 Batch Operations recovery steps

Implementation (steps 1-4 in the solution walkthrough)

1. Create and configure cloud storage (Amazon S3 bucket) using Commvault Command Center (following screenshot), CLI, or REST API. Ensure that S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive is selected as the storage class. The following example depicts the use of S3 Glacier Flexible Retrieval.

Commvault supports the following Amazon S3 storage classes for S3 Glacier faster restores:

- Commvault combined storage with S3 Glacier Flexible Retrieval
- Commvault combined storage with S3 Glacier Deep Archive
- Commvault combined storage with S3 Intelligent-Tiering Archive Access and Deep Archive Access tiers
- S3 Glacier Flexible Retrieval
- S3 Glacier Deep Archive
- S3 Intelligent-Tiering Archive Access and Deep Archive Access tiers

Figure 3: Creating a Cloud Storage Library in Commvault

When selecting S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive, Commvault recommends maintaining the Use combined storage setting, which is enabled by default to streamline browse and restore operations from asynchronous storage classes.

Figure 4: Configuring Commvault Combined Storage with Amazon S3 Glacier storage classes

2. Create a Server Plan that sets the frequency, retention, and new Cloud Storage Library location for backup data. In this example, Monthly Long-Term Retention copies are being written to S3 Glacier Flexible Retrieval, and then Yearly Long-Term Retention copies are being written to S3 Glacier Deep Archive.

Figure 5: Commvault Server Plan tiering to S3 Glacier

3. Create an AWS VM Group that automatically discovers and protects targeted compute and container instances by AWS Resource tag or Kubernetes Label selector. Use powerful resource tagging rules to automatically discover compute and containerized workloads to protect at backup runtime. This approach dramatically simplifies cloud operations, and auto-adjusts as workloads are deployed and terminated.

Figure 6: Creating a Commvault EC2 VM Group using AWS Resource Tags

4. After running a backup, enable the use of Amazon S3 Batch Operations for faster restores from S3 Glacier, by setting the following FeatureGates or CommCell Settings:

- EnableS3BatchOperations must be set to true to enable batch operations-based restores.

Figure 7: Enabling Amazon S3 Batch Operations via additional setting

- S3BatchOperationsRoleArn must be set to the Role Amazon Resource Name (ARN) that S3 Batch Operations will use to perform the restore.

Figure 8: Setting Amazon S3 Batch Operations Role ARN via additional setting

AWS Identity and Access Management (IAM) permissions and S3 Batch Operations

Before attempting an S3 Glacier faster restore, you must grant the s3:CreateJob and iam:PassRole permissions to the AWS IAM identity that is used to read backup data from Amazon S3.

See the Granting permissions for Amazon S3 Batch Operations documentation for additional details.
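As a rough illustration of the two permissions named above, the following Python sketch builds an identity policy document for the principal that submits the batch job. The account ID, the role name, and the decision to scope iam:PassRole to a single role are assumptions made for this example; they are not taken from Commvault's published policies.

```python
import json


def build_caller_policy(batch_role_arn: str) -> dict:
    """Identity policy for the principal that submits S3 Batch Operations jobs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the caller create S3 Batch Operations jobs.
                "Sid": "AllowCreateBatchJob",
                "Effect": "Allow",
                "Action": "s3:CreateJob",
                "Resource": "*",
            },
            {
                # Lets the caller hand the batch role to S3 Batch Operations.
                # Scoping to one role ARN is an assumption for this sketch.
                "Sid": "AllowPassBatchRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": batch_role_arn,
            },
        ],
    }


# Hypothetical role ARN, used only for illustration.
EXAMPLE_ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleS3BatchRestoreRole"
print(json.dumps(build_caller_policy(EXAMPLE_ROLE_ARN), indent=2))
```

The printed JSON can be reviewed against the Granting permissions documentation before it is attached to any identity.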

Additionally, when an S3 Batch Operations job is created, an Amazon Resource Name (ARN) for the AWS Identity and Access Management (IAM) Role that S3 Batch Operations will use to perform the restore is required. S3 Batch Operations assumes this role to perform the restore (s3:RestoreObject).

The S3 Batch Operations role requires the following permissions. See the Attaching permissions policies documentation for example AWS IAM policy definitions:

- s3:PutObject to write the S3 Batch Operations CSV manifest file to the cloud storage bucket.
- s3:GetObject to read the S3 Batch Operations CSV manifest file from the cloud storage bucket.
- s3:PutObject to write the S3 Batch Operations completion report to the cloud storage bucket.
- Access to the AWS Key Management Service (AWS KMS) keys used to encrypt any S3 objects being restored.
- kms:Decrypt and kms:GenerateDataKey if your manifest file or completion report will be encrypted with AWS KMS.
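The role described above needs two documents: a trust policy that lets the S3 Batch Operations service assume the role, and a permissions policy covering the restore, the manifest, and the completion report. The sketch below is one possible shape for both, assuming a hypothetical bucket name and assuming the manifest and report live under the s3BatchOperationsRestore/ prefix. AWS KMS statements are left out and would need to be added if encryption with AWS KMS is in use.

```python
import json


def build_trust_policy() -> dict:
    """Trust policy allowing S3 Batch Operations to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_role_policy(bucket: str, prefix: str = "s3BatchOperationsRestore/") -> dict:
    """Permissions policy for the batch role (KMS statements omitted)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The restore itself, applied to any object in the bucket.
                "Sid": "RestoreArchivedObjects",
                "Effect": "Allow",
                "Action": "s3:RestoreObject",
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Manifest read/write and completion report write.
                # Limiting this to one prefix is an assumption of the sketch.
                "Sid": "ManifestAndReport",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }


# "example-backup-bucket" is a hypothetical bucket name.
print(json.dumps(build_trust_policy(), indent=2))
print(json.dumps(build_role_policy("example-backup-bucket"), indent=2))
```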

Commvault synthesizes and uploads a CSV-formatted manifest file to s3BatchOperationsRestore/CVRestoreJobId-nnn/manifest.csv at the top level of the Amazon S3 bucket where the objects are being restored. Commvault also requests that S3 Batch Operations write a completion report (all details) for all batch-initiated restores to s3BatchOperationsRestore/CVRestoreJobId-nnn/CompletionReport/{batch-operation-job-id}.

Commvault does not remove manifests or completion reports after a restore job completes, to allow for other data management and analytics processes to consume the completion metadata, for example, Amazon Athena, Amazon QuickSight, or Amazon CloudWatch.

Commvault recommends configuring an Amazon S3 lifecycle policy to expire objects in the s3BatchOperationsRestore/ prefix periodically, per your business data management strategy, because these manifests and completion reports are retained after both successful and unsuccessful S3 Glacier faster restores.
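A lifecycle rule of this kind could look like the following sketch, which builds the configuration as a Python dictionary in the shape the S3 lifecycle API accepts. The 30-day expiry and the rule ID are assumptions chosen for illustration; pick values that match your own retention strategy.

```python
import json


def build_expiry_rule(prefix: str = "s3BatchOperationsRestore/", days: int = 30) -> dict:
    """Lifecycle configuration that expires restore manifests and reports."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": "expire-batch-restore-artifacts",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# With boto3 installed, this dictionary could be passed as the
# LifecycleConfiguration argument of s3.put_bucket_lifecycle_configuration.
print(json.dumps(build_expiry_rule(), indent=2))
```

Note that applying a lifecycle configuration replaces the bucket's existing configuration, so any rules already on the bucket would need to be merged into the same document first.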

Perform and complete restores (Steps 5 & 6 in the solution walkthrough)

We are now ready to run a faster restore test from S3 Glacier Flexible Retrieval, leveraging S3 Batch Operations to optimize and accelerate the restore operation. To recap, the STANDARD retrieval request expectations prior to faster restores using S3 Batch Operations are as follows:

- STANDARD retrievals typically start within 3-5 hours for objects stored in the S3 Glacier Flexible Retrieval storage class or in the S3 Intelligent-Tiering Archive Access tier.
- STANDARD retrievals typically start within 12 hours for objects stored in the S3 Glacier Deep Archive storage class or in the S3 Intelligent-Tiering Deep Archive Access tier.

When leveraging S3 Batch Operations to perform bulk restore actions, the following STANDARD retrieval request improvements can be expected:

- STANDARD retrievals typically start within 30 minutes for objects stored in the S3 Glacier Flexible Retrieval storage class or in the S3 Intelligent-Tiering Archive Access tier.
- STANDARD retrievals typically start within 9 hours for objects stored in the S3 Glacier Deep Archive storage class or in the S3 Intelligent-Tiering Deep Archive Access tier.

See the Archive Retrieval options documentation for more information.

Why the Standard retrieval tier? STANDARD is the default option for retrieval requests that do not specify the retrieval option, and Standard retrievals are free for objects that are stored in S3 Intelligent-Tiering. Faster restores with S3 Glacier do not require the purchase of provisioned capacity units (PCUs) needed to ensure retrieval capacity for EXPEDITED recalls.

The basic steps involved in the restoration are:

1. Initiate the restore using Commvault Command Center, CLI, or REST SDK. There is no special configuration or setting required. Commvault automates an S3 Glacier faster restore for the application owner or cloud ops admin.

Figure 9: Commvault Restore options for Amazon EC2 instances

2. Commvault automatically identifies which Amazon S3 objects are required to restore the instance(s) and passes these files to the Commvault Cloud Archive Recall workflow.

3. The Cloud Archive Recall workflow will synthesize an S3 Batch Operations CSV format manifest and upload it to s3BatchOperationsRestore/CVRestoreJobId-nnn/manifest.csv within the S3 bucket being used for the restore (the manifest file will be placed in the S3 Standard storage class).

Each line of the manifest contains the bucket name and the key of the object to restore (bucket_name,object_key_to_restore), for example:

cvltbkp,QXB5SZ_06.04.2023_19.51/CV_MAGNETIC/V_1/CHUNK_1/CHUNK_META_DATA_1.FOLDER/0
cvltbkp,QXB5SZ_06.04.2023_19.51/CV_MAGNETIC/V_5/CHUNK_4/CHUNK_META_DATA_4.FOLDER/0
cvltbkp,QXB5SZ_06.04.2023_19.51/CV_MAGNETIC/V_6/CHUNK_7/CHUNK_META_DATA_7.FOLDER/0
cvltbkp,QXB5SZ_06.04.2023_19.51/CV_MAGNETIC/V_4/CHUNK_8/CHUNK_META_DATA_4.FOLDER/0

Commvault does not specify the (optional) S3 object version in the manifest file, so only the latest version of each object is used for the recovery.

See the Specifying a manifest documentation for more details on the manifest file format.
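To make the manifest format concrete, here is a minimal Python sketch that writes a two-column CSV manifest of the kind shown above. It is not Commvault's generator. The function name is hypothetical, and the URL-encoding step follows the AWS requirement that object keys in a CSV manifest be URL-encoded; this post does not say how Commvault handles encoding.

```python
import csv
import io
from urllib.parse import quote


def build_manifest(bucket: str, keys: list[str]) -> str:
    """Return CSV manifest text with one 'bucket,key' row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Keep "/" readable, percent-encode everything else that needs it.
        writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()


example_keys = [
    "QXB5SZ_06.04.2023_19.51/CV_MAGNETIC/V_1/CHUNK_1/CHUNK_META_DATA_1.FOLDER/0",
    "QXB5SZ_06.04.2023_19.51/CV_MAGNETIC/V_5/CHUNK_4/CHUNK_META_DATA_4.FOLDER/0",
]
print(build_manifest("cvltbkp", example_keys), end="")
```

No version column is written, which matches the behavior described above where only the latest version of each object is restored.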

4. The Cloud Archive Recall workflow compute node (typically a Commvault MediaAgent or Cloud Access Node) then contacts the S3 Batch Operations regional service endpoint (https://account-id.s3-control.region.amazonaws.com) and submits a new s3:CreateJob request, and receives an S3 Batch Operations Job Id.

Commvault passes the ConfirmationRequired = false parameter as part of the s3:CreateJob request. There is no need to confirm the batch operation within the Amazon S3 console. It will move to run state immediately.

For S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive, a Restore Object batch operation creates a temporary copy of each of the objects requested, and then deletes the copy after ExpirationInDays days have elapsed. Commvault sets ExpirationInDays to 1, as the data is no longer required after the restore completes. For Amazon S3 Intelligent-Tiering, the objects are moved to the Frequent Access tier.

See the Restoring objects from the S3 Intelligent-Tiering Archive Access and Deep Archive Access tiers documentation.

Remember, when you restore an archived object from S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive, you pay for both the archive and the copy that you restored temporarily. For information about pricing, see the documentation for Amazon S3 pricing.

The preferred ExpirationInDays value can be configured by setting the AutoCloudRecallExpireDays CommCell Setting.
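The request described in step 4 can be sketched as the parameter set for the S3 Batch Operations CreateJob API. The Python below only builds the parameters and does not call AWS. The account ID, bucket, ETag, role ARN, and priority are hypothetical values. The field names follow the public CreateJob API, but the exact values Commvault sends are not shown in this post, apart from ConfirmationRequired set to false, ExpirationInDays set to 1, and the STANDARD retrieval tier.

```python
import uuid


def control_endpoint(account_id: str, region: str) -> str:
    """Regional S3 Batch Operations endpoint, in the form given in step 4."""
    return f"https://{account_id}.s3-control.{region}.amazonaws.com"


def build_restore_job_request(account_id: str, bucket: str, manifest_key: str,
                              manifest_etag: str, role_arn: str,
                              report_prefix: str, expiration_days: int = 1) -> dict:
    """Build CreateJob parameters for a bulk STANDARD-tier restore."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,  # job runs without console confirmation
        "Operation": {
            "S3InitiateRestoreObject": {
                "ExpirationInDays": expiration_days,
                "GlacierJobTier": "STANDARD",
            }
        },
        "Manifest": {
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],  # no version column
            },
            "Location": {
                "ObjectArn": f"{bucket_arn}/{manifest_key}",
                "ETag": manifest_etag,
            },
        },
        "Report": {
            "Bucket": bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": report_prefix,
            "ReportScope": "AllTasks",  # report on every task, not only failures
        },
        "Priority": 10,  # assumed value; required by the API
        "RoleArn": role_arn,
        "ClientRequestToken": str(uuid.uuid4()),
    }


# All values below are hypothetical placeholders.
request = build_restore_job_request(
    account_id="111122223333",
    bucket="example-backup-bucket",
    manifest_key="s3BatchOperationsRestore/CVRestoreJobId-342/manifest.csv",
    manifest_etag="0123456789abcdef0123456789abcdef",
    role_arn="arn:aws:iam::111122223333:role/ExampleS3BatchRestoreRole",
    report_prefix="s3BatchOperationsRestore/CVRestoreJobId-342/CompletionReport",
)
# With boto3 installed: boto3.client("s3control").create_job(**request)
print(control_endpoint("111122223333", "us-east-1"))
print(sorted(request))
```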

5. The Cloud Archive Recall workflow then polls all of the Amazon S3 objects being restored by periodically running a HEAD Object (s3:HeadObject) request against each object, until all objects are restored. The default polling interval is 20 minutes and may be customized by setting the nCloudChunkRecallSleepIntervalMins CommCell Setting.

Figure 10: Tuning the polling interval for Commvault S3 Batch Operations restores

Commvault recommends setting the polling interval to the following values, based on the S3 Glacier storage class being restored from. Commvault tested 10 GB to 500 GB restores using S3 Glacier faster restores and found the following polling intervals to be optimal for minimizing restore time:

- S3 Glacier Flexible Retrieval: 20 minutes
- S3 Glacier Deep Archive: 540 minutes (9 hours)
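The polling behavior in step 5 can be pictured with a short loop like the one below. This is a sketch of the general pattern and not Commvault's workflow code. The function names are hypothetical, the is_restored callable stands in for a HEAD Object check, and the interval table simply restates the recommended values above, keyed by the S3 storage class names GLACIER and DEEP_ARCHIVE.

```python
import time
from typing import Callable, Iterable

# Recommended polling intervals from the list above, in minutes.
POLL_MINUTES = {"GLACIER": 20, "DEEP_ARCHIVE": 540}


def wait_for_restores(keys: Iterable[str],
                      is_restored: Callable[[str], bool],
                      storage_class: str,
                      sleep: Callable[[float], None] = time.sleep,
                      max_polls: int = 100) -> int:
    """Poll until every key is restored; return the number of polls used."""
    interval_seconds = POLL_MINUTES[storage_class] * 60
    pending = set(keys)
    for poll in range(1, max_polls + 1):
        # Re-check only the objects that were not ready last time.
        pending = {key for key in pending if not is_restored(key)}
        if not pending:
            return poll
        sleep(interval_seconds)
    raise TimeoutError(f"{len(pending)} objects still pending after {max_polls} polls")


# Dry run with a fake check and no real sleeping: objects become ready
# on the second poll.
state = {"polls": 0}

def fake_check(key: str) -> bool:
    return state["polls"] >= 1

def fake_sleep(seconds: float) -> None:
    state["polls"] += 1

print(wait_for_restores(["chunk-a", "chunk-b"], fake_check, "GLACIER", sleep=fake_sleep))
```

Passing the check and the sleep function in as arguments keeps the loop testable without touching AWS or waiting for real time to pass.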

The x-amz-restore field returned by s3:HeadObject shows whether a restore is currently in progress for the object, or has completed with the expiry timer started:

x-amz-restore: ongoing-request="false", expiry-date="Thu, 21 Dec 2023 00:00:00 GMT"

If the object restoration is in progress, the header returns the value ongoing-request="true".

See the s3:HeadObject – Response Elements (x-amz-restore) documentation for more details.
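A small parser makes the two states of the x-amz-restore value easier to work with. The sketch below uses only the Python standard library. The RestoreStatus type and the function name are hypothetical, and the header value is assumed to have been obtained already (for example, boto3 exposes it as the Restore field of a head_object response).

```python
import re
from dataclasses import dataclass
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass
class RestoreStatus:
    ongoing: bool                # True while the restore is still running
    expiry: Optional[datetime]   # set once the temporary copy is available


# Matches name="value" pairs such as ongoing-request="false".
_FIELD = re.compile(r'([\w-]+)="([^"]*)"')


def parse_restore_header(value: str) -> RestoreStatus:
    """Parse an x-amz-restore header value into a RestoreStatus."""
    fields = dict(_FIELD.findall(value))
    if "ongoing-request" not in fields:
        raise ValueError(f"not an x-amz-restore value: {value!r}")
    ongoing = fields["ongoing-request"] == "true"
    expiry = None
    if "expiry-date" in fields:
        expiry = parsedate_to_datetime(fields["expiry-date"])
    return RestoreStatus(ongoing=ongoing, expiry=expiry)


done = parse_restore_header(
    'ongoing-request="false", expiry-date="Thu, 21 Dec 2023 00:00:00 GMT"'
)
running = parse_restore_header('ongoing-request="true"')
print(done.ongoing, done.expiry)
print(running.ongoing, running.expiry)
```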

6. The Cloud Archive Recall workflow reports success to the Commvault Job Manager, which then continues to restore the compute or containerized instance to the preferred location.

You can observe the submission of the s3:CreateJob request in AWS CloudTrail. Here you can see an example log entry:

Figure 11: AWS CloudTrail example log entry for s3:CreateJob request created by Commvault

And in this screenshot, you can see the body of the CloudTrail log:

Figure 12: AWS CloudTrail example log body for s3:CreateJob request created by Commvault

Commvault requests a completion report be written to s3BatchOperationsRestore/CVRestoreJobId-nnn/CompletionReport/{batch-operation-job-id}.

Commvault summarizes the number of successful and unsuccessful objects within the Commvault Log Files/WorkflowCustom.log for troubleshooting partial or complete restore failures.

You can use the Commvault Restore Job Id to locate the appropriate CVRestoreJobId-nnn prefix and completion report to investigate failures. See the Tracking job failure documentation for more information on how unsuccessful operations are handled and logged.

Your s3BatchOperationsRestore/CVRestoreJobId-nnn folder will contain a job folder for each s3:CreateJob executed, for example:

s3://source-bucket/s3BatchOperationsRestore/
   CVRestoreJobId-342/
      job-410b054c-be59-47ae-b04b-a713c148bedb/
         manifest.json
         manifest.json.md5
         results/
            e2ce4b092a4a670a58fa8d412e5a975658b5d49b.csv

See the Examples: S3 Batch Operations completion reports documentation for details on how to interpret completion reports.
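When investigating a partial failure, a quick tally of task statuses from a results CSV can be a useful first step. The Python sketch below counts statuses per row. It assumes the task status is the fourth column of each row, after bucket, key, and version ID; confirm that layout against the completion report documentation before relying on it. The sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column position of the task status (bucket, key, version ID, status, ...).
TASK_STATUS_COLUMN = 3


def summarize_report(csv_text: str) -> Counter:
    """Count completion report rows by task status."""
    counts: Counter = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > TASK_STATUS_COLUMN:
            counts[row[TASK_STATUS_COLUMN]] += 1
    return counts


# Invented sample rows; real reports carry further columns after the status.
sample = (
    "cvltbkp,path/to/chunk-1,,succeeded,200,,Successful\n"
    "cvltbkp,path/to/chunk-2,,succeeded,200,,Successful\n"
    "cvltbkp,path/to/chunk-3,,failed,403,AccessDenied,Access Denied\n"
)
for status, count in sorted(summarize_report(sample).items()):
    print(status, count)
```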

Outcome

Before S3 Glacier faster restores, Commvault would orchestrate the recall of data from S3 Glacier asynchronous storage classes, with typical restore times being:

- Within 3-5 hours for a 50-150 GB Amazon EC2 instance or containerized application in S3 Glacier Flexible Retrieval.
- Within 12 hours for a 50-150 GB Amazon EC2 instance or containerized application in S3 Glacier Deep Archive.

After lab-based testing of S3 Glacier faster restores, Commvault found the following significant restore time improvements:

For S3 Glacier Flexible Retrieval:

- 30 minutes to restore a micro 10 GB Amazon EC2 instance
- 45 minutes to restore a small 58 GB Amazon EC2 instance

For S3 Glacier Deep Archive:

- 9 hours to restore a small 50 GB Amazon EC2 instance
- 10 hours to restore a medium 250 GB Amazon EC2 instance
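As a sanity check on these figures, the percentage reduction can be computed from a before and after pair. The post does not state which pair underlies the 85% figure it cites for S3 Glacier Flexible Retrieval, so pairing the 5-hour upper bound with the 45-minute result is an assumption made here for illustration only.

```python
def reduction_percent(before_minutes: float, after_minutes: float) -> float:
    """Percentage reduction in restore time from before to after."""
    if before_minutes <= 0:
        raise ValueError("before_minutes must be positive")
    return (before_minutes - after_minutes) * 100 / before_minutes


# Assumed pairing: 5 hours (300 minutes) before, 45 minutes after.
print(round(reduction_percent(300, 45), 1))
# Alternative pairing: 3 hours (180 minutes) before, 30 minutes after.
print(round(reduction_percent(180, 30), 1))
```

Under the first pairing the result matches the 85% figure; under the second it comes out slightly lower, which fits the "up to 85%" wording used elsewhere in this post.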

While S3 Glacier is intended for infrequently accessed long-term retention, regulatory, or archival data, Commvault found that small targeted restores of individual workloads stored in S3 Glacier Flexible Retrieval experienced an 85% reduction in restore time (versus individual Restore Object API calls). This means customers can continue to use ultra-low-cost S3 Glacier storage classes to reduce the storage cost for growing long-term retention and regulatory data copies, without compromising on recall time. With this improvement to S3 Glacier data restore times, organizations can put their cold storage to work faster than ever before. Business agility is increased by getting long-term retention data to regulators, auditors, and application owners in minutes or hours. Additionally, the Commvault power-managed MediaAgents responsible for the restore complete their activities faster, resulting in reduced Amazon EC2 runtime costs to perform the restore.

Conclusion

In this post, we covered how you can use S3 Glacier faster restores and Commvault Backup and Recovery to perform a faster, more cost-optimized restore of data from S3 Glacier asynchronous storage classes. These restores are faster than the existing STANDARD and BULK retrieval options, and more cost effective than the EXPEDITED retrieval option.

Commvault sees broad adoption of S3 Glacier and S3 Intelligent-Tiering (with Archive Access and Deep Archive Access tiers) across education, financial services, healthcare, information technology, transportation, and the public sector, to name just a few. As most restores consist of a small targeted subset of overall stored data, S3 Glacier faster restores offer the following key outcomes:

- The most cost-effective way to restore data from S3 Glacier storage classes compared to non-batch or EXPEDITED retrievals.
- Reduced Recovery Time Objectives (RTOs), with small 10 GB Amazon EC2 restores completing in 30 minutes from S3 Glacier Flexible Retrieval, and in 9 hours from S3 Glacier Deep Archive.
- Up to an 85% reduction in restore times, allowing the continued use of ultra-low-cost storage while meeting business expectations for data accessibility.

Customers no longer need to trade off low storage cost against desired recall time. S3 Glacier faster restores combined with Commvault Backup and Recovery is the simplest, fastest, and most cost-effective way to restore long-term retention data.

To learn more, visit the Amazon S3 Batch Operations – Restore objects documentation and the Using Amazon S3 Glacier faster restores documentation for details on how to set up and use this in your environment. You can get started today with faster restores for Amazon S3 Glacier by deploying Commvault Backup & Recovery from the AWS Marketplace, and updating to Maintenance Release 11.30.59 (also available from Commvault Maintenance Advantage).

To try it yourself, deploy the Commvault Backup and Recovery BYOL AMI-based product from the AWS Marketplace with a limited trial license. Alternatively, check out Modern Data Protection for AWS for more details on protecting your AWS services with Commvault.

Thanks for reading this blog post. If you have any comments or questions, don’t hesitate to post in the comments section.