This Guidance provides an automated workflow to restore archived Amazon Simple Storage Service (Amazon S3) data stored in the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes to a new storage class. It supports copying objects in place, copying them to a new prefix within the same S3 bucket, or copying them to another S3 bucket in the same or a different AWS account or Region. The Guidance orchestrates the steps involved in archive restore, including S3 bucket inventory generation, manifest query and optimization, archive retrieval, and the copy process. It also tracks progress and sends job status notifications.
This Guidance consists of two architectures. The first architecture is an overview that shows you how to request restoration of archived items listed in a CSV manifest. The second architecture shows how to automate the creation of a CSV manifest using an AWS Step Functions workflow.
Please note: [Disclaimer]
Architecture Diagram
[Architecture diagram description]
-
Overview
This architecture shows how to request restoration of archived items listed in a CSV manifest. For details on how to automate the creation of the CSV manifest, open the Automated CSV Manifest Generator tab.
Step 1
Upload a CSV manifest to the solution S3 bucket listing the archived Amazon Simple Storage Service (Amazon S3) objects that you want restored to a destination bucket.
Step 2
This upload invokes the Restore Worker AWS Lambda function.
Step 3
The Restore Worker Lambda function submits a restore operation job to Amazon S3 Batch Operations (see the sketch after Step 11 below).
Step 4
Amazon S3 Batch Operations initiates object restore in the S3 bucket containing the archived objects.
Step 5
The Amazon S3 Batch Operations completion report invokes the Job Tracker Worker Lambda function.
Step 6
The Job Tracker Worker Lambda function creates an entry in the Amazon DynamoDB table with the restore details.
Step 7
Amazon EventBridge invokes the Job Scheduler Worker Lambda function on a schedule.
Step 8
The Job Scheduler Worker Lambda function queries the state table to determine the status of each restore job and when it is eligible for a copy job.
Step 9
The Job Scheduler Worker Lambda function invokes the Copy Worker Lambda function, which submits an Amazon S3 Batch Operations job that invokes a Lambda function.
Step 10
Amazon S3 Batch Operations invokes a Lambda function that performs the actual copy operation to the destination bucket (a sketch of this handler also follows Step 11).
Step 11
The Amazon S3 Batch Operations report invokes the Job Tracker Worker Lambda function to send an Amazon Simple Notification Service (Amazon SNS) message to the user stating that the S3 object is available.
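To make Steps 1 through 4 concrete, the following minimal boto3 sketch shows the kind of call a Restore Worker function could make to create the S3 Batch Operations restore job from an uploaded CSV manifest. The account ID, bucket names, role ARN, manifest ETag, and retrieval settings are placeholder assumptions, not values from the Guidance.

```python
import boto3

# The CSV manifest uploaded in Step 1 lists one archived object per line,
# in the S3 Batch Operations CSV format: bucket,key (optionally ,versionId).
s3control = boto3.client("s3control")

response = s3control.create_job(
    AccountId="111122223333",               # placeholder account ID
    ConfirmationRequired=False,
    Operation={
        "S3InitiateRestoreObject": {
            "ExpirationInDays": 7,          # how long the restored copy stays available
            "GlacierJobTier": "BULK",       # lowest-cost retrieval tier
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::example-solution-bucket/manifests/restore.csv",
            "ETag": "example-manifest-etag",  # ETag of the uploaded manifest object
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::example-solution-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "reports/restore",
        "ReportScope": "AllTasks",
    },
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/example-batch-operations-role",
)
print("Submitted restore job:", response["JobId"])
```

The completion report configured above is what invokes the Job Tracker Worker function in Step 5 of the walkthrough.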
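Similarly, the per-object copy in Step 10 is performed by a Lambda function that S3 Batch Operations invokes once per manifest entry. The handler below is a simplified, hypothetical sketch assuming the version 1.0 invocation schema and a fixed destination bucket; the Guidance's actual copy logic (storage class, metadata, and multipart handling for large objects) is more involved.

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

DESTINATION_BUCKET = "example-destination-bucket"  # placeholder destination


def handler(event, context):
    """Hypothetical handler for an S3 Batch Operations 'Invoke Lambda' task."""
    invocation_id = event["invocationId"]
    task = event["tasks"][0]
    task_id = task["taskId"]
    source_bucket = task["s3BucketArn"].split(":::")[-1]
    source_key = urllib.parse.unquote_plus(task["s3Key"])  # keys arrive URL-encoded

    try:
        # Copy the restored object to the destination bucket, preserving its key.
        s3.copy_object(
            Bucket=DESTINATION_BUCKET,
            Key=source_key,
            CopySource={"Bucket": source_bucket, "Key": source_key},
        )
        result_code, result_string = "Succeeded", "Copied"
    except Exception as exc:  # real code would distinguish retryable errors
        result_code, result_string = "PermanentFailure", str(exc)

    # Response format expected by S3 Batch Operations for Lambda invocations.
    return {
        "invocationSchemaVersion": "1.0",
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": invocation_id,
        "results": [
            {"taskId": task_id, "resultCode": result_code, "resultString": result_string}
        ],
    }
```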
-
Automated CSV Manifest Generator
This architecture shows how to automate the creation of a CSV manifest using an AWS Step Functions workflow. For an overview of the full architecture, open the Overview tab.
Step 1
Amazon S3 delivers the inventory report to the solution S3 bucket, which invokes a Lambda function and starts the workflow.
Step 2
Amazon Athena and AWS Glue query the Amazon S3 inventory to determine the number of archived objects to be restored (see the query sketch after Step 5 below).
The workflow chunks, optimizes, and generates CSV manifests from the Amazon S3 inventory using Athena.
Step 4
The workflow removes the inventory configuration from the S3 archive bucket and, for each CSV manifest chunk, invokes the Restore Worker Lambda function in the solution core. The workflow then sends an email notification for each submitted restore job.
Step 5
The AWS Step Functions workflow completes successfully and sends an email to notify the user. Track restore and copy jobs using the Amazon S3 Batch Operations section of the AWS Management Console.
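As an illustration of Step 2, the following boto3 sketch shows the kind of Athena query that can select archived objects from an S3 Inventory table cataloged in AWS Glue. The database name, table name, storage-class column, and output location are assumptions for illustration; the actual names depend on how the inventory and Glue catalog are configured.

```python
import boto3

athena = boto3.client("athena")

# Hypothetical Glue database/table for the S3 Inventory, plus a results location.
QUERY = """
    SELECT bucket, key
    FROM s3_inventory_table
    WHERE storage_class IN ('GLACIER', 'DEEP_ARCHIVE')
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "s3_inventory_db"},
    ResultConfiguration={
        "OutputLocation": "s3://example-solution-bucket/athena-results/"
    },
)
print("Started Athena query:", response["QueryExecutionId"])
```

The query results can then be split into appropriately sized CSV manifest chunks for the Restore Worker, as described in Steps 3 and 4.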
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
This Guidance can be redeployed using the AWS CloudFormation template. The solution includes an SNS notification function that reports job status and failures.
-
Security
Data stored in DynamoDB and Amazon S3 is protected by default through AWS encryption. By default, S3 buckets have access control lists (ACLs) disabled and Block Public Access enabled.
-
Reliability
The Step Functions states in this Guidance implement retry with exponential back-off for the Lambda functions they invoke. Additionally, the AWS Software Development Kits (SDKs) used in the Lambda functions have a default retry and back-off configuration, and Amazon S3 Batch Operations retries Lambda service-related errors.
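As an illustration of the SDK-level behavior, the minimal boto3 sketch below shows how a retry mode and maximum attempt count can be set explicitly. The values shown are assumptions for illustration; the Guidance itself relies on the SDK defaults.

```python
import boto3
from botocore.config import Config

# Hypothetical retry settings; the defaults already apply retries with back-off,
# but they can be tuned explicitly like this if needed.
retry_config = Config(retries={"max_attempts": 5, "mode": "standard"})

s3 = boto3.client("s3", config=retry_config)
```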
-
Performance Efficiency
Amazon S3 Batch Operations is designed to manage large-scale operations. Lambda functions automatically scale to handle the number of concurrent invocations. You can enable provisioned capacity mode for DynamoDB, which reserves sufficient system resources to meet your requirements.
-
Cost Optimization
S3 Glacier provides multiple options for archive retrieval, including bulk retrieval, the lowest-cost option, which allows you to retrieve petabytes of data within 5-12 hours. S3 Glacier Flexible Retrieval provides free bulk retrieval for archived items that you want to retrieve infrequently, such as 1-2 times a year. Additionally, Amazon S3 Batch Operations allows you to manage billions of objects at scale without the need to provision costly and complex compute.
-
Sustainability
An Amazon S3 Lifecycle rule is applied to the Guidance S3 bucket so that objects expire after 180 days. The solution's DynamoDB items are set to expire 60 days after restore and copy job completion. Automating expiration helps you avoid unnecessarily using storage resources for items that you no longer need.
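A minimal boto3 sketch of such a lifecycle rule is shown below. The bucket name, rule ID, and filter prefix are placeholders rather than values from the Guidance, which configures the rule through its CloudFormation template.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name and rule ID; adjust to your deployment.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-solution-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-staged-restore-artifacts",
                "Filter": {"Prefix": ""},  # apply to all objects in the bucket
                "Status": "Enabled",
                "Expiration": {"Days": 180},
            }
        ]
    },
)
```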
Implementation Resources
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
[Title]
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.