This Guidance provides an automated workflow to restore archived Amazon Simple Storage Service (Amazon S3) data stored in the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes to a new storage class. It supports copying objects in place, copying them to a new prefix within the same S3 bucket, or copying them to another S3 bucket in the same or a different AWS account or Region. The Guidance orchestrates the steps involved in archive restore, including S3 bucket inventory generation, manifest query and optimization, archive retrieval, and the copy process. It also tracks progress and sends job status notifications.
This Guidance consists of two architectures. The first architecture is an overview that shows you how to request restoration of archived items listed in a CSV manifest. The second architecture shows how to automate the creation of a CSV manifest using an AWS Step Functions workflow.
Please note: [Disclaimer]
Architecture Diagram
[Architecture diagram description]
-
Overview
This architecture shows how to request restoration of archived items listed in a CSV manifest. For details on how to automate the creation of the CSV manifest, open the Automated CSV Manifest Generator tab.
Step 1
Upload a CSV manifest to the solution S3 bucket listing the archived Amazon Simple Storage Service (Amazon S3) objects that you want restored to a destination bucket.
Step 2
This upload invokes the Restore Worker AWS Lambda function.
Step 3
The Restore Worker Lambda function submits a restore operation job to Amazon S3 Batch Operations (see the sketch after Step 11 below).
Step 4
Amazon S3 Batch Operations initiates object restore in the S3 bucket containing the archived objects.
Step 5
The Amazon S3 Batch Operations completion report invokes the Job Tracker Worker Lambda function.
Step 6
The Job Tracker Worker Lambda function creates an entry in the Amazon DynamoDB table with the restore details.
Step 7
Amazon EventBridge invokes the Job Scheduler Worker Lambda function on a schedule.
Step 8
The Job Scheduler Worker Lambda function queries the state table to determine the status of each restore job and when it is eligible for a copy job.
Step 9
The Job Scheduler Worker Lambda function invokes the Copy Worker Lambda function, which submits an Amazon S3 Batch Operations job that invokes a Lambda function.
Step 10
Amazon S3 Batch Operations invokes a Lambda function that performs the actual copy operation to the destination bucket (a sketch of this handler also follows Step 11).
Step 11
The Amazon S3 Batch Operations report invokes the Job Tracker Worker Lambda function to send an Amazon Simple Notification Service (Amazon SNS) message to the user stating that the S3 object is available.
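To make Steps 1 through 4 concrete, the following minimal boto3 sketch shows the kind of call a Restore Worker function could make to create the S3 Batch Operations restore job from an uploaded CSV manifest. The account ID, bucket names, role ARN, manifest ETag, and retrieval settings are placeholder assumptions, not values from the Guidance.

```python
import boto3

# The CSV manifest uploaded in Step 1 lists one archived object per line,
# in the S3 Batch Operations CSV format: bucket,key (optionally ,versionId).
s3control = boto3.client("s3control")

response = s3control.create_job(
    AccountId="111122223333",               # placeholder account ID
    ConfirmationRequired=False,
    Operation={
        "S3InitiateRestoreObject": {
            "ExpirationInDays": 7,          # how long the restored copy stays available
            "GlacierJobTier": "BULK",       # lowest-cost retrieval tier
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::example-solution-bucket/manifests/restore.csv",
            "ETag": "example-manifest-etag",  # ETag of the uploaded manifest object
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::example-solution-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "reports/restore",
        "ReportScope": "AllTasks",
    },
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/example-batch-operations-role",
)
print("Submitted restore job:", response["JobId"])
```

The completion report configured above is what invokes the Job Tracker Worker function in Step 5 of the walkthrough.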
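Similarly, the per-object copy in Step 10 is performed by a Lambda function that S3 Batch Operations invokes once per manifest entry. The handler below is a simplified, hypothetical sketch assuming the version 1.0 invocation schema and a fixed destination bucket; the Guidance's actual copy logic (storage class, metadata, and multipart handling for large objects) is more involved.

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

DESTINATION_BUCKET = "example-destination-bucket"  # placeholder destination


def handler(event, context):
    """Hypothetical handler for an S3 Batch Operations 'Invoke Lambda' task."""
    invocation_id = event["invocationId"]
    task = event["tasks"][0]
    task_id = task["taskId"]
    source_bucket = task["s3BucketArn"].split(":::")[-1]
    source_key = urllib.parse.unquote_plus(task["s3Key"])  # keys arrive URL-encoded

    try:
        # Copy the restored object to the destination bucket, preserving its key.
        s3.copy_object(
            Bucket=DESTINATION_BUCKET,
            Key=source_key,
            CopySource={"Bucket": source_bucket, "Key": source_key},
        )
        result_code, result_string = "Succeeded", "Copied"
    except Exception as exc:  # real code would distinguish retryable errors
        result_code, result_string = "PermanentFailure", str(exc)

    # Response format expected by S3 Batch Operations for Lambda invocations.
    return {
        "invocationSchemaVersion": "1.0",
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": invocation_id,
        "results": [
            {"taskId": task_id, "resultCode": result_code, "resultString": result_string}
        ],
    }
```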
-
Automated CSV Manifest Generator
This architecture shows how to automate the creation of a CSV manifest using an AWS Step Functions workflow. For an overview of the full architecture, open the Overview tab.
Step 1
Amazon S3 delivers the inventory report to the solution S3 bucket, which invokes a Lambda function and starts the workflow.
Step 2
Amazon Athena and AWS Glue query the Amazon S3 inventory to determine the number of archived objects to be restored (see the query sketch after Step 5 below).
The workflow chunks, optimizes, and generates CSV manifests from the Amazon S3 inventory using Athena.
Step 4
The workflow removes the inventory configuration from the S3 archive bucket and, for each CSV manifest chunk, invokes the Restore Worker Lambda function in the solution core. The workflow then sends an email notification for each submitted restore job.
Step 5
The AWS Step Functions workflow completes successfully and sends an email to notify the user. Track restore and copy jobs using the Amazon S3 Batch Operations section of the AWS Management Console.
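As an illustration of Step 2, the following boto3 sketch shows the kind of Athena query that can select archived objects from an S3 Inventory table cataloged in AWS Glue. The database name, table name, storage-class column, and output location are assumptions for illustration; the actual names depend on how the inventory and Glue catalog are configured.

```python
import boto3

athena = boto3.client("athena")

# Hypothetical Glue database/table for the S3 Inventory, plus a results location.
QUERY = """
    SELECT bucket, key
    FROM s3_inventory_table
    WHERE storage_class IN ('GLACIER', 'DEEP_ARCHIVE')
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "s3_inventory_db"},
    ResultConfiguration={
        "OutputLocation": "s3://example-solution-bucket/athena-results/"
    },
)
print("Started Athena query:", response["QueryExecutionId"])
```

The query results can then be split into appropriately sized CSV manifest chunks for the Restore Worker, as described in Steps 3 and 4.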
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
This Guidance can be redeployed using the AWS CloudFormation template. The solution includes an SNS notification function that reports job status and failures.
-
Security
Data stored in DynamoDB and Amazon S3 is protected by default through AWS encryption. By default, S3 buckets have access control lists (ACLs) disabled and Block Public Access enabled.
-
Reliability
The Step Functions states in this Guidance implement retry with exponential back-off for the Lambda functions they invoke. Additionally, the AWS Software Development Kits (SDKs) used in the Lambda functions have a default retry and back-off configuration, and Amazon S3 Batch Operations retries Lambda service-related errors.
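As an illustration of the SDK-level behavior, the minimal boto3 sketch below shows how a retry mode and maximum attempt count can be set explicitly. The values shown are assumptions for illustration; the Guidance itself relies on the SDK defaults.

```python
import boto3
from botocore.config import Config

# Hypothetical retry settings; the defaults already apply retries with back-off,
# but they can be tuned explicitly like this if needed.
retry_config = Config(retries={"max_attempts": 5, "mode": "standard"})

s3 = boto3.client("s3", config=retry_config)
```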
-
Performance Efficiency
Amazon S3 Batch Operations is designed to manage large-scale operations. Lambda functions automatically scale to handle the number of concurrent invocations. You can enable provisioned capacity mode for DynamoDB, which reserves sufficient system resources to meet your requirements.
-
Cost Optimization
S3 Glacier provides multiple options for archive retrieval, including bulk retrieval, the lowest-cost option, which allows you to retrieve petabytes of data within 5-12 hours. S3 Glacier Flexible Retrieval provides free bulk retrieval for archived items that you want to retrieve infrequently, such as 1-2 times a year. Additionally, Amazon S3 Batch Operations allows you to manage billions of objects at scale without the need to provision costly and complex compute.
-
Sustainability
An Amazon S3 Lifecycle rule is applied to the Guidance S3 bucket so that objects expire after 180 days. The solution's DynamoDB items are set to expire 60 days after restore and copy job completion. Automating expiration helps you avoid unnecessarily using storage resources for items that you no longer need.
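A minimal boto3 sketch of such a lifecycle rule is shown below. The bucket name, rule ID, and filter prefix are placeholders rather than values from the Guidance, which configures the rule through its CloudFormation template.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name and rule ID; adjust to your deployment.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-solution-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-staged-restore-artifacts",
                "Filter": {"Prefix": ""},  # apply to all objects in the bucket
                "Status": "Enabled",
                "Expiration": {"Days": 180},
            }
        ]
    },
)
```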
Implementation Resources
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
[Title]
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.