Automate restore of archived objects through AWS Storage Gateway

UPDATE: This post was updated on 8/4/2021 to include an AWS CloudFormation template for automation and also to modify relevant sections to reflect support for Amazon S3 Glacier Deep Archive and S3 Intelligent-Tiering Archive Access and Deep Archive Access tiers.

AWS Customers who have on-premises file-based applications love having the ability to deploy an AWS Storage Gateway in File Gateway mode, giving them access to virtually unlimited storage in Amazon Simple Storage Service (Amazon S3). The contents of their Amazon S3 buckets are presented to them by File Gateway as files through commonly used storage protocols, such as SMB or NFS. Customers who require file-based access span across many different industries including healthcare (imaging and genomics), media and entertainment, and financial services.

Customers have told us that they want to be able to store and access their file data for many years at the lowest possible cost to satisfy their compliance and regulatory requirements, and often use Amazon S3 Lifecycle policies or Amazon S3 Intelligent-Tiering to automatically move their objects to lower-cost storage classes, such as Amazon S3 Glacier (S3 Glacier), Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive), and Amazon S3 Intelligent tiering Archive Access and Deep Archive Access tiers – See Figure 1.

Before setting up lifecycle policies, it is recommended that the differences in the two cold storage classes are well understood – especially as it pertains to storage costs, retrieval times and retrieval costs.

Amazon S3 Lifecycle moves objects upload to S3 to lower storage classes after set time

Figure 1: Objects uploaded to Amazon S3 are moved to S3 Glacier using Amazon S3 lifecycle policies

In the past, once their data had been archived to a lower cost storage class (S3 Glacier, S3 Glacier Deep Archive, or the S3 Intelligent tiering Archive and Deep Archive tiers), customers would need to manually restore the object back to Amazon S3 Standard or the Amazon S3 Intelligent tiering Frequent Access tier before they could access it through a File Gateway. We’re excited to say that it is now possible for customers to automate this workflow of restoring archived files to Amazon S3 by using Amazon CloudWatch and an AWS Lambda function to trigger a restore request. Below, take a look at a demo video that provides a quick walk through on automating restores of Amazon S3 Glacier objects through AWS Storage Gateway.

What happens when we use File Gateway to access archived files as discussed above?

Let’s walk through an example. ‘Joe’ is an administrator who works for the finance department in his organization and has been making great progress with migrating their less active content into Amazon S3. Joe knows he now has a central repository in the cloud which can be accessed by his users in single or multiple on-premises locations by launching a File Gateway and attaching it to his Amazon S3 buckets. Since his users are more likely to access recently created or updated files, they benefit from File Gateway’s local cache (of up to 16 TB/gateway). Similarly, when their applications write data to the File Gateway’s file share, the write back nature of the cache can give them up to 500 MiB/s throughput (See AWS Storage Gateway performance documentation for more details).

Joe’s finance department has a requirement that the data needs to be retained for 7 years and also be accessible to fulfill an auditor’s ad hoc requests. To cost effectively meet this retention requirement, Joe has set up a policy on his Amazon S3 bucket to move any objects that are older than 30 days into S3 Glacier Deep Archive. Since he knows the odds of his users regularly accessing data older than 30 days are very low, the trade-off of longer retrieval times and associated retrieval costs are acceptable to him.

Let’s now look at what happens when Joe’s users access the data via the File Gateway file share; we have three scenarios.

1. The object is already in the File Gateway disk cache

If the file accessed is available in the local File Gateway disk cache, (see Figure 2), the File Gateway immediately returns the data from the local cache at local disk speed. This is ideal for the user.

File is returned from local cache at local disk speed

Figure 2: File is returned from local cache at local disk speed

2. The object is in an Amazon S3 “online” storage class and had been evicted from the File Gateway cache

One of Joe’s users tries to access a file that is not in the local File Gateway cache, but the file is under 30 days old (remember, Joe has a 30-day lifecycle policy). His file is available from his default Amazon S3 storage class, which in this case is Amazon S3 Standard (see Figure 3).

The File Gateway will request the object that represents the file from Amazon S3 and download it to the File Gateway local disk cache and return it to the user. The file will then remain in the local cache until it is evicted to make space for newer data. For more details on how caching works, please refer to this AWS re:Invent 2018 deep dive video.

Accessing a file not in the local File Gateway cache but that is under 30 days old from Amazon S3 Standard

Figure 3: Accessing a file not in the local File Gateway cache, that is under 30 days old, from Amazon S3 Standard

3. The object is in an Amazon S3 cold storage class

One of Joe’s users now tries to access an older file. This time the file is in the S3 Glacier storage class which means it cannot be directly accessed without restoring it back to the S3 storage class (see Figure 4).

Up until recently, when the File Gateway went to access the S3 object that represents the file, and the object was not directly available, the process would fail and the user would get an IO error. Joe would recognize the cause of this error and manually go ahead and request the object to be “restored“ for his users. This is not the optimal user experience.

IO error when trying to access older file that is in the S3 Glacier storage class without restoring it back to the S3 storage class first

Figure 4: IO error when trying to access file in the S3 Glacier storage class before restoring it to the S3 storage class first

Here is an example of what this error looks like in the CloudWatch Logs:

{
    "severity": "ERROR",
    "bucket": "mybucket",
    "roleArn": "arn:aws:iam::123456789101:role/sts-test",
    "source": "share-E1B9B18A",
    "type": "InaccessibleStorageClass",
    "operation": "S3UploadFailure",
    "key": "myfile.txt",
    "gateway": "sgw-B8D938D1",
    "timestamp": "1565740862516"
}

So, how do we go about getting Joe and his users to easily access files that are in this storage class with minimal manual intervention to his workflow? Using Amazon CloudWatch Logs, File Gateway now notifies Joe when his end users access files archived to a cold storage class. This is exciting because Joe can build automation into the restoration workflow.

Based on the error message "type": "InaccessibleStorageClass", we can identify that the object that Joe is attempting to access: "key": "myfile.txt" in "bucket": "mybucket" has been moved to an “offline” storage class. Because this error is now being written to CloudWatch Logs, we can attach a Lambda function to the log stream to process this error message and perform an action.

Note: This error will only be generated if the object was already in the inaccessible storage class when the File Gateway was started up (or following a cache refresh), or if a lifecycle was used to transition it. If you manually transition the object through the console, it will generate a different error.

Now, I’m going to show how the AWS Lambda function can automatically initiate a recall request for the object. When the recall is complete, a message can be sent to Joe and his users via an Amazon Simple Notification Service (SNS), topic that they can subscribe to (see Figure 5). When the user tries to access the file again after the object has been restored, they will be able to access it through the File Gateway.

AWS users should review Amazon S3 Glacier Retrieval pricing to understand how their costs would be impacted by the automated retrieval of files. Frequent retrieval of files from Amazon S3 Glacier will result in increased costs which need to be weighed against the benefits of automation. Additionally, the differences in retrieval times between Amazon S3 Glacier and Amazon S3 Glacier Deep Archive should be reviewed and taken into consideration.

AWS Lambda function triggered by Amazon CloudWatch

Figure 5: AWS Lambda function triggered by Amazon CloudWatch

Step 1: Set up File Gateway CloudWatch Logs Group

If you don’t have a File Gateway already running, follow the steps here to launch a new one.
When you get to the Configure Logging step, click on Create New Log Group. This will take you to the CloudWatch Logs Console.
Click Actions → Create New Log Group.
Give the Log Group a name (e.g., myFGWLogGroup).
Go back to the File Gateway console and refresh the LogGroup dropdown.
Select your new Log Group and finish the rest of the File Gateway setup.

You can deploy the Lambda function and associated IAM role and code using an AWS CloudFormation template, and skip steps 2 and 3. if you want to deploy the template please download it using this link. Next, open the AWS Management Console, and from Services choose CloudFormation. Choose Stacks, then select the Create stack drop-down menu and select With new resources (standard). Upload the provided template and choose Next. Specify a valid integer for the RestoreDaysValue for example 2, and choose your desired Archive restore speeds for Glacier and GlacierDeepArchive. Finally, follow the guided deployment steps to start the the automatic creation of the resources.

Alternatively you can follow the next steps (steps 2 and 3) to manually deploy the resources.

Step 2: Create AWS Lambda execution role

We need to create an IAM role that our AWS Lambda function will assume when it runs. Our Lambda function will be calling the S3 API to request object recalls, so it will need these permissions on top of the usual Lambda Basic Execution Role.

Go to the AWS Identity and Access Management (IAM) console and select Roles.
Click Create role.
Choose Lambda and click Next: Permissions.
You can either choose one of the AWS managed policies here to test with, such as AmazonS3FullAccess, or you can click on the Create Policy option to create your own least privilege policy (best practice). If creating your own policy, you want to allow the “RestoreObject” and “GetObject” action against any relevant resources.
In addition to the above Amazon S3 policy, also attach the AWSLambdaBasicExecutionRole so that Lambda can operate as normal and generate a CloudWatch Logs stream.
Select your policy and click on Next: Tags.
Add any optional tags to the role, if none, click Next: Review.
Provide a role name and make a note of this for later (e.g. myLambdaS3RestoreRole), and click Create Role.

Step 3: Create AWS Lambda function

Next, we need to build our Lambda function that will initiate our object restores for us.

Go to the AWS Lambda Console.
Click Create function.
Select Author From Scratch.
Choose a name (e.g. FGWGlacierRestore).
Select the Runtime as Python 3.8.
Expand “choose an execution role” → “Use an existing role.” Choose the role you created in step 2.
Click Create function.
Scroll down to the Function Code window pane and replace the code in the editor with the following:

import os
import logging
import json
import boto3
import base64
import gzip
from botocore.exceptions import ClientError
import io

# Enable boto3 debug output
#boto3.set_stream_logger("")

# Set up logging
# logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(asctime)s: %(message)s')
logger = logging.getLogger(__name__)
logger.setLevel('INFO')

# Create S3 Service Resource:
s3 = boto3.resource('s3')

# Define Environment Variables:
# Ensure Restore days value is integer and valid
try:
  restore_days = int(os.environ['RestoreDays'])
except ValueError:
    my_restore_days = 2

archive_restore_tier = str(os.environ['ArchiveRecallTier'])
deep_archive_restore_tier = str(os.environ['DeepArchiveRecallTier'])

# Ensure that the values set for Restore Tiers are valid and set a default value
if archive_restore_tier not in ('Expedited','Standard','Bulk'):
  archive_restore_tier = 'Expedited'
if deep_archive_restore_tier not in ('Standard', 'Bulk'):
  deep_archive_restore_tier = 'Standard'

def lambda_handler(event, context):
  cw_data = str(event['awslogs']['data'])
  # Unzip and Unpack the Data
  cw_logs = gzip.GzipFile(fileobj=io.BytesIO(base64.b64decode(cw_data))).read()
  log_events = json.loads(cw_logs)
  for log_entry in log_events['logEvents']:
    result = process_recall(log_entry)
    logger.info(result)
  return {
    'statusCode': 200,
    'body': result
    }

def process_recall(log_entry):
  logger.info("message contents: " + log_entry['message'])
  message_json = json.loads(log_entry['message'])
  if 'type' in message_json:
    logger.info("Found ErrorType")
    error_type = message_json['type']
    logger.info("ErrorType = " + error_type)
    if message_json['type'] != "InaccessibleStorageClass":
      return "Unexpected error: not related to storage class"
  else:
    return_error = "error: no type entry"
    return return_error

  if 'bucket' in message_json:
    logger.info("Found Bucket")
    s3_bucket = message_json['bucket']
    logger.info("Bucket = " + s3_bucket)
  else:
    return_error = "error: no bucket"
    return return_error

  if 'key' in message_json:
    logger.info("Found Key")
    s3_key = message_json['key']
    logger.info("Key = " + s3_key)
  else:
    return_error = "error: no key"
    return return_error

  # Create S3 Object Resource
  s3_object = s3.Object(s3_bucket, s3_key)
  try:
    obj_restore_status = s3_object.restore
    obj_storage_class = s3_object.storage_class
    obj_archive_status = s3_object.archive_status
    if obj_restore_status is None:
      logger.info('Submitting restoration request: %s' % s3_key)
      # Add support for S3-INT storage class, restore for this class does not include restore days
      # Use the user defined restore tier for Glacier and S3-INT Archive_access otherwise default to Standard
      if obj_storage_class == 'GLACIER':
        result = s3_object.restore_object(
        RestoreRequest={'Days': restore_days, 'GlacierJobParameters': {'Tier': archive_restore_tier}})
      elif obj_storage_class == 'DEEP_ARCHIVE':
        result = s3_object.restore_object(
        RestoreRequest={'Days': restore_days, 'GlacierJobParameters': {'Tier': deep_archive_restore_tier}})
      elif obj_storage_class == 'INTELLIGENT_TIERING' and obj_archive_status == 'ARCHIVE_ACCESS':
        result = s3_object.restore_object(
        RestoreRequest={'GlacierJobParameters': {'Tier': archive_restore_tier}})
      elif obj_storage_class == 'INTELLIGENT_TIERING' and obj_archive_status == 'DEEP_ARCHIVE_ACCESS':
        result = s3_object.restore_object(
        RestoreRequest={'GlacierJobParameters': {'Tier': deep_archive_restore_tier}})

    else:
      restore_message = "Restore request already submitted!"
      return restore_message
  except ClientError as e:
    return_error = "Unexpected Error whilst attempting to recall object"
    logger.error(e)
    return return_error
  return result

Scroll down to the Environment variables section to create 3 environment variables as per the following Key-Value pair:
- Key = ArchiveRecallTier | Value = “Expedited” or “Standard”
  - This value is used to specify the S3 Glacier or S3 Intelligent-Tiering Archive Access tier restore level.
- Key = DeepArchiveRecallTier | Value = “Standard” or “Bulk”
  - This value is used to specify the S3 Glacier Deep Archive or S3 Intelligent-Tiering Deep Archive Access tier restore level.
- Key = RestoreDays | Value = Integer
  - The value is used to define how long the restored object will be made temporarily available for in days (e.g. 1). This does not apply to S3 Intelligent-Tiering archive access tiers.
On the top of the screen click Save.

Step 4: Connect CloudWatch Logs to the Lambda function

Finally, we need to connect the CloudWatch Logs group to our Lambda function so that it can process our File Gateway logs.

In the console, open the Lambda function you manually created from step 3 or the one automatically created by the CloudFormation stack (the function name will include the stack name for easy identification).
In the Designer click Add trigger.
Under Trigger configuration choose CloudWatch Logs.
In the Log Group field, select the log group that you created in step 1 (e.g., myFGWLogGroup).
Add a Filter name (e.g., FGWLogsFilter).
In Filter pattern add: { $.type = "InaccessibleStorageClass" }.
Ensure that the Enable Trigger box is checked.
Click Add to continue.

Step 5: (Optional) Setup Amazon Simple Notification Service (SNS) for restore completion notification

If Joe wants to be notified when an object restore has completed, he can setup an SNS topic that he can subscribe to.

Part A: Create SNS topic

Open the Amazon Simple Notification Service console.
Go to Topics.
Click Create topic.
Enter a name for your SNS Topic (e.g., FGWRestoreTopic), then choose Create topic.
You will then be taken to your SNS topic configuration window, copy down the ARN of the SNS topic you created.
Click Edit in the top right-hand corner.
Expand the Access policy section.

Insert the following code snippet, a line above the the final “]”

Note: Change the values for <your-SNS-Topic-ARN>, <your-s3-bucket-name>, and <bucket-owner-account-id> with your own values.

{
   "Sid": "s3_example_statement_ID",
   "Effect": "Allow",
   "Principal": {
    "Service": "s3.amazonaws.com"  
   },
   "Action": [
    "SNS:Publish"
   ],
   "Resource": "<your-SNS-Topic-ARN>",
   "Condition": {
      "ArnLike": { "aws:SourceArn": "arn:aws:s3:*:*: <your-s3-bucket-name>" },
      "StringEquals": { "aws:SourceAccount": <bucket-owner-account-id> }
   }
  }

Part B: Subscribe to SNS topic

You will then be taken to your SNS topic configuration window. Click on Create Subscription.
In the Protocol field select Email.
In the Endpoint field, enter a valid email address where you want to receive the SNS notifications.
Click on Create Subscription when you are done.
Check your email inbox for the notification and be sure to accept the confirmation link in the email.
Go back to the Amazon S3 console.
Open the S3 bucket properties (the bucket which is mapped via your File Gateway).
Click on Events → Add notification.
Provide a name (e.g. FGWRestoreEvents), and select Restore completed.
From the Send to field select SNS topic as the destination, and in the SNS field select the topic you’ve just created, and click Save.

Step 6: Time to test

Try to access a file (through a File Gateway share) that you know is in the S3 Glacier storage class.
- Note: If the file is already in the local File Gateway cache, the file will be returned from cache and this new workflow will not execute.
You should receive an initial IO error.
Navigate to the CloudWatch console and select Logs from the left hand column.
Select the File Gateway log group you had previously created.
Under the Log Streams column you should see an entry similar to share-xyz123, which is your File Gateway file share ID. If you don’t see that entry, note that it can take up-to 5 minutes for the log group to receive the data from the File Gateway.
Once the log stream from your file gateway share is visible, click on it.
Click on the error message and look for the type:InaccessibleStorageClass – here you can also view the File (key) that you tried to access, along with the S3 bucket details.
Open a separate window to the Lambda Console, and check the Lambda log group for the function and check for a successful restore request (http 202 response).
If you have enabled SNS notifications, once the S3 Glacier restore has been completed you will get an email that contains the following in the body “eventName”:”ObjectRestore:Completed”. At this point you can access the file through File Gateway again.

Conclusion

The ability to leverage Lambda functions to help automate the restoration of files from S3 Glacier or S3 Glacier Deep Archive limits the need for manual intervention in the restore process. This capability, while taking some of the burden off of an administrator like Joe, does not solve the restoration issue for all use cases. Users need to understand their workflow so that they can determine if this process will be appropriate for their needs. There are some use cases where applications that are writing to and reading from a file share could experience timeouts if the restore process takes too long. The restore will eventually occur if the Lambda function has been set up but the application may not be able to wait long enough before timing out. Some applications can have their timeouts adjusted, but some cannot. In situations where an application is driving data in and out of the gateway, the user will need to test the functionality to verify if it will work appropriately.

As a reminder, users should consider retrieval and retrieval request costs for data that is restored from S3 Glacier or S3 Glacier Deep Archive. These storage classes are designed for long term archiving of data and are not recommended for data that will be frequently accessed. Users should carefully consider whether data should be stored in Amazon S3 Glacier or Amazon S3 Glacier Deep Archive, along with the cost ramifications of frequent requests to restore data from those storage classes.

You can learn more here about Amazon Storage Gateway in File Gateway Mode, Amazon Tape Gateway, Lambda functions, Simple Notification Services, Amazon S3 Glacier pricing, and AWS Billing and Cost Management