
Automate file retrieval from S3 Glacier when using FSx for Lustre

Customers run modern high performance computing (HPC) workloads on AWS to process large datasets cost-effectively and to optimize performance. These workloads typically rely on a high-performance file system that provides low-latency, high-throughput access to data, so that storage can keep up with the speed of compute. Amazon FSx for Lustre is a fully managed storage service that provides a high-performance, scale-out file system, making it suitable for machine learning, high performance computing, video rendering, and many other compute-intensive workloads.

Many AWS customers link their FSx for Lustre file systems to Amazon S3 buckets. Linking allows data to be accessed and processed concurrently from both a high-performance file system and the S3 API. When linked to an S3 bucket, FSx for Lustre transparently presents objects as files, so you can run your workload without managing data transfer from S3. As the contents of your S3 bucket change, FSx for Lustre automatically updates your file system with the latest data available to run your workload.

To keep S3 storage costs low, customers may move older and less frequently accessed S3 objects to Amazon S3 Glacier and S3 Glacier Deep Archive, which are secure, durable, and low-cost Amazon S3 storage classes for data archiving and long-term backup. However, S3 objects that have been moved to S3 Glacier or S3 Glacier Deep Archive can no longer be accessed using Amazon FSx; a cloud storage administrator must restore them from S3 Glacier or S3 Glacier Deep Archive before they are accessible through Amazon FSx again. In this blog post, I cover setting up automatic file retrieval from S3 Glacier when an FSx for Lustre file system must load files from S3.

Solution overview

When FSx for Lustre is integrated with S3, all file metadata is available in the FSx for Lustre file system, and file data is lazy loaded from S3 as files are accessed through Amazon FSx. In some cases, the file data is not yet available in the file system and the S3 object corresponding to that file has been moved to S3 Glacier. In these cases, an attempt to access the file through Amazon FSx results in an error. To make that file available using FSx for Lustre, it first must be retrieved from S3 Glacier and put into the S3 Standard storage class. This solution automates that retrieval process using Amazon EventBridge and an AWS Lambda function. The following architectural diagram shows the event flow and how the components of this solution interact.

Architectural diagram demonstrating event flow and interaction of components

Here is the event flow:

  1. User tries to access a file in the FSx for Lustre file system.
  2. FSx for Lustre only has file metadata, but not its contents. Therefore, FSx for Lustre issues an S3 GetObject API call to lazy load this file from S3.
  3. Because the object has already been archived to S3 Glacier, S3 cannot return its contents and responds to the GetObject request with an errorCode of InvalidObjectState (see the sample event after this list).
  4. AWS CloudTrail logs the failed S3 GetObject call, and Amazon EventBridge is configured to match logged events with an errorCode of InvalidObjectState.
  5. EventBridge triggers a Lambda function that initiates the retrieval of this file from S3 Glacier with an S3 RestoreObject API call.
  6. The file becomes available in S3 and can be accessed by users via FSx for Lustre.
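
To make the later steps concrete, the following is an abridged sketch of such an event, written as a Python dictionary in the shape EventBridge delivers to the Lambda function in step 3. Only the fields this solution relies on are shown, and the bucket and key names are illustrative placeholders.

# Abridged, illustrative CloudTrail event for a GetObject call that
# failed with InvalidObjectState; bucket and key names are placeholders
sample_event = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "errorCode": "InvalidObjectState",
        "requestParameters": {
            "bucketName": "fsx4l-store",
            "key": "fsx-data/archive/results.csv",
        },
    },
}

The Lambda function in step 3 reads detail.errorCode to confirm the failure reason and detail.requestParameters to learn which object to restore.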

Configuration walkthrough

1. Configure AWS CloudTrail with Amazon CloudWatch Logs enabled

We must make sure that S3 GetObject calls are logged in Amazon CloudWatch Logs. These calls are sent to EventBridge, which, in turn, initiates the Lambda function. The first step is to create a CloudTrail trail with CloudWatch Logs enabled. This trail will capture S3 GetObject calls on the S3 bucket used by FSx for Lustre.

Navigate to the CloudTrail console. Next, select the Create trail button. On the following screen titled Step 1 Choose trail attributes, fill out the Trail name field. Then, select either Create new S3 bucket or Use existing S3 bucket for the storage location. Make sure that you check the CloudWatch Logs – Enabled parameter on this page. Checking this parameter allows this trail to send log files to Amazon CloudWatch Logs. Choose New for Log group and IAM Role parameters.

Configure AWS CloudTrail with Amazon CloudWatch logs enabled figure 1

On the next screen titled Step 2 Choose log events, use the following configuration. First, select the Data events check box, and then select the Switch to advanced event selectors button. Next, for Data event type, choose S3 and for Log selector template, choose Custom. Then build your advanced event selectors as demonstrated by the following screenshot.

Configure AWS CloudTrail with Amazon CloudWatch logs enabled figure 2

Make sure you specify your own bucket name in the resources.ARN field value: use the unique name of the S3 bucket used by your FSx for Lustre file system instead of the example name fsx4l-store. You can copy the S3 bucket Amazon Resource Name (ARN) from your bucket properties page or use the Browse button to find it. If your FSx data is organized under a specific folder (also called a prefix) inside your S3 bucket, or if you want to limit automatic restores from S3 Glacier to a subset of your data in a specific folder, append that folder to the resources.ARN field value. For example, use arn:aws:s3:::<your-bucket-name>/<fsx-data-prefix>/ as the value for the resources.ARN field.
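
If you prefer to script this configuration, the same advanced event selectors can be applied with the AWS SDK for Python. The following is a minimal sketch that assumes the trail already exists; the trail name and the fsx4l-store bucket prefix are examples that you would replace with your own.

import boto3

cloudtrail = boto3.client('cloudtrail')

# Log only S3 GetObject data events on the bucket (and optional prefix)
# linked to the FSx for Lustre file system
cloudtrail.put_event_selectors(
    TrailName='fsx4l-trail',  # example trail name
    AdvancedEventSelectors=[
        {
            'Name': 'GetObject on FSx-linked bucket',
            'FieldSelectors': [
                {'Field': 'eventCategory', 'Equals': ['Data']},
                {'Field': 'resources.type', 'Equals': ['AWS::S3::Object']},
                {'Field': 'eventName', 'Equals': ['GetObject']},
                {'Field': 'resources.ARN', 'StartsWith': ['arn:aws:s3:::fsx4l-store/']},
            ],
        },
    ],
)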

2. Create IAM Role for Lambda function execution

Next, we need to create an IAM Role that will allow our Lambda function to execute and make
S3 RestoreObject calls for objects in your S3 bucket.

Go to the IAM console and create an IAM policy with the following JSON definition:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:RestoreObject",
            "Resource": "arn:aws:s3:::your-bucket-name-here/*"
        }
    ]
}

Then go to IAM Roles and select the Create role button. In the Choose a use case section, select Lambda. Then, select the Next: Permissions button.

Using the search bar, find and select the IAM policy that you created. Then, find and select an AWS managed IAM policy called AWSLambdaBasicExecutionRole.

Create IAM role for Lambda function execution

Select the Next: Tags button, and optionally add tags. Then select the Next: Review button and type in a Role name and a Role description on the next screen. Make sure that the two IAM policies selected earlier are displayed in the Policies section of the Review screen. Then, select the Create Role button to finish.
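
For readers who automate their infrastructure, the console steps above roughly correspond to the following boto3 sketch. The role name, policy ARN, and account ID are placeholders; only the AWSLambdaBasicExecutionRole ARN is a fixed AWS managed policy ARN.

import boto3
import json

iam = boto3.client('iam')

# Trust policy that allows the Lambda service to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName='fsx4l-restore-role',  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description='Allows the restore Lambda function to call s3:RestoreObject',
)

# Attach the customer managed policy created above (placeholder ARN)
iam.attach_role_policy(
    RoleName='fsx4l-restore-role',
    PolicyArn='arn:aws:iam::111122223333:policy/fsx4l-restore-policy',
)

# Attach the AWS managed policy that grants basic CloudWatch Logs access
iam.attach_role_policy(
    RoleName='fsx4l-restore-role',
    PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole',
)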

3. Create Lambda function

In this step, you will create a Lambda function that initiates an S3 RestoreObject call to retrieve the archived object from S3 Glacier. The Lambda function triggers when an S3 GetObject call to the S3 bucket linked to FSx for Lustre results in an InvalidObjectState error.

Now, navigate to the Lambda console and select the Create function button. Enter a Function name and select Python 3.8 as the Runtime. Expand the Change default execution role option under Permissions and select Use an existing role. Choose the IAM role that you created in the previous step from the drop-down menu. Then, select Create function, which will take you to the Code source section.

Select lambda_function.py and add the following Python 3.8 code:

import boto3
import logging
import os

from botocore.exceptions import ClientError

# Environment variables configured on the function (see the next step)
restored_days = int(os.environ['RESTORED_OBJECT_AVAILABILITY_DAYS'])
restore_tier = os.environ['RESTORE_TIER']

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Create the client once so it is reused across invocations
s3 = boto3.client('s3')

def lambda_handler(event, context):
    logger.info("New event: %s", event)
    if event['detail']['errorCode'] == 'InvalidObjectState':
        try:
            # Initiate retrieval of the archived object so that FSx for
            # Lustre can lazy load it on a subsequent GetObject call
            response = s3.restore_object(
                Bucket=event['detail']['requestParameters']['bucketName'],
                Key=event['detail']['requestParameters']['key'],
                RestoreRequest={
                    'Days': restored_days,
                    'GlacierJobParameters': {
                        'Tier': restore_tier,
                    },
                },
            )
            logger.info("RestoreObject response: %s", response)
        except ClientError as error:
            # Repeated access attempts on the same archived file can trigger
            # this function again while a restore is already under way
            if error.response['Error']['Code'] == 'RestoreAlreadyInProgress':
                logger.info("Restore already in progress for this object")
            else:
                raise

Make sure you select the Deploy button after adding your Lambda function Python code.

Go to the Configuration tab and then the Environment variables section, and select the Edit button. Next, use the Add environment variable button to add two key-value pairs to the configuration of the Lambda function as demonstrated below.

Create Lambda function
You can choose the number of days that the restored object will remain available, as well as the restore tier (Expedited, Standard, or Bulk). In this example, the following values are used:

Key                                  Value
RESTORED_OBJECT_AVAILABILITY_DAYS    1
RESTORE_TIER                         Expedited
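
If you manage the function with the SDK rather than the console, the same variables can be set with a call along these lines; the function name is a placeholder.

import boto3

lambda_client = boto3.client('lambda')

# Set the restore window and retrieval tier used by the function
lambda_client.update_function_configuration(
    FunctionName='fsx4l-restore',  # placeholder function name
    Environment={
        'Variables': {
            'RESTORED_OBJECT_AVAILABILITY_DAYS': '1',
            'RESTORE_TIER': 'Expedited',
        }
    },
)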

More details regarding the various restore tiers are available in the S3 documentation. Note that Expedited retrievals are not supported for objects in the S3 Glacier Deep Archive storage class.
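
If you want to check on a restore directly rather than waiting for a notification (step 5 covers notifications), one option is to inspect the object's Restore status with a HeadObject call. A small sketch, with placeholder bucket and key names:

import boto3

s3 = boto3.client('s3')

# The Restore field is only present while a restore is in progress or
# while a temporary restored copy is available
response = s3.head_object(Bucket='fsx4l-store', Key='fsx-data/archive/results.csv')
restore_status = response.get('Restore')
if restore_status is None:
    print('No restore has been requested for this object')
elif 'ongoing-request="true"' in restore_status:
    print('Restore is still in progress')
else:
    print('Restored copy is available:', restore_status)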

4. Create an Amazon EventBridge rule

Navigate to the Amazon EventBridge console and select Create rule. Then give your rule a name and choose Event pattern. Next, select Custom pattern. Paste the following JSON structure in the Event pattern window, and select Save.

{
  "source": ["aws.s3"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventName": ["GetObject"],
    "errorCode": ["InvalidObjectState"]
  }
}

In the Select targets section under Target, select Lambda function. In the Function drop-down menu, select the Lambda function that you created in the previous step.
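
The console automatically adds the permission that allows EventBridge to invoke your Lambda function. If you script this step with boto3 instead, you create the rule, register the target, and add that permission yourself, roughly as in the following sketch; the rule name, function name, and ARNs are placeholders.

import boto3
import json

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Same event pattern as above
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["GetObject"],
        "errorCode": ["InvalidObjectState"]
    }
}

rule = events.put_rule(Name='fsx4l-restore-rule', EventPattern=json.dumps(pattern))

# Point the rule at the restore function (placeholder ARN)
events.put_targets(
    Rule='fsx4l-restore-rule',
    Targets=[{
        'Id': 'restore-lambda',
        'Arn': 'arn:aws:lambda:us-east-1:111122223333:function:fsx4l-restore',
    }],
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName='fsx4l-restore',
    StatementId='fsx4l-restore-rule-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)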

5. Optionally configure Amazon Simple Notification Service (Amazon SNS) notifications

Optionally, you may also create an Amazon SNS topic and configure your S3 bucket to send notifications to that topic when the restore process is initiated and/or completed.

Navigate to the Simple Notification Service console, select Topics, and then select the Create topic button. Choose Type – Standard and give your topic a name. Choose Access policy, then Advanced, and paste the following JSON template, replacing the placeholder values (Region, account-id, topic-name, the bucket ARN, and bucket-owner-account-id) with your own.

{
    "Version": "2012-10-17",
    "Id": "example-ID",
    "Statement": [
        {
            "Sid": "example-statement-ID",
            "Effect": "Allow",
            "Principal": {
                "Service": "s3.amazonaws.com"
            },
            "Action": [
                "SNS:Publish"
            ],
            "Resource": "arn:aws:sns:Region:account-id:topic-name",
            "Condition": {
                "ArnLike": { "aws:SourceArn": "arn:aws:s3:::awsexamplebucket1" },
                "StringEquals": { "aws:SourceAccount": "bucket-owner-account-id" }
            }
        }
    ]
}

Once your topic is created, select Create subscription. Choose your endpoint type in the Protocol section (for example: email alias, phone number for SMS messages, URL for delivery of JSON-encoded messages via HTTP(S) POST, etc.).

Next, navigate to the S3 console. Then, choose your bucket and go to the Properties tab. Navigate down to the Event notifications section and select the Create event notification button. Enter an event name and fill out the optional parameters, like prefix and suffix, if you want to limit the scope of notifications. Then in the Event types section, select Restore completed and/or Restore initiated as necessary. In the Destination section, select SNS Topic and Choose from your SNS topics. Finally, select your recently created SNS topic in the drop-down menu, and select Save changes.
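
Scripted, the equivalent bucket configuration might look like the following sketch. The bucket name, topic ARN, and prefix are placeholders; s3:ObjectRestore:Post corresponds to Restore initiated and s3:ObjectRestore:Completed corresponds to Restore completed.

import boto3

s3 = boto3.client('s3')

s3.put_bucket_notification_configuration(
    Bucket='fsx4l-store',  # placeholder bucket name
    NotificationConfiguration={
        'TopicConfigurations': [{
            'TopicArn': 'arn:aws:sns:us-east-1:111122223333:fsx4l-restore-topic',
            'Events': [
                's3:ObjectRestore:Post',       # restore initiated
                's3:ObjectRestore:Completed',  # restore completed
            ],
            # Optional: limit notifications to the FSx data prefix
            'Filter': {
                'Key': {
                    'FilterRules': [{'Name': 'prefix', 'Value': 'fsx-data/'}]
                }
            },
        }]
    },
)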

Cleaning up

To clean up your AWS account and avoid incurring unintended charges, delete the EventBridge rule and the CloudTrail trail that you used to trigger the Lambda function. It is also a best practice to remove unused resources even when they are not expected to generate charges on their own, so delete the Lambda function, the IAM role, the IAM policy, and the Event notification configuration of the S3 bucket as well. Finally, you can delete the SNS topic that you created earlier.

Conclusion

In this blog post, I demonstrated setting up automatic retrieval of files from the S3 Glacier and S3 Glacier Deep Archive storage classes for an FSx for Lustre file system linked to S3. This solution frees storage administrators and users from retrieving files manually, giving them time back to work on other innovative projects.

Since retrieving files from S3 Glacier and S3 Glacier Deep Archive incurs a cost, consider building your lifecycle policy so that only data that is not expected to be accessed frequently is moved to these storage classes. If your data access patterns are unpredictable, consider using S3 Intelligent-Tiering, which is designed to optimize costs by automatically moving data to the most cost-effective access tier.
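
As an illustration, a lifecycle rule that transitions only objects under an infrequently accessed prefix to S3 Glacier after 90 days might look like the following sketch; the bucket name, prefix, and the 90-day threshold are placeholders to adapt to your own access patterns.

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='fsx4l-store',  # placeholder bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-cold-data',
            'Filter': {'Prefix': 'fsx-data/archive/'},
            'Status': 'Enabled',
            'Transitions': [
                {'Days': 90, 'StorageClass': 'GLACIER'}
            ],
        }]
    },
)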

You may also consider moving your entire dataset into the FSx for Lustre file system and achieving cost efficiency by enabling the Lustre data compression feature. To keep up to date with the benefits, use cases, and features of Amazon FSx for Lustre, visit the Amazon FSx for Lustre product page.

Thanks for reading this blog post! If you have any questions or suggestions, please leave your feedback in the comments section.