AWS Storage Blog

Processing file upload notifications from AWS Storage Gateway on Amazon S3

AWS customers often perform post-upload processing on groups of files transferred by AWS Storage Gateway from on premises to Amazon S3. Until now, they have been unable to reliably initiate this downstream processing based on individual file upload events. Today, we are launching a new AWS Storage Gateway feature for File Gateway that enables customers to configure continuous notifications for file upload events. This enables the creation of event-driven pipelines to power serverless Amazon S3 object processing scenarios. File upload notifications are available on new File Gateways starting today in every AWS Region where AWS Storage Gateway is available. For existing File Gateways, the feature will be made available during the next scheduled software update.

In this blog, we illustrate and walk through an example processing workflow enabled by this new File Gateway feature.

AWS Storage Gateway notification types

First, we summarize the two upload notification types now available for File Gateway:

  • File upload notification (newly released): An event is delivered whenever a file has been successfully uploaded to Amazon S3. The event contains useful information for each file upload.
  • Working file set upload notification (existing): An event is delivered when all files in the File Gateway cache, up to the time you made a request for notification, have been successfully uploaded to Amazon S3. The event only contains a confirmation identifier, not individual file upload details.

Each notification type can be used to drive processing workflows for files uploaded to Amazon S3 across a variety of use cases. Working file set notifications are a good way to signal the completion of upload activity when the content of the File Gateway cache can be treated as a single set of data. Alternatively, file upload notifications are useful when multiple clients are writing to the AWS Storage Gateway and a customer would like to initiate separate processing workflows based on groups of specific files. You can trigger working file set notifications by calling the NotifyWhenUploaded API. In contrast, file upload notifications provide an enhanced capability, since they are continuously delivered for each file uploaded to Amazon S3. This not only provides better notification granularity, but also allows S3 object processing logic to be decoupled from the File Gateway.
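
For example, a working file set notification can be requested with a single API call. The following is a minimal boto3 sketch (the file share ARN is a placeholder):

import boto3

sgw = boto3.client("storagegateway")

# Request a working file set upload notification for a file share. The
# returned NotificationId correlates with the event delivered once all files
# in the cache at the time of the request have been uploaded to Amazon S3.
response = sgw.notify_when_uploaded(
    FileShareARN="arn:aws:storagegateway:us-east-1:111122223333:share/share-EXAMPLE"
)
print(response["NotificationId"])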

A note on Amazon S3 event notifications

While Amazon S3 event notifications are a great feature for many use cases, we do not recommend using them to notify you of file uploads made to Amazon S3 via a File Gateway. When a File Gateway needs to prioritize cache usage, it may temporarily upload partial files to Amazon S3. Although the File Gateway eventually uploads the files in full, Amazon S3 event notifications still trigger in the interim. Since these notifications refer to partially uploaded files, they cannot be relied upon to trigger downstream processing. In these scenarios, using notifications generated by the File Gateway itself is a more robust and reliable mechanism.

Overview of an example S3 object processing solution

Customers often use File Gateway to write multiple individual files that are part of a larger backup or vaulting operation. Using the new file upload notification feature, customers can use a number of AWS services to create a scalable workflow. This enables customers to group together individual files for downstream operations, such as archive creation or Amazon S3 object post processing.

The following diagram depicts an example solution:

Example S3 object processing solution

Processing workflow overview

You perform the following steps as part of this processing workflow, as numbered in the preceding solution architecture diagram:

1. On-premises clients write multiple files to a File Gateway file share. Each file is uniquely stored, either by using a standard file name hashing mechanism or by writing all files in a particular operation to a unique directory on the File Gateway. The clients also write a “manifest” file that contains a list of all “data” files for that operation. The processing workflow collates these “data” files together as a logical set for downstream processing. Multiple processing workflows can be executed in parallel as different “manifest” files are uploaded.

2. The File Gateway delivers file upload notifications to Amazon EventBridge when each file has been successfully uploaded to Amazon S3. These events are received by the default event bus.

3. EventBridge triggers a rule for each File Gateway file upload notification event, delivering the entire event payload as a message to Amazon SQS. Using Amazon SQS rapidly moves events off the default EventBridge event bus, durably stores them in large numbers, and gives the workflow the ability to absorb any backpressure caused by delays in downstream processing steps.

4. An AWS Lambda function reads and processes messages from the Amazon SQS queue. The function parses the event information, enriches it with details of the event type, and sends this to an EventBridge custom event bus. The function’s main responsibility is to identify whether the file upload notification pertains to a “manifest” or “data” file event, by matching relevant parts of the Amazon S3 object key name.

5. An EventBridge rule is triggered when a “manifest” file upload event has been delivered to the associated custom event bus. As we mentioned in step 1, a “manifest” file contains a list of all “data” files that correspond to a single logical set. The EventBridge target for this rule is an AWS Step Functions state machine. It executes a series of iterating steps that reconcile the contents of the “manifest” file, read from Amazon S3, with the contents of an Amazon DynamoDB table that is continuously updated in the following step.

6. A separate EventBridge rule is triggered when a “data” file upload event has been delivered. The EventBridge target for this rule is a Lambda function that writes selected information about the uploaded Amazon S3 object to a DynamoDB table. This aspect of the workflow provides a persistence layer for the file upload notifications: a fully managed, scalable backend that stores metadata for large logical file sets formed from constituent “data” files, which may take extended periods to upload.

7. When the AWS Step Functions state machine has reconciled the contents of a “manifest” with the relevant items in the DynamoDB table, the workflow has successfully collated a complete logical set of data files uploaded to Amazon S3. The state machine communicates this by emitting a completion event to another EventBridge custom event bus.

Tutorial of key steps

Now, let’s dive deeper on how to configure key aspects of the preceding solution. We summarize the functional role of AWS Lambda, AWS Step Functions, and Amazon DynamoDB, since providing specifics on these components is beyond the scope of this post.

Create a File Gateway and file share (step 1)

Before attempting the following steps, you must create a File Gateway and file share.

Configure AWS Storage Gateway file upload notifications (step 2)

File upload notifications are configured on a per file share basis. For newly created file shares, this is an option that can be enabled at creation time. For existing file shares, you can enable notifications by accessing the AWS Storage Gateway console and then by completing the following steps:

  1. Navigate to File shares and select the relevant share.
  2. Click the Actions dropdown menu and select Edit share settings.
  3. Select the box for File upload notification and specify a Settling time in seconds. This value defines a file activity time window within which upload notifications are not sent for multiple changes to the same file. Depending on the use case, this is a useful option to prevent notification storms.
  4. Click Save.

You can also configure file upload notifications using the AWS Storage Gateway API or AWS CLI by specifying the NotificationPolicy request parameter or --notification-policy option respectively, as part of file share creation or update operations. For NFS file shares, you can do this via the CreateNFSFileShare and UpdateNFSFileShare APIs or the create-nfs-file-share and update-nfs-file-share CLI commands.
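
For instance, enabling upload notifications with a 60-second settling time on an existing NFS file share could look like the following boto3 sketch (the file share ARN is a placeholder):

import boto3

sgw = boto3.client("storagegateway")

# Enable file upload notifications with a 60-second settling time on an
# existing NFS file share (the ARN below is a placeholder).
sgw.update_nfs_file_share(
    FileShareARN="arn:aws:storagegateway:us-east-1:111122223333:share/share-EXAMPLE",
    NotificationPolicy='{"Upload": {"SettlingTimeInSeconds": 60}}',
)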

Create an Amazon EventBridge rule for file upload notifications (step 3)

Once configured for a file share, File Gateway delivers file upload notifications to the EventBridge default event bus. Notifications are structured according to the following event pattern:

{
  "version": "0",
  "id": "[ID]",
  "detail-type": "Storage Gateway Object Upload Event",
  "source": "aws.storagegateway",
  "account": "[ACCOUNT ID]",
  "time": "[YYYY-MM-DDTHH:MM:SSZ]",
  "region": "[REGION]",
  "resources": [
    "arn:aws:storagegateway:[REGION]:[ACCOUNT ID]:share/[SHARE NAME]",
    "arn:aws:storagegateway:[REGION]:[ACCOUNT ID]:gateway/[GATEWAY NAME]"
  ],
  "detail": {
    "event-type": "object-upload-complete",
    "bucket-name": "[BUCKET NAME]",
    "modification-time": "[YYYY-MM-DDTHH:MM:SSZ]",
    "object-key": "[OBJECT KEY NAME]",
    "object-size": [SIZE BYTES],
    "prefix": "[PREFIX]/"
  }
}

Once you have created an Amazon SQS queue to receive these events from EventBridge, create an EventBridge rule that delivers these notifications into the queue. Open the EventBridge console and complete the following steps:

  1. Navigate to Rules.
  2. Click Create rule and enter a name and description for the rule.
  3. In the Define pattern box, click the Event pattern radio button and select the Custom pattern radio button. At launch, file upload notifications can be matched by using a custom pattern. Soon, you will be able to use the Pre-defined pattern by service radio button to pre-populate the event pattern box. For now, enter the following custom pattern in the Event pattern box and click Save:
{
  "source": [
    "aws.storagegateway"
  ],
  "detail-type": [
    "Storage Gateway Object Upload Event"
  ]
}
  4. In the Select targets box, select the Amazon SQS queue you created previously, and under Configure input click the Matched events radio button. This configures the rule to pass the entire event payload to the SQS queue.

  5. Click Add target and scroll to the bottom of the page to click Create.

Ensure that the target Amazon SQS queue access policy allows events.amazonaws.com to perform the sqs:SendMessage action. You can use a condition check in the access policy to restrict this action to the ARN of the EventBridge rule you just created.
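
For example, such a queue access policy might look like the following (the queue and rule ARNs are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEventBridgeToSendMessages",
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:111122223333:file-upload-events",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:events:us-east-1:111122223333:rule/file-upload-notifications"
        }
      }
    }
  ]
}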

Configure an Amazon SQS queue with an AWS Lambda function trigger (step 4)

You can configure an AWS Lambda function as a trigger on an Amazon SQS queue. This allows our file upload notification messages to be processed for onward delivery in a scalable manner, and provides options to introduce more advanced event processing logic. In this example solution, we use an AWS Lambda function to filter on Amazon S3 object key name suffixes and create specific event payloads for delivery to an EventBridge custom event bus. This, in turn, allows specific EventBridge rules to be triggered to branch processing logic in further steps.

EventBridge will send a file upload notification event to the Amazon SQS queue and trigger the AWS Lambda function. The full event payload can be referenced within the function itself. The function filters the Amazon S3 object key name to differentiate between “data” and “manifest” file types. This filtering is dependent on implementing a standardized “manifest” file-naming scheme by clients writing to the File Gateway on premises, such as adding a “.manifest” suffix. After identifying the file type, the function creates a JSON payload for a custom event and sends this to an EventBridge custom event bus.
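
One possible shape for this function is sketched below. The custom event bus name, the event source, and the detail-type values are illustrative assumptions, not part of the feature itself:

import json
import os

import boto3

# EventBridge client used to forward classified events.
events = boto3.client("events")

# Name of the custom event bus; assumed to be supplied via an environment
# variable (the default shown here is purely illustrative).
EVENT_BUS_NAME = os.environ.get("EVENT_BUS_NAME", "file-upload-processing")


def handler(event, context):
    entries = []
    for record in event["Records"]:
        # Each SQS message body is the full EventBridge event that the rule
        # forwarded from the default event bus.
        upload_event = json.loads(record["body"])
        object_key = upload_event["detail"]["object-key"]

        # Classify the upload by object key suffix. The ".manifest" suffix is
        # an assumed naming convention agreed with the on-premises clients.
        if object_key.endswith(".manifest"):
            detail_type = "Manifest File Upload Event"
        else:
            detail_type = "Data File Upload Event"

        entries.append({
            "Source": "custom.filegateway",  # illustrative source name
            "DetailType": detail_type,
            "Detail": json.dumps(upload_event["detail"]),
            "EventBusName": EVENT_BUS_NAME,
        })

    # PutEvents accepts at most 10 entries per call, so send in batches.
    for i in range(0, len(entries), 10):
        events.put_events(Entries=entries[i:i + 10])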

To configure an AWS Lambda function trigger on the Amazon SQS queue created above, open the Amazon SQS console and complete the following steps:

  1. Click Queues and the relevant queue name.
  2. Click on the Lambda triggers tab and Configure Lambda function trigger.
  3. Choose the relevant function from the list and click Save. Ensure that the Lambda function is configured with an IAM execution role that permits the sqs:ReceiveMessage action on this SQS queue. It should also permit the events:PutEvents action for the EventBridge custom event bus created in the next step.

Create an Amazon EventBridge custom event bus and rules for custom event patterns (steps 5 and 6)

The solution uses an EventBridge custom event bus and associated rules to provide greater flexibility and control over onward Amazon S3 object processing logic. The rules handle different custom event types to trigger the appropriate downstream processing step for a given Amazon S3 object type. To create an EventBridge custom event bus, open the EventBridge console and complete the following steps:

  1. Click Event buses.
  2. In the Custom event bus box, click Create event bus.
  3. Enter a name for your event bus and select the appropriate boxes to configure the required access permissions.
  4. Click Create.

We can now create custom rules for this event bus in the same way we created a rule for the default event bus in step 3. This time, we specify custom event patterns to match against. As per the solution diagram, we create two different rules that match the appropriate file upload type:

  1. Click on Rules in the left panel of the EventBridge console.
  2. Click Create rule and enter a name and description for the rule.
  3. In the Define pattern box, click the Event pattern radio button and select the Custom pattern radio button.
  4. Notice that the Event pattern box to the right is empty. For events delivered by AWS services, this box is pre-populated. For custom events, you can define a pattern here that corresponds with the custom events delivered to this event bus. Once entered, click Save. The following is an example for a “manifest” file upload notification:

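Assuming the Lambda function from step 4 emits events with an illustrative source of custom.filegateway and a detail-type of Manifest File Upload Event, the pattern might be:

{
  "source": [
    "custom.filegateway"
  ],
  "detail-type": [
    "Manifest File Upload Event"
  ]
}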

  5. In the Select event bus box, click the Custom or partner event bus radio button and click on the custom event bus you created.

  6. In the Select targets box, select a Step Functions state machine you created to poll a DynamoDB table and reconcile it against the “manifest” file read from Amazon S3. Under Configure input, click the Matched events radio button. This allows the rule to pass the entire event payload to the Step Functions state machine.
  7. Click Add target and scroll to the bottom of the page to click Create.

Repeat these steps to add a rule to match against “data” file upload events, by modifying the custom event pattern accordingly. The target for this rule should be a Lambda function you have created to write chosen values from the file upload notification into a DynamoDB table.
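
As an illustration, such a function might look like the following minimal sketch. The table name, the attribute names, and the assumption that each logical file set is written under a unique directory prefix are all illustrative:

import os

import boto3

dynamodb = boto3.resource("dynamodb")

# Table name is assumed to be supplied via an environment variable.
table = dynamodb.Table(os.environ.get("TABLE_NAME", "file-upload-events"))


def handler(event, context):
    # The rule delivers the full custom event; its detail carries the fields
    # from the original file upload notification.
    detail = event["detail"]
    object_key = detail["object-key"]

    # Assumption: clients write each logical file set under a unique
    # directory, so the leading prefix of the object key identifies the set.
    file_set_id = object_key.split("/")[0]

    table.put_item(
        Item={
            "FileSetId": file_set_id,    # partition key
            "ObjectKey": object_key,     # sort key
            "ObjectSize": detail["object-size"],
            "ModificationTime": detail["modification-time"],
        }
    )

Querying the table by FileSetId then returns every “data” file recorded so far for a logical set, which is exactly what the reconciliation state machine needs.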

A note on AWS Step Functions and Amazon DynamoDB

While we haven’t focused on Step Functions and DynamoDB, the following are a few recommendations for these service components:

  • The Step Functions state machine should be built according to the job status poller pattern; a skeletal example follows this list. The “manifest” file can be read once from Amazon S3 when the state machine is invoked, and passed between iterations as state machine input/output. The poller pattern provides for an iterator state that should be bounded by a sensibly sized iteration counter or a time window, emitting an error event if either of these is breached.
  • File upload notification events loaded into the DynamoDB table can have a TTL attribute configured to allow for automatic archiving after an appropriate period of time. This blog post describes a potential approach for this. A good primary key for this use case is the unique logical file set hash as the partition key and the S3 object key name as the sort key. This optimally returns a single correlated set of S3 objects when the table is queried by the file set hash.
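
As an illustration, a state machine following the job status poller pattern might have a shape like this skeletal Amazon States Language definition (the Lambda function ARNs and the iteration bound are placeholders):

{
  "Comment": "Skeletal job status poller that reconciles a manifest with DynamoDB items",
  "StartAt": "ReadManifest",
  "States": {
    "ReadManifest": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:111122223333:function:read-manifest",
      "Next": "WaitForUploads"
    },
    "WaitForUploads": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "CheckUploadStatus"
    },
    "CheckUploadStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:111122223333:function:check-upload-status",
      "Next": "IsSetComplete"
    },
    "IsSetComplete": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.complete",
          "BooleanEquals": true,
          "Next": "EmitCompletionEvent"
        },
        {
          "Variable": "$.iterations",
          "NumericGreaterThan": 100,
          "Next": "ReconciliationTimedOut"
        }
      ],
      "Default": "WaitForUploads"
    },
    "EmitCompletionEvent": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:111122223333:function:emit-completion-event",
      "End": true
    },
    "ReconciliationTimedOut": {
      "Type": "Fail",
      "Error": "ReconciliationTimeout",
      "Cause": "The manifest could not be reconciled within the iteration bound"
    }
  }
}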

Cleaning up

To avoid incurring future charges, delete the File Gateway you created. Additionally, clean up resources in your account by removing the EventBridge custom event bus, SQS queue, Lambda functions, Step Functions state machine, and DynamoDB table.

Conclusion

In this post, we discussed how the newly released AWS Storage Gateway file upload notification feature for File Gateway can be used to enable post-upload processing of groups of files transferred from on premises to Amazon S3. By using this feature in combination with other AWS services, customers can now reliably initiate downstream processing based on individual file upload events, something that was not previously possible with existing File Gateway notification mechanisms or S3 event notifications. We encourage customers to extend the patterns discussed in this post to build other processing scenarios. Thanks for reading our blog and learning more about this new feature! If you have any comments or questions, please leave them in the comments section.

To learn more about the services mentioned in this post, refer to these product pages: AWS Storage Gateway, Amazon S3, Amazon EventBridge, Amazon SQS, AWS Lambda, AWS Step Functions, and Amazon DynamoDB. Also, remember to join us on November 10, 2020, for the AWS Storage Day virtual event, to learn what is new across the AWS Storage portfolio.

Atiek Arian

Atiek is a Global Solutions Architect at Amazon Web Services. He works with some of the largest AWS Financial Services customers in the world, assisting them in their adoption of AWS Services. In his free time, Atiek enjoys spending time with his family, watching Formula One, and reading.

Dominic Searle

Dominic is a Solutions Architect at Amazon Web Services. He works with AWS Financial Services customers providing technical guidance and assistance to help them make the best use of AWS Services. Outside of work, he is either spending time with his family, diving into another hobby, including his latest automated beer brewing system, or learning to play the guitar…. Badly.