AWS Storage Blog

Manage event ordering and duplicate events with Amazon S3 Event Notifications

Customers use data events to trigger workflows and communicate between decoupled services. An event is a change in state of, or an update to, data. For certain applications, such as batch order processing or content management, customers may need to implement application logic to handle duplicate and out-of-order events. In these use cases, receiving duplicate events can erroneously trigger workflows multiple times, and receiving events out of order can trigger workflows that process stale events. This can result in processing orders multiple times or in the wrong order, and both of these scenarios can cause undesirable outcomes, such as incurring additional compute costs or requiring manual intervention.

You can use Amazon S3 Event Notifications to receive notifications when changes happen in your Amazon S3 bucket. For example, you can receive notifications when objects stored in Amazon S3 are created, removed, restored, or replicated. With services like Amazon EventBridge, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), and AWS Lambda, S3 Event Notifications enable you to build scalable event-driven applications.

In this post, we walk you through how to avoid processing duplicate and old S3 Event Notifications using an architecture built with Amazon S3, Amazon EventBridge, AWS Lambda, and Amazon DynamoDB. In our example, we provide sample code that you can use in your application that checks the state of each event before processing it. This will help you prevent workflow errors and avoid manual intervention or additional processing costs.

Identifying duplicate Amazon S3 Event Notifications

Amazon S3 Event Notifications are delivered at least once, but they aren’t guaranteed to arrive in the same order that the events occurred. On rare occasions, Amazon S3’s retry mechanism might cause duplicate S3 Event Notifications for the same object event. When this happens, you can use the sequencer in the S3 Event Notification JSON object to identify the event sequence for the same object. Specifically, duplicate S3 Event Notifications for an object event have the same value for the object key, versionId, operation, and sequencer. If you compare the sequencer strings from two S3 Event Notifications on the same object key, the S3 Event Notification with the greater sequencer hexadecimal value is the event that occurred later. Because sequencer values can differ in length, right-pad the shorter value with zeros before comparing them. Note that if you copy the same object to two different object keys, you can’t compare the sequencer values because they are different objects.
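The comparison takes only a few lines of Python. The following is a minimal sketch, assuming both sequencer values come from notifications for the same object key; the example values are made up for illustration.

# Determine which of two S3 Event Notifications for the same object key
# occurred later by comparing their sequencer values.
def newer_sequencer(a: str, b: str) -> str:
    """Return the sequencer of the later event. Sequencers are hexadecimal
    strings that can differ in length, so the shorter value is right-padded
    with zeros before a lexicographic comparison."""
    width = max(len(a), len(b))
    return a if a.ljust(width, "0") >= b.ljust(width, "0") else b

# Hypothetical sequencer values, for illustration only:
assert newer_sequencer("0055AED6DCD90281E5", "0055AED6DCD90281E6") == "0055AED6DCD90281E6"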

Imagine a use case where objects (such as documents or images) are uploaded within a short period of time and processing is initiated for those objects. You can use S3 Event Notifications to trigger an AWS Lambda function, either directly from Amazon S3 or via Amazon EventBridge. However, receiving duplicate S3 Event Notifications can invoke your function multiple times. It is also possible that, while your application is already processing an object, it receives an event for a newer version of that object. If processing the older version is slower than processing the newer one, the function might overwrite the newer result with the older one. Another possible scenario is that different types of S3 Event Notifications, like ObjectRemoved or ObjectCreated, could be processed out of order. In these cases, you may want to avoid overwriting the output of a more recent operation.

To simulate this use case, we walk through how to deploy an AWS SAM application that contains a sample notification handler function for object creation events. This function supports ignoring duplicate and older S3 Event Notifications, and can inject latency into its processing with a probability variable to simulate a processing delay.

Solution overview and prerequisites

The following prerequisites are required to follow along with this post:

  1. AWS SAM CLI, version 1.74.0 or higher. Refer to the documentation for installing the AWS SAM CLI.
  2. AWS Command Line Interface (AWS CLI) 2.7.33 or higher, configured on the client to deploy the SAM application and run test uploads.
  3. Python 3.8 or higher.

The architecture of the example solution is shown in Figure 1. When an image is uploaded to the input bucket, Amazon S3 sends an S3 Event Notification to AWS Lambda via Amazon EventBridge. The function then adds an item to a DynamoDB table with state information about the event, such as the sequencer. This item holds the newest sequencer received and is not updated if an S3 Event Notification with an older sequencer arrives. If the notification was not duplicated or received out of order, the application transforms the image by inverting its colors and uploads the result to an output bucket.


Figure 1: Architecture of the SAM application deployed
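The color inversion itself is straightforward. The following is a minimal sketch using the Pillow library; the repository’s actual transform may differ.

# Invert the colors of an image with Pillow (a sketch, not the repo's code).
from PIL import Image, ImageOps

def invert_image(input_path: str, output_path: str) -> None:
    # Open the source image, convert to RGB, invert its colors, and save.
    with Image.open(input_path) as img:
        ImageOps.invert(img.convert("RGB")).save(output_path)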

Solution walkthrough

In this section, we walk through:

  1. Deploying the example architecture
  2. Application testing
  3. Using Amazon DynamoDB to track state

1. Deploying the example architecture

  1. First, clone the project from the GitHub repository:
git clone https://github.com/aws-samples/amazon-s3-endedupe.git
  2. Change to the root directory of the project and run the following AWS SAM CLI commands. Accept all the defaults when deploying.
cd amazon-s3-endedupe
sam build
sam deploy --guided --stack-name eventbridge-blog


Figure 2: SAM deployment steps using the default options

Once the deployment is complete, you see the created resources, such as the input and output bucket names, as shown in Figure 3.


Figure 3: Output of SAM deployment

3. Go to the AWS Management Console and open the CloudFormation service in the Region where the SAM application was deployed. Go to the deployed stack named “eventbridge-blog” and select the “Resources” section to view the resources created by the CloudFormation template.

4. The template deploys the following AWS resources. Their names are based on the stack name.

a. Amazon S3 buckets:

i. Input bucket (eventbridge-blog-inputbucket-<randomid>): This bucket is used to upload the objects.

ii. Output bucket (eventbridge-blog-outputbucket-<randomid>): This bucket is used to store the processed images.

iii. Test input bucket (eventbridge-blog-testinputbucket-<randomid>): This bucket can be used for input unit testing.

b. An Amazon EventBridge rule (eventbridge-blog-EventBridgeRule-<randomid>): To invoke a Lambda function when new objects are created in the input bucket.

c. The Lambda function (notification_function): Contains the code for processing object events, along with the role needed to access the DynamoDB table and the S3 buckets.

d. A DynamoDB table (eventbridge-blog-LockTable-<randomid>): This is used to capture the state information of object processing operations. While the Lambda function is processing an object, it acquires a lock to track state changes.

5. The Lambda function uses three environment variables:

a. DDB_TABLE, which is set to the deployed DynamoDB table name.

b. OUTPUT_BUCKET, which is set to the deployed output bucket name.

c. COORDINATION is set to on. We change this value later when we test our use case.

To prevent the recursive patterns that cause runaway Lambda invocations, don’t write your output to the same bucket where your files are uploaded. If you do want to use the same bucket, then use filtering in the EventBridge rule to only process events under a certain prefix. For example, process events with the prefix “input/” only, and write the output under the prefix “output/”, as in the sketch that follows.
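The following is a hypothetical EventBridge event pattern, expressed as a Python dict, that matches only object-creation events under the input/ prefix of a single bucket; the bucket name is a placeholder, and this pattern is a sketch rather than the rule deployed by this stack.

# Hypothetical event pattern: match only Object Created events whose key
# begins with "input/" in one specific bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["your-bucket-name"]},   # placeholder bucket name
        "object": {"key": [{"prefix": "input/"}]},  # prefix filter
    },
}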

2. Application testing

In this section, to understand the impact of event ordering, you simulate the upload of a large number of files using the sample Python script upload.py from the code repository. The Lambda application that processes these files has an environment variable COORDINATION.

First, you test the application with the COORDINATION environment variable set to off.

  1. Go to the AWS Lambda console and select the Lambda function. Go to the Configuration section and select Environment variables. Set the value of the COORDINATION variable to off, as shown in Figure 4. When this variable is set to off, the application bypasses the locking and sequencer ID check mechanism provided in the code. This simulates out-of-order processing of S3 Event Notifications.


Figure 4: Environment variables with COORDINATION set to off

2. Next, go to the directory where you cloned the SAM application, and change directory to sample_client.

3. There are two sample images, a cat and a dog, included in the sample_client directory of the repository. Use the script upload.py to upload the sample images in random order by running the following command:

python3 upload.py 100 <Name_of_the_input_bucket>

For the bucket name, use the input bucket deployed by the CloudFormation stack.

The Python script uploads 100 copies of two images in batches of 10, where the second image (dog.jpg) overwrites the first (cat.jpg). After running the script, you see 100 final images in the input bucket, each with a unique key img<number>.jpg. For example, the script first uploads cat.jpg as img1.jpg, and then replaces it with dog.jpg using the same object key img1.jpg. If you check the input bucket, you see 100 dog images with object keys img0.jpg to img99.jpg, as shown in Figure 5.


Figure 5: Images in input bucket
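For reference, the script’s behavior is roughly equivalent to the following sketch. This is a simplification for illustration (it omits the randomized ordering), not the repository’s actual upload.py.

# Upload cat.jpg to each key, then overwrite each key with dog.jpg, in
# batches of 10, so the dog image is always the final version of every key.
import sys
import boto3

def upload_batches(count: int, bucket: str) -> None:
    s3 = boto3.client("s3")
    for start in range(0, count, 10):
        keys = [f"img{i}.jpg" for i in range(start, min(start + 10, count))]
        for key in keys:
            s3.upload_file("cat.jpg", bucket, key)  # first version
        for key in keys:
            s3.upload_file("dog.jpg", bucket, key)  # overwriting final version

if __name__ == "__main__":
    upload_batches(int(sys.argv[1]), sys.argv[2])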

4. For each upload, an S3 Event Notification is delivered to the target Lambda function. The Lambda function generates a new version of the image with inverted colors, and uploads it to the output bucket eventbridge-blog-outputbucket-<randomid>. The output image format is out-img<number>.jpg.

5. In a production use case, the time to process an image can vary for several reasons: for example, a network delay is introduced when reading the original image or uploading the transformed image, the first input is larger than the second, or the function unexpectedly runs slower. To reflect this, we artificially introduced a 10 second pause in the Lambda function using the time.sleep function in Python, which happens on some invocations at random. The randomness is controlled by a variable in the code called _SLOW_PROBABLITY. The pause happens after the original image is read from Amazon S3, but before the transformed version is uploaded. This simulates the function occasionally running slower, as in the sketch that follows.
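Conceptually, the delay logic amounts to the following sketch. The variable name _SLOW_PROBABLITY comes from the code; its value here is an assumption.

# Randomly pause for 10 seconds to simulate a slow invocation. Called after
# reading the source image and before uploading the transformed result.
import random
import time

_SLOW_PROBABLITY = 0.2  # assumed value: fraction of invocations to slow down

def maybe_inject_delay() -> None:
    if random.random() < _SLOW_PROBABLITY:
        time.sleep(10)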

6. When the processing is completed, you can download the images from the output bucket, and then browse them to see the results as shown Figure 6.


Figure 6: Processed images in output bucket

7. You would expect to see only images of the dog, since each upload overwrites the cat image with the dog image. However, the output includes both dog and cat images. This is because processing of the initial cat image is sometimes delayed and completes after the dog image has already been transformed. When this happens, the cat image overwrites the result of the dog image.

3. Using Amazon DynamoDB to track state

Next, you use DynamoDB to track the state of object key processing with sequencer and lock status fields. By doing this, you can identify when a more recent event for the same object key has already been received or is being processed.
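A conditional write is one way to implement the sequencer check. The following sketch records an event’s sequencer only if no newer sequencer is already stored; the table schema, attribute names, and fixed padding width are assumptions, not the repository’s actual code.

# Record the incoming event's sequencer, rejecting duplicates and older events.
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def record_if_newest(table_name: str, object_key: str, sequencer: str) -> bool:
    # Right-pad to a fixed width so sequencers of different lengths compare
    # correctly as strings.
    padded = sequencer.ljust(32, "0")
    try:
        dynamodb.put_item(
            TableName=table_name,
            Item={"objectKey": {"S": object_key}, "sequencer": {"S": padded}},
            # Succeed only if this is the first event for the key, or if the
            # stored sequencer is strictly older than the incoming one.
            ConditionExpression="attribute_not_exists(objectKey) OR sequencer < :seq",
            ExpressionAttributeValues={":seq": {"S": padded}},
        )
        return True  # this event is the newest seen so far; process it
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate or older event; skip processing
        raise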

To simulate this, you set the COORDINATION environment variable value to on.

  1. Go to the AWS Lambda console and select the notification_function Lambda function. Go to the Configuration section and select Environment variables. Set the value of the COORDINATION variable to on, as shown in Figure 7.


Figure 7: Environment variables with COORDINATION set to on

2. Now, repeat running the Python script upload.py to upload the sample images:

python3 upload.py 100 <Name_of_the_input_bucket>

3. If you check the input bucket, the images appear as shown in Figure 8.


Figure 8: Images in input bucket

4. This time, since the COORDINATION environment variable is set to on, the AWS Lambda Python application checks the DynamoDB table to see whether processing of the same object key is already in progress. The table tracks this with a lock field: the Lambda function waits for the lock to be released, delaying the processing of the newer event while an older event for the same object is still being processed. One way to implement such a lock is shown in the sketch that follows.
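The following sketch acquires the lock with a conditional update and retries while another invocation holds it. The attribute name, status values, and retry timings are assumptions, not the repository’s actual code.

# Acquire a per-object lock stored in the DynamoDB item, retrying if held.
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def acquire_lock(table_name: str, object_key: str, attempts: int = 30) -> bool:
    for _ in range(attempts):
        try:
            dynamodb.update_item(
                TableName=table_name,
                Key={"objectKey": {"S": object_key}},
                UpdateExpression="SET #lk = :locked",
                # Succeed only if no lock exists yet or it has been released.
                ConditionExpression="attribute_not_exists(#lk) OR #lk = :unlocked",
                ExpressionAttributeNames={"#lk": "lockStatus"},
                ExpressionAttributeValues={
                    ":locked": {"S": "LOCKED"},
                    ":unlocked": {"S": "UNLOCKED"},
                },
            )
            return True  # lock acquired; safe to process this object
        except ClientError as err:
            if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
            time.sleep(1)  # lock held by another invocation; wait and retry
    return False  # could not acquire the lock within the retry budget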

5. As mentioned earlier, we artificially introduced a delay in processing through the 10 second pause in the function, which happens on some invocations at random.

6. When the processing is completed, you can download the images from the output bucket, and then browse them to see the results, as shown in Figure 9.


Figure 9: Processed images in output bucket

7. This time, notice that the output contains only dog images. You were able to accomplish this due to the locking and sequencer check that occur before processing is initiated.

Cleaning up

Delete the resources you created to make sure that you don’t continue to incur charges. First, empty both the input and output buckets, and then run the following sam delete command. Select “y” for the prompts to delete the stack and the folder of the deployed stack.
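For example, you can empty the buckets with the AWS CLI. Substitute the bucket names from your stack’s outputs; this assumes the buckets aren’t versioned.

aws s3 rm s3://eventbridge-blog-inputbucket-<randomid> --recursive
aws s3 rm s3://eventbridge-blog-outputbucket-<randomid> --recursive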

sam delete --stack-name eventbridge-blog

Conclusion

In this post, we shared that on rare occasions, you might receive duplicate or out-of-order S3 Event Notifications for the same event. To handle this use case, you should consider designing your architecture to keep track of your events before your application processes them. By doing this, you can avoid situations like triggering the same workflow multiple times or triggering a workflow to process an out-of-order Event Notification.

In our example, we simulated an image processing application that transforms images by inverting the colors of the original. To make it as realistic as possible, we introduced a random delay to simulate scenarios like a network delay when reading the original image or the processing function unexpectedly running slower. In our use case, we wanted to make sure that we did not overwrite a newer version of an image (dog) with an older version (cat) by making processing of events for an object mutually exclusive and avoiding processing duplicate events. We showed how you can accomplish this by using the sequencer field from the S3 Event Notification JSON data structure to determine the order of events for a given object key, and DynamoDB to maintain the state of our application’s transactions.

Thank you for reading this post. If you have any comments or questions, then please leave them in the comments section. As a next step, we recommend that you review our sample code, and use it as inspiration to meet your own business needs.

Bimal Gajjar

Bimal Gajjar is a Senior Storage Solutions Architect for AWS. Bimal has over 23 years of experience in information technology. At AWS, Bimal focuses on helping Global Accounts architect, adopt, and deploy cloud storage solutions. He started his career in infrastructure solutions with GSIs and spent over two decades with OEMs at HPE, Dell EMC, and Pure Storage as Global Solutions Architect for large accounts.

Andrew Peace

Andrew Peace is a Principal Engineer working on Amazon S3 and has worked on various parts of the service including S3 Event Notifications. Outside of work he enjoys spending time outdoors with his family, running, and catching up with friends over a beer.