AWS Storage Blog

Reliable event processing with Amazon S3 event notifications

As AWS Solutions Architects, we help customers understand and plan AWS architectures that meet their business goals while remaining scalable, cost effective, secure, and reliable.

One pattern that comes up frequently is the desire to move from manual or polling-based strategies to reliable event processing, also known as event-driven architecture (EDA). This approach dovetails nicely with modern, distributed architectures because it makes it easier to decouple and separately scale individual system components. It also simplifies code, removes the overhead of polling-based processing, and may even allow for a reduction in server footprint or adoption of serverless solutions, which can often lead to cost reduction.

In one such case, we were recently working with a customer who provides a mobile app that allows users to upload and share images with others. The customer had traditionally relied on community feedback and sample-based manual review to flag and remove images that violate their terms of service (for example, those containing violence or nudity). However, they were now approaching a scale where they wanted a faster, more automated approach.

This particular customer was writing their users’ images to Amazon S3, an object storage service that offers industry-leading scalability, data availability, security, and performance. We determined that they could use S3 event notifications to reliably and automatically process each image with Amazon Rekognition, a deep learning-based image and video analysis service. This would enable them to identify inappropriate content and quickly prevent it from being seen by their users.

In this blog post, we provide an overview of Amazon S3 event notifications and briefly walk through the customer solution described in this introduction. We also share ideas for how you can use Amazon S3 events to automate a variety of other processing activities for objects in Amazon S3. By doing so, you can learn how to process S3 events at scale without the need to manage servers.

Amazon S3 event notifications

First, let’s talk a bit more about Amazon S3 events and how they work.

High-level overview

Amazon S3 event notifications enable you to receive notifications when certain object events happen in your bucket. Event-driven models like this mean that you no longer have to build or maintain server-based polling infrastructure to check for object changes, nor do you have to pay for idle time of that infrastructure when there are no changes to process.

Setting up S3 event notifications

Start by creating an event notification configuration at the S3 bucket level that determines which events trigger a notification. Allowed event types include but are not limited to:

  • New object creation
  • Object removal
  • Object restored from the Amazon S3 Glacier or S3 Glacier Deep Archive storage class

You may optionally specify object prefix or suffix filters to limit the applicable objects, such as a prefix of images/ or a suffix of .jpg. Note that if the image-processing workflow re-writes an image, you should filter out the processed image to avoid an infinite processing loop.
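As a rough sketch of what such a configuration might look like, the following Node.js snippet builds the NotificationConfiguration structure accepted by the S3 PutBucketNotificationConfiguration API. The helper name buildNotificationConfig and the Lambda function ARN are hypothetical and used only for illustration; the prefix and suffix values are the ones mentioned above.

```javascript
// Sketch only: builds a bucket notification configuration that sends
// object-created events for keys matching a prefix and suffix to a Lambda function.
// buildNotificationConfig and the ARN below are hypothetical, for illustration.
function buildNotificationConfig(lambdaArn, prefix, suffix) {
  return {
    LambdaFunctionConfigurations: [
      {
        LambdaFunctionArn: lambdaArn,
        Events: ['s3:ObjectCreated:*'],
        Filter: {
          Key: {
            FilterRules: [
              { Name: 'prefix', Value: prefix },
              { Name: 'suffix', Value: suffix },
            ],
          },
        },
      },
    ],
  };
}

const config = buildNotificationConfig(
  'arn:aws:lambda:us-west-2:123456789012:function:example-function', // placeholder ARN
  'images/',
  '.jpg'
);
console.log(JSON.stringify(config, null, 2));
```

With the AWS SDK for JavaScript (v2), an object like this could be passed as the NotificationConfiguration parameter of s3.putBucketNotificationConfiguration, alongside the Bucket name. If processed images were written back under a different prefix (for example, a hypothetical processed/ prefix), they would not match this filter and so would not re-trigger the function.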

Your next step is to choose a destination for your notifications, such as an Amazon SNS topic, an Amazon SQS queue, or an AWS Lambda function.

Finally, the event is delivered as a simple JSON message. The following is an abbreviated example:

{
   "Records":[
      {
         "awsRegion":"us-west-2",
         "eventTime":"1970-01-01T00:00:00.000Z",
         "eventName":"ObjectCreated:Put",
         "userIdentity":{
            "principalId":"AIDAJDPLRKLG7UEXAMPLE"
         },
         "s3":{
            "s3SchemaVersion":"1.0",
            "configurationId":"testEventRule",
            "bucket":{
               "name":"mybucket"
            },
            "object":{
               "key":"HappyFace.jpg",
               "size":1024,
               "eTag":"d41d8cd98f00b204e9800998ecf8427e",
               "versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko"
            }
         }
      }
   ]
}
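To illustrate how a consumer might read a message like this, here is a minimal Node.js sketch that pulls the bucket, key, and event name out of each record. The helper name extractObjects and the sample event are hypothetical. The sketch assumes object keys arrive URL-encoded, with spaces represented as "+", which is why the key is decoded before use.

```javascript
// Sketch only: extracts the fields most handlers need from an S3 event message.
// extractObjects is a hypothetical helper name.
function extractObjects(event) {
  return (event.Records || []).map((record) => ({
    eventName: record.eventName,
    bucket: record.s3.bucket.name,
    // Keys are URL-encoded in the notification; "+" stands for a space.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
    size: record.s3.object.size,
  }));
}

// Hypothetical sample event, trimmed to the fields used above.
const sampleEvent = {
  Records: [
    {
      eventName: 'ObjectCreated:Put',
      s3: {
        bucket: { name: 'mybucket' },
        object: { key: 'Happy+Face.jpg', size: 1024 },
      },
    },
  ],
};

console.log(extractObjects(sampleEvent));
```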

Solution architecture

Now let’s return to our image-processing workflow.

Since user images were stored in an application bucket that contained a variety of content, Amazon S3 events were configured to include only objects with a prefix of content/images and a suffix of .jpg, and to deliver them to an AWS Lambda function written in Node.js.

The Lambda function takes the Amazon S3 object path and passes it as a parameter to a synchronous invocation of the Amazon Rekognition DetectModerationLabels API to determine whether the image contains inappropriate content. If Amazon Rekognition determines that the content is inappropriate, the Lambda function makes an S3 PutObjectTagging API call to tag the object in S3 with a tag of BlockedContent = true. The users’ application tier inspects each object’s tags and only shows an image if this tag is not present.

The following is an illustrative example of the Lambda function code:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const rekognition = new AWS.Rekognition();

// Note – your Lambda’s execution role requires the following IAM permissions:
//   s3:PutObjectTagging (scoped to appropriate bucket / object path)
//   s3:GetObject        (scoped to appropriate bucket; also covers HEAD requests)
//   rekognition:DetectModerationLabels

exports.handler = async (event, context) => {

  for (const record of event.Records) {

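    var bucket = record.s3.bucket.name;
    // Object keys in S3 event notifications are URL-encoded (spaces arrive as "+"),
    // so decode the key before passing it to other APIs.
    var key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));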

    var params = {
      MinConfidence: 80, // Only return labels if confidence score >= 80%
      Image: {
        S3Object: {
          Bucket: bucket,
          Name: key,
        }
      }
    };
    
    console.info(`Checking ${bucket}/${key} for inappropriate content...`);
    var labelResponse = await rekognition.detectModerationLabels(params).promise();
    
    // You could adapt logic to only flag certain content, such as weapons or nudity:
    if (labelResponse.ModerationLabels.length > 0) {
      console.info(`Inappropriate content identified, flagging object...`);
      // Note: PutObjectTagging replaces any tags already on the object.
      var taggingParams = {
        Bucket: bucket,
        Key: key,
        Tagging: {
          TagSet: [{ Key: "BlockedContent", Value: "true" }]
        }
      };
      await s3.putObjectTagging(taggingParams).promise();
      console.info("Tagged object with BlockedContent=true.");
    }
    else {
      console.info(`No moderated images detected.`);
    }
  }
  return 'Done!';
};
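The comment in the function above notes that the logic could be adapted to flag only certain content. One way this might be done is sketched below: each entry in ModerationLabels carries a Name and a ParentName, so labels can be filtered against a set of categories. The helper name findBlockedLabels, the category names, and the sample labels are assumptions for illustration; check the current Amazon Rekognition moderation label list before relying on specific names.

```javascript
// Hypothetical set of categories to block; the names are assumptions.
const BLOCKED_CATEGORIES = new Set(['Explicit Nudity', 'Violence']);

// Sketch only: returns the labels whose own name or parent category is blocked.
function findBlockedLabels(moderationLabels, blockedCategories = BLOCKED_CATEGORIES) {
  return moderationLabels.filter(
    (label) =>
      blockedCategories.has(label.Name) || blockedCategories.has(label.ParentName)
  );
}

// Made-up sample in the shape of a DetectModerationLabels response.
const sampleLabels = [
  { Name: 'Weapons', ParentName: 'Violence', Confidence: 97.1 },
  { Name: 'Violence', ParentName: '', Confidence: 97.1 },
  { Name: 'Alcohol', ParentName: '', Confidence: 85.0 },
];

console.log(findBlockedLabels(sampleLabels).map((label) => label.Name));
// prints [ 'Weapons', 'Violence' ]
```

In the handler, the condition on labelResponse.ModerationLabels.length could then be replaced with a check on the length of the filtered list.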

To test, we uploaded an image containing a picture of a weapon on Sep 14, 2020 5:23:53 PM GMT-0700:

# Upload test image with AWS CLI
aws s3 cp weapons.jpg s3://YOUR_BUCKET/content/images/weapons.jpg

The CloudWatch Logs for the Lambda function show that the S3 Event Notification triggered the function less than one second after creating the S3 test object. The image detection and tagging workflow took 1.3 seconds to complete.

We then verified the object was tagged with the AWS CLI:

aws s3api get-object-tagging --bucket YOUR_BUCKET --key content/images/weapons.jpg
{
    "TagSet": [
        {
            "Key": "BlockedContent",
            "Value": "true"
        }
    ]
}
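The application-tier check described earlier, which hides any image carrying this tag, could be sketched in Node.js roughly as follows. The helper name isBlocked is hypothetical, and the sketch assumes the application has already fetched the object's TagSet (for example, with a GetObjectTagging call like the one above).

```javascript
// Sketch only: decides whether an image should be hidden, given its TagSet.
// isBlocked is a hypothetical helper name.
function isBlocked(tagSet) {
  return (tagSet || []).some(
    (tag) => tag.Key === 'BlockedContent' && tag.Value === 'true'
  );
}

console.log(isBlocked([{ Key: 'BlockedContent', Value: 'true' }])); // prints true
console.log(isBlocked([])); // prints false
```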

An overview is shown in the following diagram:

Using Amazon S3 event notifications, AWS Lambda, and Amazon Rekognition to analyze certain data

This diagram is an abbreviated view of the complete solution.

While not shown, the complete solution also included additional steps, such as routing flagged content to an SQS queue for review by a moderator and inserting tracking records into a reporting database.
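As one possible illustration of the queueing step, the sketch below builds the parameters for an Amazon SQS SendMessage call describing a flagged image. It is not the customer's actual implementation: the helper name buildReviewMessage, the queue URL, and the message layout are all assumptions.

```javascript
// Sketch only: builds SendMessage parameters for a moderator review queue.
// buildReviewMessage, the queue URL, and the message body layout are hypothetical.
function buildReviewMessage(queueUrl, bucket, key, moderationLabels) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({
      bucket,
      key,
      labels: moderationLabels.map((label) => ({
        name: label.Name,
        confidence: label.Confidence,
      })),
    }),
  };
}

const message = buildReviewMessage(
  'https://sqs.us-west-2.amazonaws.com/123456789012/example-review-queue', // placeholder URL
  'YOUR_BUCKET',
  'content/images/weapons.jpg',
  [{ Name: 'Weapons', ParentName: 'Violence', Confidence: 97.1 }] // made-up sample label
);
console.log(message);
```

With the AWS SDK for JavaScript (v2), parameters like these could be passed to sqs.sendMessage(...).promise() from within the Lambda function, provided its execution role is allowed sqs:SendMessage on the queue.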

That being said, hopefully this gives you an idea of how useful Amazon S3 events can be.

Cost

In addition to reducing operational overhead, serverless event processing solutions like this can be cost effective. At the time of this writing, key components of the on-demand pricing include:

This comes out to a cost of $0.001004 per image processed (note: this does not include ancillary charges, such as the per GB-month of storage cost of Amazon S3).

Summary

In this blog post, we provided an overview of Amazon S3 event notifications and walked through an example of how they can be used to automatically review and flag potentially inappropriate images uploaded to Amazon S3.

Amazon S3 events provide a reliable way to automate event-driven workflows based on object creation or other changes in Amazon S3, depending on your setup. You can send these event notifications to destinations like Amazon SQS or AWS Lambda for further processing.

Serverless, event-driven approaches allow you to build faster, reduce cost by not paying for idle infrastructure, and free up your developers’ time to focus on engineering that differentiates your business.

Thanks for learning about event processing with Amazon S3 events. If you have any questions or comments, don’t hesitate to leave them in the comments section!

Mat Werber

Mat Werber is an AWS solutions architect responsible for providing architectural guidance across the full AWS stack with a focus on Serverless, Analytics, Redshift, DynamoDB, and RDS. He also has an audit background in IT governance, risk, and controls.

Dylan Qu

Dylan Qu is an AWS solutions architect responsible for providing architectural guidance across the full AWS stack with a focus on Data Analytics, AI/ML and DevOps.