AWS Storage Blog
Reliable event processing with Amazon S3 Event Notifications
As AWS Solutions Architects, we help customers understand and plan AWS architectures that meet their business goals while remaining scalable, cost effective, secure, and reliable.
One common pattern that comes up frequently is the desire to move from manual or polling-based strategies to reliable event processing, also known as event-driven architecture (EDA). This approach dovetails nicely with modern, distributed architectures because it makes it easier to decouple and independently scale individual system components. It also simplifies code and removes the overhead of polling-based processing, and may even allow for a reduction in server footprint or the adoption of serverless solutions, which often leads to cost savings.
In one such case, we were recently working with a customer who provides a mobile app that allows users to upload and share images with others. The customer had traditionally relied on community feedback and sample-based manual review to flag and remove images that violate their terms of service (for example, those containing violence or nudity). However, they were now approaching a scale where they wanted a faster, more automated approach.
This particular customer was writing their users’ images to Amazon S3, an object storage service that offers industry-leading scalability, data availability, security, and performance. We determined that they could use S3 Event Notifications to reliably and automatically process each image with Amazon Rekognition, a deep learning-based image and video analysis service. This would enable them to identify inappropriate content and quickly prevent it from being seen by their users.
In this blog post, we provide an overview of Amazon S3 Event Notifications and briefly walk through the customer solution described in this introduction. We also share ideas for how you can use Amazon S3 events to automate a variety of other processing activities for objects in Amazon S3. By doing so, you can learn how to easily process S3 events at scale without the need to manage servers.
Amazon S3 Event Notifications
First, let’s talk a bit more about Amazon S3 events and how they work.
High-level overview
Amazon S3 event notifications enable you to receive notifications when certain object events happen in your bucket. Event-driven models like this mean that you no longer have to build or maintain server-based polling infrastructure to check for object changes, nor do you have to pay for idle time of that infrastructure when there are no changes to process.
Setting up S3 Event Notifications
Start by creating an event notification configuration at the S3 bucket level that determines which events trigger a notification. Allowed event types include but are not limited to:
- New object creation
- Object removal
- Object restored from the Amazon S3 Glacier or S3 Glacier Deep Archive storage classes
You may optionally specify object key prefix or suffix filters to limit the applicable objects, such as a prefix of images/ or a suffix of .jpg; a sample configuration follows the destination list below. Note that if your image-processing workflow rewrites an image back to the same bucket, you should filter out the processed images to avoid an infinite processing loop.
Your next step is to choose a destination for your notifications, which may be one of the following:
- Amazon Simple Notification Service (Amazon SNS)
- Amazon Simple Queue Service (Amazon SQS)
- AWS Lambda (serverless code execution)
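As a concrete sketch, the following shows how a notification configuration combining an event type, key filters, and a Lambda destination might be attached to a bucket with the AWS CLI. The bucket name and Lambda function ARN are placeholders, and S3 also needs permission to invoke the function, which you can grant with the aws lambda add-permission command:

# notification.json – sample notification configuration (event type, filters, Lambda destination)
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "ImageModerationTrigger",
      "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:YOUR_FUNCTION",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "images/" },
            { "Name": "suffix", "Value": ".jpg" }
          ]
        }
      }
    }
  ]
}

# Attach the configuration to the bucket
aws s3api put-bucket-notification-configuration --bucket YOUR_BUCKET --notification-configuration file://notification.json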
Finally, the event is delivered as a simple JSON message. The following is an abbreviated example:
"Records":[
{
"awsRegion":"us-west-2",
"eventTime":"1970-01-01T00:00:00.000Z",
"eventName":"ObjectCreated:Put",
"userIdentity":{
"principalId":"AIDAJDPLRKLG7UEXAMPLE"
},
"s3":{
"s3SchemaVersion":"1.0",
"configurationId":"testEventRule",
"bucket":{
"name":"mybucket",
},
"object":{
"key":"HappyFace.jpg",
"size":1024,
"eTag":"d41d8cd98f00b204e9800998ecf8427e",
"versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko",
}
}
}
]
Solution architecture
Now let’s return to our image-processing workflow.
Since user images were stored in an application bucket that contained a variety of content, Amazon S3 Event Notifications were configured to include only objects with a key prefix of content/images and a suffix of .jpg, and were delivered to an AWS Lambda function written in Node.js.
The Lambda function takes the Amazon S3 object path and passes it as a parameter to a synchronous invocation of the Amazon Rekognition DetectModerationLabels API to determine whether the image contains inappropriate content. If Amazon Rekognition determines that the content is inappropriate, the Lambda function makes an S3 PutObjectTagging API call to tag the object in S3 with a tag of “BlockedContent = true”. The users’ application tier inspects each object’s tag and only shows images if this tag is not present.
The following is an illustrative example of the Lambda function code:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const rekognition = new AWS.Rekognition();

// Note – your Lambda's execution role requires the following IAM permissions:
//   s3:PutObjectTagging (scoped to the appropriate bucket / object path)
//   s3:GetObject (scoped to the appropriate bucket)
//   rekognition:DetectModerationLabels

exports.handler = async (event, context) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // Object keys in event notifications are URL-encoded (for example, spaces arrive as "+"),
    // so decode the key before passing it to other APIs:
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    const moderationParams = {
      MinConfidence: 80, // Only return labels with a confidence score >= 80%
      Image: {
        S3Object: {
          Bucket: bucket,
          Name: key
        }
      }
    };

    console.info(`Checking ${bucket}/${key} for inappropriate content...`);
    const labelResponse = await rekognition.detectModerationLabels(moderationParams).promise();

    // You could adapt this logic to only flag certain content, such as weapons or nudity:
    if (labelResponse.ModerationLabels.length > 0) {
      console.info('Inappropriate content identified, flagging object...');
      const taggingParams = {
        Bucket: bucket,
        Key: key,
        Tagging: {
          TagSet: [{ Key: 'BlockedContent', Value: 'true' }]
        }
      };
      await s3.putObjectTagging(taggingParams).promise();
      console.info('Tagged object with BlockedContent=true.');
    } else {
      console.info('No inappropriate content detected.');
    }
  }
  return 'Done!';
};
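Although not part of the Lambda function itself, the application-tier check described earlier could be implemented along these lines (a minimal sketch using the same AWS SDK for JavaScript; the bucket and key names are illustrative):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Returns true if the object carries the BlockedContent=true tag.
// Requires s3:GetObjectTagging permission on the bucket.
async function isBlocked(bucket, key) {
  const response = await s3.getObjectTagging({ Bucket: bucket, Key: key }).promise();
  return response.TagSet.some(
    (tag) => tag.Key === 'BlockedContent' && tag.Value === 'true'
  );
}

// Example usage: only serve images that have not been flagged.
// const blocked = await isBlocked('YOUR_BUCKET', 'content/images/HappyFace.jpg');
// if (!blocked) { /* return the image to the user */ }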
To test, we uploaded an image containing a picture of a weapon on Sep 14, 2020 5:23:53 PM GMT-0700:
# Upload test image with AWS CLI
aws s3 cp weapons.jpg s3://YOUR_BUCKET/content/images/weapons.jpg
The CloudWatch Logs for the Lambda function show that the S3 Event Notification triggered the function less than one second after creating the S3 test object. The image detection and tagging workflow took 1.3 seconds to complete.
We then verified the object was tagged with the AWS CLI:
aws s3api get-object-tagging --bucket YOUR_BUCKET --key content/images/weapons.jpg
{
"TagSet": [
{
"Key": "BlockedContent",
"Value": "true"
}
]
}
The following diagram shows an abbreviated view of the complete solution.
While not shown, the complete solution also included additional steps, such as routing flagged content to an SQS queue for review by a moderator and inserting tracking records into a reporting database.
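For illustration only (this is not the customer's actual implementation, and the queue URL is a placeholder), routing a flagged object to a review queue from the same Lambda function could look something like this:

const sqs = new AWS.SQS();

// Send the flagged object's location to a moderation review queue.
await sqs.sendMessage({
  QueueUrl: 'https://sqs.us-west-2.amazonaws.com/123456789012/moderation-review-queue',
  MessageBody: JSON.stringify({ bucket, key, reason: 'BlockedContent' })
}).promise();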
That being said, hopefully this gives you an idea of how useful Amazon S3 Event Notifications can be.
Cost
In addition to reducing operational overhead, serverless event processing solutions like this can be cost effective. At the time of this writing, key components of the on-demand pricing include:
- $0.001 per image for the first one million images processed by Amazon Rekognition
- Approximately $0.0000027079 for the roughly 1,300 ms runtime of the Lambda function, billed in GB-seconds
- $0.0000002 per Lambda request
- $0.000001 per S3 object tag request
- $0 for the S3 Event Notification itself
This comes out to a cost of $0.001004 per image processed (note: this does not include ancillary charges, such as the per-GB-month storage cost of Amazon S3).
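For reference, here is roughly how that per-image figure adds up, assuming the Lambda function runs for about 1,300 ms at the 128 MB memory setting (approximately 0.1625 GB-seconds); your memory configuration and runtime may differ:

$0.001 (Rekognition) + $0.0000027079 (Lambda duration) + $0.0000002 (Lambda request) + $0.000001 (S3 tagging request) + $0 (S3 Event Notification) ≈ $0.0010039, which rounds to $0.001004 per image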
Summary
In this blog post, we provided an overview of Amazon S3 Event Notifications and walked through an example of how they can be used to automatically review and flag potentially inappropriate images uploaded to Amazon S3.
Amazon S3 Event Notifications provide a reliable way to automate event-driven workflows based on object creation and other changes in Amazon S3. You can send these notifications to destinations such as Amazon SNS, Amazon SQS, or AWS Lambda for further processing.
Serverless, event-driven approaches allow you to build faster, reduce cost by not paying for idle infrastructure, and free up your developers’ time to focus on engineering that differentiates your business.
Thanks for learning about event processing with Amazon S3 events. If you have any questions or comments, don’t hesitate to leave them in the comments section!