AWS Compute Blog

Fanout S3 Event Notifications to Multiple Endpoints

by Vyom Nagrani | on | in AWS Lambda | | Comments

John Stamper John Stamper, AWS Solution Architect

S3 fan out use-case diagram

 

Use Cases

The above architecture is an event-driven general-purpose parallel data processing system – data enters S3, notification of new data is sent to SNS, which packages the S3 event notification as a message and delivers it to subscribers. This architecture is ideal for workloads that need more than one data derivative of an object. The purpose of the subscribers is to create a layer of processing which accommodates a wide variety of data sizes and subsequently send the results of processing to some storage layer. The architecture is not prescriptive with regard to the post-processing storage layer and is out of scope for this article. In the illustration above, black arrows depict data and blue arrows depict event notifications.

Example use cases are described below.

Image Processing
Master image must be processed to produce multiple image derivatives, e.g. resized, OpenCV result.

Application Log Processing
Application log data must be processed to produce multiple log derivatives, e.g. formatted for operations, security, marketing.

Content Transformation
Documents of one format, e.g. Microsoft Word, must be converted to multiple other formats such as PDF, RTF, MHTML, and ODT.

SNS supports message delivery to several types of subscribers, notably Lambda functions, SQS queues, and HTTP/HTTPS endpoints. Lambda functions make it easy to respond to data without the need for servers. To process data with a long-running or existing application, you can also use SNS to easily send the message to an SQS queue or HTTP/HTTPS endpoint. At this stage messages can be processed by EC2, which offers a wide range of compute/memory/storage options. Overall, the architecture can provide an event-driven parallel data processing system that can leverage the entire AWS compute offering.

This article will focus on the steps to configure a S3 bucket to send an event notification to a SNS topic and subscribe two Lambda functions to the topic. The resulting architecture is a simple implementation of S3 event notification fanout to Lambda functions for processing, which is applicable for workloads that require multiple data derivatives of the same object. In the center of the architecture is the ‘event manifold’, similar to a mechanical manifold, which intakes an event notification at one end (S3), transforms it to a message, and distributes it to all subscribers (data processing elements). This architecture allows customers to build an event-driven parallel data processing architecture that is fast, flexible, and easy to maintain over time. Below is an illustration of the architecture to be assembled.

S3 fan out simple use-case diagram

 

Step 1 – Create the Bucket

To create a Bucket, follow the documentation here. For this article, set the bucket name to event-manifold-bucket.

 

Step 2 – Create the Topic

To create a Topic, follow the documentation here. For this article, set the topic name to event-manifold-topic.

 

Step 3 – Update the Topic Policy to allow Event Notifications from an S3 Bucket

The Topic’s Policy must permit an S3 Bucket to publish event notifications to it. To do so, do the following:

  1. Select the event-manifold-topic
  2. Select Edit Topic Policy from the Actions button
  3. Edit Topic Policy

  4. Select the Advanced Tab
  5. Clear the existing Policy and replace it with the following policy statement:
    Topic Policy JSON

    • Replace ‘region’ with the region in which the Topic is located, e.g. us-west-1.
    • Replace ‘account id’ with the account id of the Topic, e.g. 123456789012.
    • Replace ‘topic name’ with event-manifold-topic.
    • Replace ‘bucket name’ with event-manifold-bucket.
  6. Select the Update Policy button

At this point we have the S3 bucket, a SNS Topic, and the Topic is configured to permit our specific bucket to call the Publish API on it. Next we will configure the S3 bucket to send notifications to the Topic. We will choose the event type ObjectCreated (All) for our general purpose data processing architecture.

 

Step 4 – Configure the S3 Bucket to send Event Notifications to the SNS Topic

  1. Select the ‘Events’ portion of the S3 bucket created in Step 1.
  2. Enter a name for the notification, e.g. s3fanout
  3. Enter an event type for the notification, e.g. ObjectCreated (All)
  4. Select SNS Topic radio button of the Send To radio button group
  5. Select Add SNS topic ARN from the SNS Topic drop down list
  6. Enter the SNS Topic ARN created in Step 2
  7. Click the Save button

The picture below is an example.

S3 Event

 

Step 5 – Create the IAM Role for the Lambda functions

In this step you will create an IAM role which grants permissions for the Lambda functions to write to CloudWatch Logs and read objects from the originating S3 bucket.

  1. In the IAM portion of the AWS console, click the Policies link on the left.
  2. Select the Create Policy button at the top.
  3. In the Create Policy window, select the button which corresponds to the Policy Generator
  4. In the Permissions window, add two statements
    1. Statement One

      1. Effect = Allow
      2. AWS Service = Amazon CloudWatch Logs
      3. Actions = CreateLogGroup, CreateLogStream, PullLogEvents
      4. ARN = arn:aws:logs:*:*:*
      5. Select the Add Statement button
    2. Statement Two
      1. Effect = Allow
      2. AWS Service = Amazon S3
      3. Actions = GetObject
      4. ARN = arn:aws:s3:::event-manifold-bucket
      5. Select the Add Statement button
  5. Select the Next Step button.
  6. In the Review Policy window, enter a name for the Policy, e.g. ‘Fanout-Lambda-Policy’.
  7. Select the Create Policy button.
  8. In the IAM portion of the AWS console, click the Roles link on the left.
  9. Click the Create New Role button at the top.
  10. In the Select Role Name window, enter a name for the Role, e.g. CloudWatchLogs-Write-S3Bucket-Read. Click the Next Step button.
  11. In the Select Role Type window, select the button which corresponds to AWS Lambda.
  12. In the Attach Policy window, select the policy you previously created, ‘Fanout-Lambda-Policy’.
  13. In the Review Policy window, select the Create Role button.

 

Step 6 – Create the Lambda functions

In this step you will create two Lambda functions in the same region as the SNS Topic, data-processor-1 and data-processor-2. Each function will be edited inline and their execution role will be the role CloudWatchLogs-Write-S3Bucket-Read created in Step 5. This role provides visibility into what the functions are doing through simple logging statements and allows the functions to read the objects from S3.

  1. In the Lambda portion of the AWS console, click the Create a Lambda Function button.
  2. On the Select Blueprint page, select the sns-message blueprint option.
  3. On the Configure event sources page, select event-manifold-topic from the SNS topic dropdown list. Click Next.
  4. On the Configure Function page, enter a name for the function, e.g. ‘data-processor-1.
  5. As an option, enter a description of the function in the Description field.
  6. Use the default Runtime, Node.js.
  7. Use the default Code entry type, Edit code inline.
  8. Use the default Handler, index.handler.
  9. Select the CloudWatchLogs-Write-S3Bucket-Read role from the Role drop down list.
  10. Click the Next button.
  11. On the Review page, select the Enable now radio button.
  12. Click the Create Lambda Function button at the bottom of the page.

Repeat steps 1-12 to create a second Lambda function, setting the name to data-processor-2 in step 4.

At this point you have two Lambda functions, each of which programmed to receive an event notification record from a SNS Topic and log to CloudWatch Logs the incoming event data and the SNS message portion of the record. The picture below shows the code.

Simple Lambda Function

 

Step 7 – Modify the Lambda functions to process SNS messages of S3 event notifications

The code to process a SNS message delivery is shown above. A SNS message delivery is a JSON object containing an array named ‘Records’ with one element within the array – the SNS Message Delivery Object. The element contains several items of data about the event. An example of the SNS message delivery to a Lambda function is below.

Sample SNS Message Object

For the S3 event notification fanout architecture (S3 publish event notification -> SNS Topic -> SNS message delivery of S3 publish event notification -> Lambda function), the JSON object received by the Lambda function is different from a JSON object from a SNS message delivery. The S3 event notification is contained within the Sns.Message attribute of the SNS Message Delivery Object. An example of a SNS message delivery of a S3 event notification is shown below.

SNS Message Delivery Object

Some extra code is needed for the Lambda function to process the object created in S3. First the code must capture the Sns.Message object from the incoming record. Next, that object must be processed to unbundle the JavaScript object of the S3 event notification from the Sns.Message attribute. Example Lambda code to do this is shown below.

exports.handler = function(event,context) {
   var snsMsgString = JSON.stringify(event.Records[0].Sns.Message);
   var snsMsgObject = getSNSMessageObject(snsMsgString);
   var srcBucket = snsMsgObject.Records[0].s3.bucket.name;
   var srcKey = snsMsgObject.Records[0].s3.object.key;
   console.log(‘SRC Bucket: ’ + srcBucket);
   console.log(‘SRC Key: ’ + srcKey);
   …

The function getSNSMessageObject(string) must be included in your Lambda function and is shown below.

function getSNSMessageObject(msgString) {
   var x = msgString.replace(/\\/g,’’);
   var y = x.substring(1,x.length-1);
   var z = JSON.parse(y);
   
   return z;
}

Below is the Lambda function for data-processor-1.

Advanced Lambda Function

 

Test the architecture

The architecture illustrated in the beginning has been created and assembled. When new objects are created in the event-manifold-bucket, S3 will send the event notification to the SNS Topic event-manifold-topic, which will subsequently deliver a message to both Lambda functions of the new object creation. You can see this by inspecting the logs in Amazon CloudWatch.

CloudWatch Log Groups

By selecting the Log Group for each Lambda function, you can see that both functions received the notification of the S3 create object event and they each have the data they need to pull the record from S3 and process it.

Data Processor 1 Log Stream
Data Processor 2 Log Stream

As an alternative to viewing the log output of the Lambda functions in CloudWatch, you can also view metrics in CloudWatch provided by SNS Topics, including NumberOfMessagesPublished, PublishSize, NumberOfNotificationsDelivered, and NumberOfNotificationsFailed.

 

Alternative Architecture

The architecture described in this article is one option available to customers who require an event-driven general parallel data-processing system. Another option to achieve S3 Fanout of Event Notifications is to configure a S3 bucket to send the event notification directly to a ‘master’ Lambda function. In this approach, the ‘master’ Lambda function replaces the SNS topic (the event manifold) and must be programmed to send data to the various elements of the processing layer.

Leveraging a Lambda function to serve as the event manifold provides the architect a high degree of choice with regard to the processing elements due to the flexibility offered by the current runtime environments of Lambda, Node.js and Java8. In addition, processing elements do not need to ‘unbundle’ the S3 event notification from the SNS.Message attribute of the Message Delivery Object. In exchange for high choice and reduced software maintenance, the architect receives additional maintenance of the ‘master’ Lambda function – one unit for every downstream data processing element. Below is an illustration of the alternative architecture.

S3 fan out alternative architecture

 

Fast, Flexible, Easy to Maintain

The speed of the system is optimal since notifications of S3 events are event-driven and are delivered in parallel to subscribers, which results in parallel processing of data.

The flexibility of the system is high – to date SNS supports two endpoints that are capable of runtime processing: Lambda and HTTP/HTTPS endpoints, and SQS facilitates processing by queue consumers. Additional processing subscribers can be easily added or removed per business requirements.

The subscriber processing layer elements and the post-processing storage layer drives the maintenance of the system. The ingest storage layer (S3) is a highly scalable, reliable, and low-latency data storage infrastructure and the event manifold component (SNS) is a highly scalable, flexible, cost-effective notification service which require no ongoing maintenance.

 

Conclusion

The above architectures describe and illustrate event-driven general purpose parallel data processing systems. The first architecture utilizes at least three AWS services: Amazon S3, Amazon SNS, and AWS Lambda, with SNS serving as the ‘event manifold’. The alternative architecture utilizes at least two AWS services: Amazon S3 and AWS Lambda, with a Lambda function serving as the ‘event manifold’. These architectures are designed to support Internet scale data processing workloads, require low operational maintenance, provide the architect the option of leveraging the entire AWS Compute family for processing, and are flexible to dynamic business requirements for derivatives of data objects.

TAGS: ,