Applying Computer Vision to Images with Amazon Rekognition, AWS Lambda, and Box Skills
By Joe Norman, Partner Solutions Architect at AWS
Box Skills is a framework that allows developers to integrate third-party artificial intelligence (AI) and machine learning (ML) technologies to process and apply rich metadata to files in Box.
Amazon Rekognition is well suited to be the intelligence layer in a Box Skill because it can perform object and scene detection, celebrity recognition, text detection, and unsafe image detection without requiring custom training.
In this post, I will walk through creating a sample custom Box Skill. We’ll do this by using Amazon Rekognition Image and AWS Lambda to apply computer vision to image files in Box. This could help you to automatically detect any celebrities that appear in an image, for example. This new metadata allows you to quickly find images based on keyword searches, or find images that may be inappropriate and should be moderated.
When everything is configured, here’s what happens at each step:
1. A Box user uploads an image to a folder with Box Skills configured.
2. The Box Skills service sends an event to the configured Amazon API Gateway invocation URL. The event body contains read and write tokens, file identifiers, and everything that’s needed to interact with the file.
3. API Gateway invokes the Lambda function and passes the event to the function.
4. Lambda uses the read token from the event body to download the image file from the Box folder.
5. Lambda uploads the image file to a preconfigured, private Amazon Simple Storage Service (Amazon S3) bucket, so it’s accessible by Amazon Rekognition. The Amazon S3 bucket is configured to automatically delete the file later using lifecycle management. Alternatively, the Lambda function could delete the file from the Amazon S3 bucket after step 8.
6. Lambda makes calls to Amazon Rekognition DetectLabels, DetectText, RecognizeCelebrities, and DetectModerationLabels, pointing to the Amazon S3 object created in step 5.
7. Amazon Rekognition grabs the Amazon S3 object and performs its analysis.
8. Amazon Rekognition returns its analysis to the Lambda function.
9. The Lambda function creates a formatted JSON of metadata based on the results from step 8. It writes that metadata to the file in Box by using the write token delivered in the event body.
10. Lambda passes a “Success” message to API Gateway.
11. API Gateway passes the “Success” message through to the Box Skills service.
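The Amazon Rekognition calls in the flow above can be sketched as follows. This is a minimal illustration, not the code from the repository: it assumes `rekognition` is a client from the AWS SDK for JavaScript (v2), such as `new AWS.Rekognition()`, and it issues the four calls in parallel, which is a design choice of this sketch. The function name `analyzeImage` is hypothetical.

```javascript
// Sketch only. `rekognition` is assumed to be an AWS SDK for JavaScript (v2)
// Rekognition client; it is passed in so the function can be run with a stub.
function analyzeImage(rekognition, bucket, key) {
  // All four calls reference the same S3 object instead of resending bytes.
  const params = { Image: { S3Object: { Bucket: bucket, Name: key } } };

  return Promise.all([
    rekognition.detectLabels(params).promise(),
    rekognition.detectText(params).promise(),
    rekognition.recognizeCelebrities(params).promise(),
    rekognition.detectModerationLabels(params).promise()
  ]).then((results) => ({
    labels: results[0],
    text: results[1],
    celebrities: results[2],
    moderation: results[3]
  }));
}
```

Because the client is injected, the same function works with a real client inside Lambda and with a stub in a local test.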
Setting up the Resources
Creating and configuring the required components requires an AWS account, a Box account, and a few steps. At a high level, here are the steps we’re going to go through:
- Create an Amazon S3 bucket to facilitate transfer of files between your Box account and Amazon Rekognition.
- Set up AWS Identity and Access Management (IAM) roles and policies to allow Lambda to access the Amazon S3 bucket and Amazon Rekognition.
- Create your Lambda function and upload your zipped code.
- Set up an API in API Gateway to proxy to your Lambda function.
- Activate Box Skills on the desired folder within your Box account.
Step 1: Create and Configure an Amazon S3 Bucket
To perform its analysis, Amazon Rekognition Image accepts either raw image bytes or a reference to an object in Amazon S3. Because we’re making four separate API calls to Amazon Rekognition, it makes sense to put the file into Amazon S3 and reference it there. That way, the Lambda function uploads the file only once after downloading it from Box, instead of sending the image bytes with every call. You can do each of these steps in the AWS Command Line Interface (CLI), but I’m walking through them in the console for the sake of illustration.
Bucket Creation Workflow
In your AWS account, navigate to Amazon S3 and create a new bucket. Create a unique name and note it for later. The bucket name is the only environment variable you need for the Lambda function later. Choose any Region where Amazon Rekognition Image is available. This information is available in the AWS Region table. Nothing in the code or examples is affected by the Region, as long as Amazon Rekognition Image is available there.
Keep versioning off, as we’re setting all objects in the bucket to expire after a day. Set tags, logging, and encryption as your organization requires. I’m using AES-256 default encryption with S3-managed keys. The permissions should default to giving you all rights and denying public read, which is how you should keep it.
Next, we need to set up a lifecycle rule on the bucket we just created. Every time you drop a new image file into your target folder in Box, Lambda downloads it and then uploads it into this Amazon S3 bucket. Setting up a lifecycle rule on the bucket ensures all files uploaded to the bucket are later deleted—even in the case of an unforeseen error in the Lambda function.
From the Amazon S3 console, choose the bucket you just created. Along the top, choose the Management tab, and then choose the Lifecycle section. Choose the Add Lifecycle Rule or Get Started button, and you’re brought into the lifecycle rule workflow.
On the first page, give the rule any name you want, such as “Expire objects after 1 day.” Skip the next page, Transitions, because we won’t be doing anything with the different Amazon S3 storage classes.
On the Expiration page, we’re setting all versions to expire after one day and cleaning up incomplete multipart uploads to avoid unnecessary storage costs. Select every check box on the page, and then set all text box values to 1. Then choose Next, and finally Save. If you’re using the AWS CLI, you can pass in the following JSON to match this rule:
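As a hedged sketch of what such a rule can look like, the snippet below builds a lifecycle configuration in the shape accepted by `aws s3api put-bucket-lifecycle-configuration` and prints it as JSON. The helper name `buildLifecycleConfiguration` and the rule ID are illustrative, and the exact JSON used for this walkthrough may differ.

```javascript
// Builds a one-rule S3 lifecycle configuration. The field names follow the
// S3 PutBucketLifecycleConfiguration API; the helper itself is hypothetical.
function buildLifecycleConfiguration(days) {
  return {
    Rules: [
      {
        ID: 'Expire objects after ' + days + ' day(s)',
        Status: 'Enabled',
        // An empty prefix applies the rule to every object in the bucket.
        Filter: { Prefix: '' },
        // Expire current object versions.
        Expiration: { Days: days },
        // Expire previous versions, in case versioning is ever turned on.
        NoncurrentVersionExpiration: { NoncurrentDays: days },
        // Clean up parts left behind by incomplete multipart uploads.
        AbortIncompleteMultipartUpload: { DaysAfterInitiation: days }
      }
    ]
  };
}

// Print JSON that can be saved to a file and passed to the CLI.
console.log(JSON.stringify(buildLifecycleConfiguration(1), null, 2));
```

Saving the printed JSON to a file and passing it with `--lifecycle-configuration file://...` is one way to apply it from the CLI.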
Step 2: Create IAM Roles and Policies
The next step is to set up a role for your Lambda function. In addition to the basic execution permissions, the Lambda function needs to be able to upload files to the Amazon S3 bucket that we just created, and then make the four calls to Amazon Rekognition.
Navigate to IAM in the AWS Management Console, and choose Policies along the left side. Create a new policy with whatever name you want and keep note of the name you choose. Then paste in the following JSON, and replace [YOUR-S3-BUCKET-NAME-HERE] with the name of the bucket you made earlier:
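For orientation, here is a sketch of a policy document that grants the permissions described above: writing logs, uploading to the bucket, and making the four Amazon Rekognition calls. It is an assumption about a reasonable policy, not necessarily the policy from the original post. It also includes `s3:GetObject`, on the assumption that Amazon Rekognition reads the Amazon S3 object with the caller's permissions. The helper name `buildSkillPolicy` is hypothetical.

```javascript
// Builds an IAM policy document for the Lambda role. The action names are
// real IAM actions; the overall policy shape is a sketch under assumptions.
function buildSkillPolicy(bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        // Basic Lambda execution: write logs to CloudWatch Logs.
        Effect: 'Allow',
        Action: ['logs:CreateLogGroup', 'logs:CreateLogStream', 'logs:PutLogEvents'],
        Resource: 'arn:aws:logs:*:*:*'
      },
      {
        // Upload the image, and allow it to be read back for analysis.
        Effect: 'Allow',
        Action: ['s3:PutObject', 's3:GetObject'],
        Resource: 'arn:aws:s3:::' + bucketName + '/*'
      },
      {
        // The four Amazon Rekognition Image operations used by the skill.
        Effect: 'Allow',
        Action: [
          'rekognition:DetectLabels',
          'rekognition:DetectText',
          'rekognition:RecognizeCelebrities',
          'rekognition:DetectModerationLabels'
        ],
        Resource: '*'
      }
    ]
  };
}

console.log(JSON.stringify(buildSkillPolicy('[YOUR-S3-BUCKET-NAME-HERE]'), null, 2));
```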
Now, we need to add the policy we just created to a new role. On the left side of the console, navigate over to Roles, and choose Create Role. On the next screen, with AWS Service selected by default, choose Lambda from the list, because our Lambda function is assuming this role. Then, on the next page, search for the policy you just created in the previous section, select it, and go to the next page. Type any name and description you like, but note the name for later. Choose Create Role. Now, on to the main event.
Step 3: Create the Lambda Function
The Lambda function in this walkthrough is doing the orchestration and data transformation between all the different pieces of this app. Navigate to Lambda in the AWS Management Console. Make sure your AWS Region is set to the same one you used for the Amazon S3 bucket, and choose Create Function.
In the Author From Scratch section, type any name you like, and set the Runtime to Node.js 6.10. For Role, choose Choose Existing Role, and choose the role you just finished making.
For now, we’re holding off on the coding part and moving into the other function settings in the Lambda console. At the top of the Function Code section, change “Handler” to “app.handler” before scrolling down to Environment Variables. Create an environment variable for the bucket that you made in Step 1:
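Inside the function, the bucket name can then be read from the environment. The sketch below uses `S3_BUCKET` as a placeholder variable name, which is an assumption; use whatever name the repository code expects. Failing fast on a missing value makes a misconfigured function easier to diagnose.

```javascript
// `S3_BUCKET` is a hypothetical variable name used for illustration.
function getBucketName(env) {
  const name = env.S3_BUCKET;
  if (!name) {
    throw new Error('S3_BUCKET environment variable is not set');
  }
  return name;
}

// In Lambda this would be called as: getBucketName(process.env)
```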
Next, under Basic Settings, change the Memory and Timeout values to handle the load you’re putting on the function. The Lambda function holds a copy of each image that’s dropped into your Box folder in memory, so that might affect the value you choose.
Likewise, you might want to increase the Timeout value to give the function enough time to download the file, upload it to Amazon S3, and make each of the Amazon Rekognition calls. I chose 512 MB for memory and 1 minute for timeout, but I’ve found the function usually completes in a few seconds. You can alter these values later if you find they’re too high or too low. Note that the values you select here affect your AWS costs.
All of the code is available in the amazon-rekognition-image-for-box-skills GitHub repository. You need to clone or download the code to your development environment, install the dependencies, and create a ZIP archive to load into Lambda.
First, set up a development environment for Node.js. AWS Cloud9 makes that pretty simple, but you can use any environment you’re comfortable with. Install Node.js with your package manager; in AWS Cloud9, it’s already installed, so you’re good to go.
Next, create a folder for your project and navigate to it. Then, run the git clone command in the console to copy the amazon-rekognition-image-for-box-skills repo. After you’re done with that, update the submodule and install the dependencies by running npm install.
We’re not going over all of the code, but let’s highlight some of the main pieces. The first thing to do is extract the relevant information from the event body that comes from the Box Skills service:
FilesReader and SkillsWriter are helpers from the Box Skills Kit that simplify working with Box Skills. FilesReader helps read the event data and manipulate the file saved in Box. We’ll use SkillsWriter to post our Amazon Rekognition data back to the file in Box, without having to structure the JSON ourselves or manage clients and tokens.
For a full breakdown of the event body structure that comes from the Box Skills service, you can reference the Box Skills documentation.
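To make the shape of that data concrete, here is a minimal sketch that pulls the same pieces out of the event by hand, without the Skills Kit. The field names (`id`, `source.id`, `source.name`, `token.read.access_token`, `token.write.access_token`) are assumptions about the Box Skills event payload and should be checked against the Box documentation. The function name `extractFileContext` is hypothetical.

```javascript
// Sketch only: extracts file identifiers and tokens from the event.
// The payload field names below are assumptions, not confirmed by this post.
function extractFileContext(event) {
  // With a Lambda proxy integration, the body arrives as a JSON string.
  const body = typeof event.body === 'string' ? JSON.parse(event.body) : event.body;

  return {
    requestId: body.id,
    fileId: body.source.id,
    fileName: body.source.name,
    readToken: body.token.read.access_token,
    writeToken: body.token.write.access_token
  };
}
```

In practice, FilesReader does this work for you, so a hand-written parser like this is mainly useful for understanding or logging the event.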
Now that we have the information we need from the event body, we can download the file from Box and upload it to our Amazon S3 bucket:
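The upload half of that step might look like the sketch below. It assumes `s3` is an AWS SDK for JavaScript (v2) S3 client and `body` is the stream or buffer obtained from Box; the Box download itself is left out because it depends on the Skills Kit. The function name `stageImageInS3` and the `fileId/fileName` key scheme are hypothetical.

```javascript
// Sketch only. `s3` is assumed to be an AWS SDK for JavaScript (v2) S3 client.
function stageImageInS3(s3, bucket, fileId, fileName, body) {
  // Hypothetical key scheme: prefixing with the Box file ID avoids collisions
  // between files that share a name.
  const key = fileId + '/' + fileName;

  return s3
    .upload({ Bucket: bucket, Key: key, Body: body })
    .promise()
    // Resolve with a reference in the form Amazon Rekognition expects for S3Object.
    .then(() => ({ Bucket: bucket, Name: key }));
}
```

Returning the `{ Bucket, Name }` pair keeps the hand-off to Amazon Rekognition explicit: the same object can be placed directly under `Image.S3Object` in each request.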
The Amazon Rekognition results are then sent as Topics Cards back to the original file in Box, where they are written as metadata. Box Skills includes built-in metadata templates for this purpose.
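As one illustration of that transformation, the sketch below turns a DetectLabels response into a list of topic entries. The `Labels`, `Name`, and `Confidence` fields come from the Amazon Rekognition response; the `{ text: ... }` entry shape is an assumption about what the Skills Kit expects for a topics card, and the function name and confidence threshold are hypothetical.

```javascript
// Sketch only: maps Amazon Rekognition labels to topic entries.
// The `{ text }` output shape is an assumed format for a Box topics card.
function labelsToTopics(detectLabelsResponse, minConfidence) {
  return (detectLabelsResponse.Labels || [])
    // Drop labels that Amazon Rekognition is less sure about.
    .filter((label) => label.Confidence >= minConfidence)
    .map((label) => ({ text: label.Name }));
}
```

A similar mapping would apply to the text, celebrity, and moderation results, each reading the fields of its own response type.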
Because the function has dependencies, the last step is to create an archive in the development environment and upload it to Lambda. Zip the application folder, download the .zip file, and upload it in the Lambda console to complete the Lambda stage.
Step 4: Set up Amazon API Gateway
The last step in your AWS account is to create an API in API Gateway. This API proxies the event from the Box Skills service to Lambda, and also passes back any success or error status codes.
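On the Lambda side, passing status codes back through API Gateway could look like the sketch below. It assumes the method uses the Lambda proxy integration, where the function returns an object with `statusCode` and a string `body`. The names `buildProxyResponse`, `makeHandler`, and `processEvent` are hypothetical, and the callback style matches the Node.js 6.10 runtime chosen earlier.

```javascript
// Builds a response in the shape API Gateway's Lambda proxy integration expects.
function buildProxyResponse(statusCode, message) {
  return { statusCode: statusCode, body: JSON.stringify({ message: message }) };
}

// Wraps the skill logic (`processEvent`, a hypothetical function returning a
// Promise) in a callback-style Lambda handler.
function makeHandler(processEvent) {
  return function handler(event, context, callback) {
    processEvent(event).then(
      () => callback(null, buildProxyResponse(200, 'Success')),
      // Report failures as an HTTP 500 rather than failing the invocation.
      (err) => callback(null, buildProxyResponse(500, err.message))
    );
  };
}

// Example wiring: exports.handler = makeHandler(runSkill);
```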
Navigate to the API Gateway console. Make sure that you’re in the same AWS Region as your Amazon S3 bucket and Lambda function, and choose Create API. Choose the New API radio button. Then enter any API name and description. For endpoint type, choose Regional, and then choose Create API.
Because this is a very simple API, we’re just using the root resource. On the next screen, in the Actions dropdown, choose Create Method, and then choose POST in the dropdown that appears. After you select the check mark, a setup page appears. Set the options as follows: