Applying Computer Vision to Images with Amazon Rekognition, AWS Lambda, and Box Skills

By Joe Norman, Partner Solutions Architect at AWS

Box Skills is a framework that allows developers to integrate third-party artificial intelligence (AI) and machine learning (ML) technologies to process and apply rich metadata to files in Box.

Amazon Rekognition is well suited to be the intelligence layer in a Box Skill because it can perform object and scene detection, celebrity recognition, text detection, and unsafe image detection without requiring custom training.

In this post, I will walk through creating a sample custom Box Skill. We’ll do this by using Amazon Rekognition Image and AWS Lambda to apply computer vision to image files in Box. This could help you to automatically detect any celebrities that appear in an image, for example. This new metadata allows you to quickly find images based on keyword searches, or find images that may be inappropriate and should be moderated.

Box is an AWS Partner Network (APN) Advanced Technology Partner that empowers enterprises to revolutionize how they work by securely connecting their people, information, and applications.

Architecture Overview

Rekognition-Box-1

When everything is configured, here’s what happens at each step:

1. A Box user uploads an image to a folder with Box Skills configured.
2. The Box Skills service sends an event to the configured Amazon API Gateway invocation URL. The event body contains read and write tokens, file identifiers, and everything that’s needed to interact with the file.
3. API Gateway invokes the Lambda function and passes the event to the function.
4. Lambda uses the read token from the event body to download the image file from the Box folder.
5. Lambda uploads the image file to a preconfigured, private Amazon Simple Storage Service (Amazon S3) bucket, so it’s accessible by Amazon Rekognition. The Amazon S3 bucket is configured to automatically delete the file later using lifecycle management. Alternatively, the Lambda function could delete the file from the Amazon S3 bucket after step 8.
6. Lambda makes calls to Amazon Rekognition DetectLabels, DetectText, RecognizeCelebrities, and DetectModerationLabels, pointing to the Amazon S3 object created in step 5.
7. Amazon Rekognition grabs the Amazon S3 object and performs its analysis.
8. Amazon Rekognition returns its analysis to the Lambda function.
9. The Lambda function creates a formatted JSON of metadata based on the results from step 8. It writes that metadata to the file in Box by using the write token delivered in the event body.
10. Lambda passes a “Success” message to API Gateway.
11. API Gateway passes the “Success” message through to the Box Skills service.
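
Steps 5 through 8 can be sketched roughly as follows. This is a minimal sketch rather than the code from the sample repository: `analyzeImage` is a hypothetical helper, and the `s3` and `rekognition` arguments are assumed to be AWS SDK for JavaScript (v2) clients, passed in so that the flow can also be exercised with stubs.

```javascript
// Hypothetical helper (not from the sample repo): uploads the image once,
// then runs the four Amazon Rekognition calls in parallel against that object.
// `s3` and `rekognition` are assumed to be AWS SDK for JavaScript v2 clients,
// for example `new AWS.S3()` and `new AWS.Rekognition()`.
function analyzeImage(s3, rekognition, bucket, key, buffer) {
    // Every Rekognition call points at the same S3 object.
    var image = { S3Object: { Bucket: bucket, Name: key } };

    return s3.putObject({ Bucket: bucket, Key: key, Body: buffer }).promise()
        .then(function () {
            return Promise.all([
                rekognition.detectLabels({ Image: image }).promise(),
                rekognition.detectText({ Image: image }).promise(),
                rekognition.recognizeCelebrities({ Image: image }).promise(),
                rekognition.detectModerationLabels({ Image: image }).promise()
            ]);
        })
        .then(function (results) {
            // Promise.all preserves the order of the calls above.
            return {
                labels: results[0],
                text: results[1],
                celebrities: results[2],
                moderation: results[3]
            };
        });
}
```

Running the four calls in parallel keeps the total time close to that of the slowest call, which matters because the Box Skills service is waiting on the response.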

Setting up the Resources

Creating and configuring the required components requires an AWS account, a Box account, and a few steps. At a high level, here are the steps we’re going to go through:

1. Create an Amazon S3 bucket to facilitate transfer of files between your Box account and Amazon Rekognition.
2. Set up AWS Identity and Access Management (IAM) roles and policies to allow Lambda to access the Amazon S3 bucket and Amazon Rekognition.
3. Create your Lambda function and upload your zipped code.
4. Set up an API in API Gateway to proxy to your Lambda function.
5. Activate Box Skills on the desired folder within your Box account.

Step 1: Create and Configure an Amazon S3 Bucket

To perform its analysis, Amazon Rekognition Image accepts either raw data or a reference to an object in Amazon S3. Because we’re doing four separate API calls to Amazon Rekognition, it makes sense to put the files into Amazon S3 and reference that. Then, the Lambda function only has to upload the file externally once after downloading it from Box. You can do each of these steps in the AWS Command Line Interface (CLI), but I’m walking through them in the console for the sake of illustration.
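
To make the two input options concrete, here is a small sketch of the `Image` parameter in both forms. `buildImageParam` is a hypothetical helper written for illustration, and the bucket and file names are placeholders.

```javascript
// Hypothetical helper: builds the `Image` parameter that the Amazon Rekognition
// Image operations accept, either as an S3 reference or as raw bytes.
function buildImageParam(source) {
    if (source.bucket && source.key) {
        // Reference form: Rekognition reads the object from Amazon S3 itself.
        return { S3Object: { Bucket: source.bucket, Name: source.key } };
    }
    if (Buffer.isBuffer(source.bytes)) {
        // Raw form: the image bytes travel with every request.
        return { Bytes: source.bytes };
    }
    throw new Error('Provide either { bucket, key } or { bytes }');
}

// With the S3 form, one small parameter object is reused for all four calls.
var image = buildImageParam({ bucket: 'my-example-bucket', key: 'photo.jpg' });
console.log(JSON.stringify({ Image: image }));
```

With the reference form, the image leaves the Lambda function only once, no matter how many Amazon Rekognition operations you add later.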

Bucket Creation Workflow

In your AWS account, navigate to Amazon S3 and create a new bucket. Create a unique name and note it for later. The bucket name is the only environment variable you need for the Lambda function later. Choose any Region where Amazon Rekognition Image is available. This information is available in the AWS Region table. Nothing in the code or examples is affected by the Region, as long as Amazon Rekognition Image is available there.

Keep versioning off, as we’re setting all objects in the bucket to expire after a day. Set tags, logging, and encryption as your organization requires. I’m using AES-256 default encryption with S3-managed keys. The permissions should default to giving you all rights and denying public read, which is how you should keep it.

Lifecycle Management

Next, we need to set up a lifecycle rule on the bucket we just created. Every time you drop a new image file into your target folder in Box, Lambda downloads it and then uploads it into this Amazon S3 bucket. Setting up a lifecycle rule on the bucket ensures all files uploaded to the bucket are later deleted—even in the case of an unforeseen error in the Lambda function.

From the Amazon S3 console, choose the bucket you just created. Along the top, choose the Management tab, and then choose the Lifecycle section. Choose the Add Lifecycle Rule or Get Started button, and you’re brought into the lifecycle rule workflow.

Rekognition-Box-2

On the first page, give the rule any name you want, such as “Expire objects after 1 day.” Skip the next page, Transitions, because we won’t be doing anything with the different Amazon S3 storage classes.

Rekognition-Box-3

On the Expiration page, we’re setting all versions to expire after one day and cleaning up incomplete multipart uploads to avoid unnecessary storage costs. Select every check box on the page, and then set all text box values to 1. Then choose Next, and finally Save. If you’re using the AWS CLI, you can pass in the following JSON to match this rule:

{
    "Rules": [{
        "Status": "Enabled",
        "Filter": {
            "Prefix": ""
        },
        "NoncurrentVersionExpiration": {
            "NoncurrentDays": 1
        },
        "Expiration": {
            "Days": 1
        },
        "AbortIncompleteMultipartUpload": {
            "DaysAfterInitiation": 1
        },
        "ID": "Expire in One Day"
    }]
}
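
If you would rather apply the rule from Node.js than from the console or the CLI, the same configuration can be expressed as parameters for the AWS SDK for JavaScript (v2) `putBucketLifecycleConfiguration` call. The sketch below only builds the parameters; `buildLifecycleParams` is a hypothetical helper, the bucket name is a placeholder, and the empty-prefix filter is my assumption for "apply to every object."

```javascript
// Hypothetical helper: builds the parameters for the AWS SDK for JavaScript (v2)
// call s3.putBucketLifecycleConfiguration(), expiring everything after `days`.
function buildLifecycleParams(bucket, days) {
    return {
        Bucket: bucket,
        LifecycleConfiguration: {
            Rules: [{
                ID: 'Expire after ' + days + ' day(s)',
                Status: 'Enabled',
                Filter: { Prefix: '' }, // empty prefix: apply to all objects
                Expiration: { Days: days },
                NoncurrentVersionExpiration: { NoncurrentDays: days },
                AbortIncompleteMultipartUpload: { DaysAfterInitiation: days }
            }]
        }
    };
}

var params = buildLifecycleParams('my-example-bucket', 1);
console.log(JSON.stringify(params, null, 2));
// With a configured client (assumption: `s3` is `new AWS.S3()`):
// s3.putBucketLifecycleConfiguration(params).promise();
```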

Step 2: Create IAM Roles and Policies

The next step is to set up a role for your Lambda function. In addition to the basic execution permissions, the Lambda function needs to be able to upload files to the Amazon S3 bucket that we just created, and then make the four calls to Amazon Rekognition.

Policy Creation

Navigate to IAM in the AWS Management Console, and choose Policies along the left side. Create a new policy with whatever name you want and keep note of the name you choose. Then paste in the following JSON, and replace [YOUR-S3-BUCKET-NAME-HERE] with the name of the bucket you made earlier:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3Permissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::[YOUR-S3-BUCKET-NAME-HERE]/*"
        },
        {
            "Sid": "AmazonRekognitionPermissions",
            "Effect": "Allow",
            "Action": [
                "rekognition:DetectLabels",
                "rekognition:DetectModerationLabels",
                "rekognition:RecognizeCelebrities",
                "rekognition:DetectText"
            ],
            "Resource": "*"
        },
        {
            "Sid": "LambdaBasicExecution",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}
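
If you script your setup, one way to avoid pasting the placeholder by mistake is to generate the policy document from the bucket name. This is only a sketch: `buildPolicyDocument` is a hypothetical helper, and its bucket-name check is a simplified version of the Amazon S3 naming rules.

```javascript
// Hypothetical helper: returns the policy above as a JSON string, scoped to
// the given bucket. The name check is a simplified sanity test, not a full
// implementation of the Amazon S3 bucket naming rules.
function buildPolicyDocument(bucketName) {
    if (!/^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/.test(bucketName)) {
        throw new Error('Unexpected bucket name: ' + bucketName);
    }
    return JSON.stringify({
        Version: '2012-10-17',
        Statement: [
            {
                Sid: 'S3Permissions',
                Effect: 'Allow',
                Action: ['s3:PutObject', 's3:GetObject'],
                Resource: 'arn:aws:s3:::' + bucketName + '/*'
            },
            {
                Sid: 'AmazonRekognitionPermissions',
                Effect: 'Allow',
                Action: [
                    'rekognition:DetectLabels',
                    'rekognition:DetectModerationLabels',
                    'rekognition:RecognizeCelebrities',
                    'rekognition:DetectText'
                ],
                Resource: '*'
            },
            {
                Sid: 'LambdaBasicExecution',
                Effect: 'Allow',
                Action: ['logs:CreateLogGroup', 'logs:CreateLogStream', 'logs:PutLogEvents'],
                Resource: 'arn:aws:logs:*:*:*'
            }
        ]
    }, null, 4);
}

console.log(buildPolicyDocument('my-example-bucket'));
```

The resulting string can be pasted into the console or passed as the `PolicyDocument` when creating the policy through the SDK.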

Role Creation

Now, we need to add the policy we just created to a new role. On the left side of the console, navigate over to Roles, and choose Create Role. On the next screen, with AWS Service selected by default, choose Lambda from the list, because our Lambda function is assuming this role. Then, on the next page, search for the policy you just created in the previous section, select it, and go to the next page. Type any name and description you like, but note the name for later. Choose Create Role. Now, on to the main event.

Step 3: Create the Lambda Function

The Lambda function in this walkthrough is doing the orchestration and data transformation between all the different pieces of this app. Navigate to Lambda in the AWS Management Console. Make sure your AWS Region is set to the same one you used for the Amazon S3 bucket, and choose Create Function.

In the Author From Scratch section, type any name you like, and set the Runtime to Node.js 6.10. For Role, choose Choose Existing Role, and choose the role you just finished making.

Lambda Configuration

For now, we’re holding off on the coding part and moving into the other function settings in the Lambda console. At the top of the Function Code section, change “Handler” to “app.handler” before scrolling down to Environment Variables. Create an environment variable for the bucket that you made in Step 1:

Key: S3_BUCKET
Value: [YOUR-S3-BUCKET-NAME]
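
Inside the function, the variable is available through `process.env`. A small guard like the one below (a sketch; `getBucketName` is a hypothetical helper, not part of the sample repository) makes a missing setting show up as a clear error in the logs instead of a confusing Amazon S3 failure later.

```javascript
// Hypothetical helper: reads the bucket name from the Lambda environment and
// fails fast with a clear message if the variable was never configured.
function getBucketName(env) {
    var bucket = (env.S3_BUCKET || '').trim();
    if (!bucket) {
        throw new Error('S3_BUCKET environment variable is not set');
    }
    return bucket;
}

// In the function code this would be called as getBucketName(process.env).
console.log(getBucketName({ S3_BUCKET: 'my-example-bucket' }));
```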

Next, under Basic Settings, change the Memory and Timeout values to handle the load you’re putting on the function. The Lambda function holds a copy of each image that’s dropped into your Box folder in memory, so that might affect the value you choose.

Likewise, you might want to increase the Timeout value—to give the function enough time to download the file, upload it to Amazon S3, and make each of the Amazon Rekognition calls. I chose 512 MB for memory and 1 min for timeout, but I’ve found the function usually completes in a few seconds. You can alter these values later if you find they’re too high or too low. Note the values you select here affect your AWS costs.

Function Code

All of the code is available in the amazon-rekognition-image-for-box-skills GitHub repository. You need to clone or download the code to your development environment, install the dependencies, and create a ZIP archive to load into Lambda.

The first step is to set up a development environment for Node.js. AWS Cloud9 makes that pretty simple, but you can use the environment you’re comfortable with. The first thing to do in the environment is to install Node.js with your package manager. In AWS Cloud9, it’s already installed, so you’re good to go.

Next, create a folder for your project and navigate to it. Then, run the git clone command in the console to copy the amazon-rekognition-image-for-box-skills repo. After you’re done with that, update the submodule and install the dependencies by running npm install.

If you look at my package.json or the top of the app.js, you can see that we use the AWS Software Development Kit (SDK) for JavaScript, the Axios HTTP client, and an npm link to the Box Skills Kit. Axios provides us with a convenient interface for grabbing the image files from Box and sending them to Amazon S3. The Box Skills Kit provides some helper functions for interacting with Box Skills metadata and includes the larger Box SDK as a sub-dependency. At the time of writing, Box Skills Kit is not available in npm, so we’re pointing to its repo as a submodule.

$ mkdir amazon-rekognition-box-skill
$ cd amazon-rekognition-box-skill/
$ git clone https://github.com/aws-samples/amazon-rekognition-image-for-box-skills.git
$ cd amazon-rekognition-image-for-box-skills/
$ git submodule update --init --recursive
$ npm install

We’re not going over all of the code, but let’s highlight some of the main pieces. The first thing to do is extract the relevant information from the event body that comes from the Box Skills service:

    var filesReader = new FilesReader(event.body);
    var skillsWriter = new SkillsWriter(filesReader.getFileContext());

FilesReader and SkillsWriter are helper functions from the Box Skills Kit that simplify working with Box Skills. FilesReader helps read the event data and manipulate the file saved in Box. We’ll use SkillsWriter to post our Amazon Rekognition data back to the file in Box, without having to structure the JSONs ourselves or manage clients and tokens.

For a full breakdown of the event body structure that comes from the Box Skills service, you can reference the Box Skills documentation.

Now that we have the information we need from the event body, we can download the file from Box and upload it to our Amazon S3 bucket:

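    var fileName = filesReader.getFileContext().fileName;
    var boxGetPath = filesReader.getFileContext().fileDownloadURL;
    // Download the image from Box as binary data.
    Axios.get(boxGetPath, { responseType: 'arraybuffer' })
        .then(response => {
            var buffer = Buffer.from(response.data);
            // Next: upload the buffer to the Amazon S3 bucket and call
            // Amazon Rekognition (see app.js in the repository).
        });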

After Amazon Rekognition returns its results, the function sends them as Topics Cards back to the original file in Box to be written onto it as metadata. Box Skills includes built-in metadata templates for this purpose.
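
As a rough illustration of the data shaping involved, the sketch below turns a DetectLabels response into a list of keyword entries. `labelsToTopics` is a hypothetical helper, and the `{ text: ... }` entry shape is an assumption for illustration; check the Box Skills Kit for the exact structure it expects.

```javascript
// Hypothetical helper: turns a DetectLabels response into keyword entries,
// keeping only labels at or above the given confidence (0 to 100).
// The `{ text: ... }` entry shape is an assumption, not the confirmed
// Box Skills Kit format.
function labelsToTopics(detectLabelsResponse, minConfidence) {
    return (detectLabelsResponse.Labels || [])
        .filter(function (label) { return label.Confidence >= minConfidence; })
        .map(function (label) { return { text: label.Name }; });
}

// Sample response fragment, made up for this example.
var sample = {
    Labels: [
        { Name: 'Dog', Confidence: 98.2 },
        { Name: 'Grass', Confidence: 61.5 }
    ]
};
console.log(labelsToTopics(sample, 80));
```

Filtering on confidence keeps low-quality guesses out of the metadata, which helps keyword search stay useful.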

Because we have some dependencies, the last step is to create an archive from the development environment and upload it to Lambda. Zip the application folder, download the .zip file, and upload it in the Lambda console to complete the Lambda stage:

$ zip -r lambda-archive.zip ./*

Step 4: Set up Amazon API Gateway

The last step in your AWS account is to create an API in API Gateway. This API proxies the event from the Box Skills service to Lambda, and also passes back any success or error status codes.

Navigate to the API Gateway console. Make sure that you’re in the same AWS Region as your Amazon S3 bucket and Lambda function, and choose Create API. Choose the New API radio button. Then enter any API name and description. For endpoint type, choose Regional, and then choose Create API.

Because this is a very simple API, we’re just using the root resource. On the next screen, in the Actions dropdown, choose Create Method, and then choose POST in the dropdown that appears. After you select the check mark, a setup page appears. Set the options as follows:

Integration type: Lambda function
Use Lambda Proxy integration: checked
Lambda Region: [use the same Region as your Lambda function]
Lambda function: [type in your Lambda function name]
Use Default Timeout: checked
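
With Lambda proxy integration checked, API Gateway passes the request through as-is and expects the function to return an object with a numeric `statusCode` and a string `body`. Here is a minimal sketch of that contract; `proxyResponse` and `respond` are hypothetical helpers, and returning 500 on failure is my assumption rather than a documented Box Skills requirement.

```javascript
// Hypothetical helper: builds a response in the shape that API Gateway
// expects from a Lambda proxy integration.
function proxyResponse(statusCode, payload) {
    return {
        statusCode: statusCode,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload) // body must be a string, not an object
    };
}

// Hypothetical helper: wraps promise-returning work (assumed to be the skill
// logic) so that both outcomes are reported back through API Gateway.
function respond(work, callback) {
    return work.then(
        function () { callback(null, proxyResponse(200, { message: 'Success' })); },
        function (err) { callback(null, proxyResponse(500, { message: err.message })); }
    );
}

console.log(proxyResponse(200, { message: 'Success' }));
```

Note that the error case still passes `null` as the first callback argument, so API Gateway receives a well-formed response instead of a generic integration error.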

Next, to enable Cross Origin Resource Sharing, choose the Actions dropdown, and then choose Enable CORS. Click through the defaults.

Now, it’s time to deploy your API. In the Actions dropdown, choose Deploy API. Under Deployment Stage, choose [new stage] and type in a stage name. Keep in mind that this name forms part of the invocation URL that you send to Box later, when you request Box Skills activation. Some examples include “stage,” “prod,” “beta,” and “v1.” Then choose Deploy.

The next page you see will be Stages. You need to get your Invoke URL, but you need to be careful to grab the right one. Choose the caret next to the stage name in the middle of the screen. Because we used the root resource, choose POST under /. Copy the Invoke URL that appears for the POST method.

Rekognition-Box-4

Step 5: Request Box Skills Activation

Now that your app is set up, you just need to send a request to Box to activate Box Skills on your target folder. First, there are two pieces of information that you need to gather.

Enterprise ID

The enterprise ID is a number that represents your company uniquely within Box. To get this number, an administrator in your Box account should go into the Enterprise Settings page and grab the Enterprise ID.

Folder ID

You have two options when you register for Box Skills. You can enable the Box Skill for the entire enterprise, or only for a specific folder or set of folders. In this case, it makes sense to choose the latter, so we need a Folder ID. This is a number that uniquely identifies the target folder or folders within Box. Getting a Folder ID doesn’t require an administrator. Simply go into your Box account in a browser, open your target folder, and look up at the address bar. The end of the URL should read “/folder/[some number].” That number is the folder ID to make a note of.
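
If you need to collect several folder IDs, pulling the number out of the URL is easy to script. The helper below is a hypothetical sketch, and the URL in the example is made up for illustration.

```javascript
// Hypothetical helper: pulls the numeric folder ID out of a Box folder URL.
// Returns null when the URL has no "/folder/<number>" segment.
function folderIdFromUrl(url) {
    var match = /\/folder\/(\d+)(?:[/?#]|$)/.exec(url);
    return match ? match[1] : null;
}

console.log(folderIdFromUrl('https://app.box.com/folder/1234567890'));
```

The ID is returned as a string, so large IDs are not subject to number rounding.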

Fill Out the Request Form

At the time of this writing, activating Box Skills requires filling out a form. Box reviews your request and activates your Box Skill. You can currently find the form under “Register your skill application with Box” in the Box Skills documentation. As long as you have your API Gateway Invoke URL, your Box Enterprise ID, and Box Folder ID, you have everything you need to complete the form. After Box activates your Box Skill, it’s live.

Test Your Box Skill

After your Box Skill is activated, testing it is as simple as dropping an image file into the folder you set up in your Box account. Drop in an image, wait a few seconds, and choose the file name to bring up the image details. On the right side, you should see a magic wand icon. Choose it, and you can see the results of your Amazon Rekognition analyses.

Rekognition-Box-5

If things aren’t behaving like you expect, you can jump into Amazon CloudWatch and look through the logs from your Lambda function.

Conclusion

The solution we outlined in this post is a simple place to start using AWS services together with Box Skills. With services like Amazon Transcribe, Amazon Translate, Amazon Comprehend, Amazon Rekognition, and Amazon SageMaker, there’s no limit to the ways you can apply AI/ML to your media files!