Use Amazon Rekognition to Build an End-to-End Serverless Photo Recognition System
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.
Imagine you work for a marketing agency that has tens of thousands of stock images. You find that many images don’t have descriptive file names and others are completely mislabeled. You don’t want to spend hours and hours relabeling them and moving them around to different folders. But what if you could find the images you need without relying on metadata? In this blog post, we will review an end-to-end solution to show you how to do this using Amazon Rekognition.
Amazon Rekognition is a service that makes it easy to add image analysis to your applications. With Rekognition, you can detect objects, scenes, and faces in images. You can also search and compare faces. Rekognition’s API lets you quickly add sophisticated deep learning-based visual search and image classification to your applications.
In this post, we’ll focus on searching for objects and scenes in images. A future post will focus on searching for faces.
The solution requires three general steps:
- Adding images
- Searching for images
- Deleting images
Below is a general outline of each step so that you can get a mental picture of how the process works. These sections are followed by a script that automates these steps for you so that you can quickly try out the use case.
Adding an image
First, you authenticate with Amazon Cognito. Then you upload the image to an Amazon S3 bucket using the AWS CLI or a custom app. The S3 bucket is configured with an ObjectCreated event notification that invokes an AWS Lambda function and passes it the bucket name and object key. The function calls Rekognition’s Object and Scene detection API. After Rekognition returns the labels describing the picture, the Lambda function saves the image’s S3 location, along with the retrieved metadata, to an Amazon OpenSearch Service domain.
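Here is the first hop in concrete form: a minimal Python sketch (not the project’s Kotlin code) of how an indexing function might pull the bucket name and object key out of an S3 event notification. S3 URL-encodes object keys in notifications, so the sketch decodes the key before using it. The bucket and key in the sample event are hypothetical.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event: dict) -> list[tuple[str, str, str]]:
    """Return (event_name, bucket, key) for each record in an S3 event notification."""
    parsed = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys arrive URL-encoded (for example, spaces become '+'), so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        parsed.append((record["eventName"], bucket, key))
    return parsed


# Hypothetical notification for an upload under a Cognito identity-ID prefix.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-photo-bucket"},
                "object": {"key": "us-east-1%3Aabc123/beach+day.jpg"},
            },
        }
    ]
}

for event_name, bucket, key in parse_s3_event(sample_event):
    print(event_name, bucket, key)
```

In a deployed function, the decoded bucket and key could be passed straight to Rekognition’s DetectLabels operation. DetectLabels accepts an S3 object reference, so the function never has to download the image itself.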
Searching for images
Once you’ve authenticated and uploaded the image to S3, you can search for it. For testing purposes, you trigger a search with the curl command that the setup script generates. Any other tool that can send POST requests, such as Postman, also works.
The curl command sends an HTTP POST request to an Amazon API Gateway endpoint, which proxies the request to an AWS Lambda function. The function parses the request and reads the search-key request header, which contains the search term used to find images. The Lambda function then queries the Amazon OpenSearch Service domain with this value. If the query returns any matches, the function calls Amazon S3 to generate a signed URL for each matching object. The results are formatted and returned to the caller as a JSON document.
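The search flow can be sketched as a Python handler for an API Gateway proxy event. This is an illustration, not the project’s Kotlin implementation. To keep the sketch self-contained, the OpenSearch query and the URL signing are passed in as plain functions. In a real deployment, `sign_url` could wrap boto3’s `generate_presigned_url("get_object", ...)`. The hit fields (`bucket`, `key`) are assumptions, and the `signedUrl` field name follows the test output described later in this post.

```python
import json
from typing import Callable


def handle_search(
    event: dict,
    search: Callable[[str], list[dict]],
    sign_url: Callable[[str, str], str],
) -> dict:
    """Read the search-key header, query the index, and return signed URLs as JSON."""
    # Header names are case-insensitive, so normalize them before the lookup.
    headers = {name.lower(): value for name, value in (event.get("headers") or {}).items()}
    search_key = headers.get("search-key")
    if not search_key:
        return {"statusCode": 400, "body": json.dumps({"error": "missing search-key header"})}

    results = [
        {"key": hit["key"], "signedUrl": sign_url(hit["bucket"], hit["key"])}
        for hit in search(search_key)
    ]
    # Lambda proxy integrations must return the body as a string.
    return {"statusCode": 200, "body": json.dumps(results)}


# Hypothetical in-memory stand-ins for OpenSearch and S3 presigning.
fake_hits = {"dog": [{"bucket": "my-photo-bucket", "key": "us-east-1:abc123/dog.jpg"}]}
response = handle_search(
    {"headers": {"Search-Key": "dog"}},
    search=lambda term: fake_hits.get(term, []),
    sign_url=lambda bucket, key: f"https://{bucket}.s3.amazonaws.com/{key}?signature=example",
)
print(response["statusCode"], json.loads(response["body"]))
```

Injecting `search` and `sign_url` keeps the request parsing and response shaping testable without AWS credentials.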
Deleting an image
Deleting an image is the easiest of the three steps. When an image is deleted from the Amazon S3 bucket, S3 emits an ObjectRemoved event. This event invokes an AWS Lambda function with the deleted object’s information (object key and bucket name), and the function removes all references to that object from the Amazon OpenSearch Service domain.
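A cleanup handler only has to turn each ObjectRemoved record into a delete request. The following Python sketch (not the project’s Kotlin code) builds an OpenSearch `_delete_by_query` request for each removed object. The index name `images` and the `bucket`/`key` document fields are hypothetical. The `.keyword` sub-fields assume OpenSearch’s default dynamic mapping for string values.

```python
from urllib.parse import unquote_plus


def build_cleanup_requests(event: dict, index: str = "images") -> list[tuple[str, str, dict]]:
    """Return (method, path, body) for each ObjectRemoved record in an S3 event."""
    pending = []
    for record in event.get("Records", []):
        # Ignore anything that is not a deletion (for example, ObjectCreated events).
        if not record["eventName"].startswith("ObjectRemoved:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        body = {
            "query": {
                "bool": {
                    # Exact-match filters on the unanalyzed keyword sub-fields.
                    "filter": [
                        {"term": {"bucket.keyword": bucket}},
                        {"term": {"key.keyword": key}},
                    ]
                }
            }
        }
        pending.append(("POST", f"/{index}/_delete_by_query", body))
    return pending


sample_event = {
    "Records": [
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "my-photo-bucket"},
                "object": {"key": "us-east-1%3Aabc123/dog.jpg"},
            },
        }
    ]
}
for method, path, body in build_cleanup_requests(sample_event):
    print(method, path, body)
```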
Running the demonstration script
For this demo, you’ll analyze images using Rekognition’s Object and Scene detection API.
- Before you start, make sure that your working environment is set up to run the script. You need an AWS account with a default VPC, as well as Java 8 and the latest AWS CLI (tested with aws-cli/1.11.29 Python/2.7.12) installed on your machine. For help installing the CLI, see the installation and upgrade instructions. The script won’t run on Windows.
- Run the script.
The script sets up the environment by provisioning the required resources, including the Lambda functions, S3 bucket, and API Gateway APIs.
If the script fails for any reason, make sure that the prerequisites listed above are met. Then revert the project’s “Properties.kt” file to its original values, run the generated cleanup script, and re-run the script.
When the script finishes, you will see output similar to the output shown in the next section.
Test your new app
In addition to provisioning the AWS resources, the script also generated three sample commands that you can run to test the setup:
The values will be specific to your environment.
To test your app:
Copy the aws s3 cp command and run it to upload an image to your S3 bucket. When the command finishes, run the curl command. New Lambda functions take a couple of seconds to start, so you might need to re-run the curl command once or twice. You will see the following result:
You can copy the signedUrl value and paste it into a browser to view the uploaded image. The URL in the response is Unicode-escaped, so run it through a tool such as native2ascii to decode it before pasting it into the browser window.
The third command removes the image from the S3 bucket. Run it, and then search for the image again. At first, the image may still be returned. This happens because the deletion Lambda function is triggered asynchronously after the S3 object is deleted, just as the indexing function runs asynchronously after an upload. Eventually, the response will contain an empty array.
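If you’d rather not use native2ascii, any JSON parser can reverse these escapes. This small Python sketch wraps the copied value in quotes and parses it as a JSON string. The sample URL is made up. In it, `\u003d` and `\u0026` are the escapes for `=` and `&`.

```python
import json


def decode_escaped_url(escaped: str) -> str:
    """Decode \\uXXXX escapes in a URL copied out of a JSON response."""
    # Treat the copied text as the contents of a JSON string literal.
    return json.loads(f'"{escaped}"')


copied = r"https://my-photo-bucket.s3.amazonaws.com/dog.jpg?X-Amz-Expires\u003d900\u0026X-Amz-Signature\u003dabc"
print(decode_escaped_url(copied))
```

Parsing the whole curl response with any JSON tool decodes the escapes in the same way.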
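Indexing and cleanup both happen asynchronously, so a test script should poll rather than expect an immediate result. Below is a minimal Python sketch of such a retry loop. `wait_until` is a hypothetical helper, and the fake check stands in for re-running the curl search and testing for an empty array.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], attempts: int = 10, delay_seconds: float = 2.0) -> bool:
    """Call `condition` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if condition():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # give the asynchronous Lambda function time to finish
    return False


# Stand-in for "search again and check for an empty array": succeeds on the third call.
calls = {"count": 0}


def search_returns_nothing() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


removed = wait_until(search_returns_nothing, attempts=5, delay_seconds=0)
print(removed, calls["count"])
```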
Setting up the AWS Services
In the first section of this blog post, you looked at an overview of the three steps of the process (Add, Search, and Delete). In the second section, you ran a script that analyzed images using Rekognition’s Object and Scene detection API.
At this point, you might be ready to move beyond the automated script and learn more about how each component of the solution works. Let’s dive a little deeper into some configuration aspects of the six primary services involved in this solution: Amazon Cognito, Amazon OpenSearch Service, AWS Lambda, Amazon Rekognition, Amazon API Gateway, and Amazon S3.
Amazon Cognito
To secure uploads and downloads, the solution uses Amazon Cognito to authenticate and authorize users. You might have noticed a hard-to-remember value in the URL of the S3 object:
That’s the identity ID of an authenticated user. Only the user with that unique identity ID can upload pictures under that prefix, and only that user can retrieve or delete those images. For demo purposes, you used the CLI to upload and delete images, which bypasses Cognito authentication entirely. If you later offer the same functionality to your users through a browser app, however, you won’t need to make any additional policy changes. The IAM policy that allows only authenticated users to upload, remove, and download the images is declared in the setup/cognito-quickstart/authrole.json file.
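The per-user prefix relies on an IAM policy variable that resolves to the caller’s Cognito identity ID at request time. The Python sketch below builds a policy of that shape. It illustrates the technique and is not a copy of authrole.json. The bucket name is hypothetical.

```python
import json


def build_authenticated_role_policy(bucket: str) -> dict:
    """IAM policy limiting each Cognito identity to objects under its own prefix."""
    # ${cognito-identity.amazonaws.com:sub} is replaced by the caller's identity ID
    # when IAM evaluates the request ({{ and }} escape the braces in the f-string).
    own_prefix = f"arn:aws:s3:::{bucket}/${{cognito-identity.amazonaws.com:sub}}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [own_prefix],
            }
        ],
    }


print(json.dumps(build_authenticated_role_policy("my-photo-bucket"), indent=2))
```

The variable is resolved for each request. A browser app that signs requests with each user’s temporary Cognito credentials therefore gets the same isolation without a separate policy per user.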
Amazon OpenSearch Service
Amazon OpenSearch Service provides a powerful search and analytics engine. In this application, it serves as the metadata store. The first policy statement in setup/elasticsearch_service_policy.json locks down the OpenSearch domain so that only the Lambda function’s role can access it. The second statement allows unrestricted access from the IP address of the machine where you ran the setup script. This statement exists only for testing, in case you want to use Kibana to look at the indexed data. You can remove it without affecting the application’s functionality.
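An access policy with the two statements described above could look roughly like the one built by this Python sketch. The ARNs and IP address are placeholders, and the exact actions in setup/elasticsearch_service_policy.json may differ. Leaving out the optional `admin_ip` argument corresponds to removing the testing statement.

```python
import json
from typing import Optional


def build_domain_access_policy(domain_arn: str, lambda_role_arn: str,
                               admin_ip: Optional[str] = None) -> dict:
    """Access policy: the Lambda role always, plus one IP address for testing."""
    statements = [
        {
            "Effect": "Allow",
            "Principal": {"AWS": lambda_role_arn},
            "Action": "es:ESHttp*",
            "Resource": f"{domain_arn}/*",
        }
    ]
    if admin_ip:
        # Testing only: lets the setup machine reach the domain (for example, from Kibana).
        statements.append(
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": [admin_ip]}},
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_domain_access_policy(
    "arn:aws:es:us-east-1:123456789012:domain/photos",
    "arn:aws:iam::123456789012:role/photo-lambda-role",
    admin_ip="203.0.113.10",
)
print(json.dumps(policy, indent=2))
```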
AWS Lambda
As described earlier, three Lambda functions handle the requests:
The indexing function is triggered whenever you add an object to the S3 bucket. It retrieves the bucket name and object key, and extracts the Cognito identity ID from the object key’s prefix. Next, it calls Rekognition to analyze the image and stores the returned image labels in the OpenSearch domain, along with the other image metadata.
When an object is deleted from the S3 bucket, S3 invokes the cleanup function. This function gets the key of the deleted S3 object and removes the corresponding entries from the OpenSearch domain.
To search for images, the user sends a POST request to an API Gateway endpoint, which proxies the request, including its payload and headers, to the search Lambda function. Before forwarding the request, API Gateway runs it through a Cognito authorizer. The authorizer verifies that the Authorization header contains a valid Cognito-generated JWT ID token. API Gateway also extracts the Cognito-managed claims from the ID token and passes them to the Lambda function.
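The indexing step comes down to two small transformations, sketched here in Python (not the project’s Kotlin code). The first takes the identity ID from the key prefix. The second reshapes Rekognition’s DetectLabels response, a `Labels` list whose entries carry `Name` and `Confidence`, into a search document. The document field names and the 70 percent confidence cutoff are assumptions. DetectLabels also accepts a `MinConfidence` parameter, which would do the same filtering on the service side.

```python
def identity_id_from_key(key: str) -> str:
    """Return the Cognito identity ID, assuming keys look like '<identity-id>/<file>'."""
    prefix, separator, _ = key.partition("/")
    if not separator or not prefix:
        raise ValueError(f"object key has no identity prefix: {key!r}")
    return prefix


def build_image_document(bucket: str, key: str, detect_labels_response: dict,
                         min_confidence: float = 70.0) -> dict:
    """Shape one image's metadata for indexing in OpenSearch."""
    # Keep only labels that Rekognition is reasonably confident about.
    labels = [
        label["Name"]
        for label in detect_labels_response.get("Labels", [])
        if label.get("Confidence", 0.0) >= min_confidence
    ]
    return {
        "identityId": identity_id_from_key(key),
        "bucket": bucket,
        "key": key,
        "labels": labels,
    }


# Abbreviated, made-up DetectLabels response.
response = {
    "Labels": [
        {"Name": "Dog", "Confidence": 98.5},
        {"Name": "Pet", "Confidence": 97.1},
        {"Name": "Grass", "Confidence": 55.0},
    ]
}
print(build_image_document("my-photo-bucket", "us-east-1:abc123/dog.jpg", response))
```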
The setup script creates all of this configuration by importing a Swagger definition file. Below is the request payload that the Lambda function can use for further processing:
The payload is marshalled into the ApigatewayRequest.Input POJO automatically.
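In Python, marshalling the proxy payload into a typed input object might look like the dataclass below. When a REST API uses a Cognito user pool authorizer with a Lambda proxy integration, the verified token claims arrive under `requestContext.authorizer.claims`. The `SearchInput` class and the sample values are hypothetical and are not the project’s `ApigatewayRequest.Input`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchInput:
    """Typed view of the parts of a proxy event the search function needs."""
    search_key: Optional[str]
    body: Optional[str]
    claims: dict = field(default_factory=dict)

    @classmethod
    def from_proxy_event(cls, event: dict) -> "SearchInput":
        # Header names are case-insensitive, so index them in lowercase.
        headers = {name.lower(): value for name, value in (event.get("headers") or {}).items()}
        # Claims that the Cognito authorizer extracted from the verified ID token.
        authorizer = (event.get("requestContext") or {}).get("authorizer") or {}
        return cls(
            search_key=headers.get("search-key"),
            body=event.get("body"),
            claims=dict(authorizer.get("claims") or {}),
        )


sample_event = {
    "headers": {"Authorization": "<ID token>", "search-key": "dog"},
    "body": None,
    "requestContext": {
        "authorizer": {"claims": {"sub": "1111-2222", "cognito:username": "jane"}}
    },
}
parsed = SearchInput.from_proxy_event(sample_event)
print(parsed.search_key, parsed.claims["cognito:username"])
```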
API Gateway invokes the Lambda function only after verifying that the user is authenticated and the session is still valid. The function can therefore safely retrieve the calling user’s identity ID from Amazon Cognito. It then uses that identity ID as the key when it searches the OpenSearch index for all of the user’s images that match the search-key header.
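The per-user search can be expressed as an OpenSearch bool query. A filter pins the results to the caller’s identity ID, and a full-text match runs against the labels. This Python sketch builds that query body. It also builds the `Logins` map that Cognito’s GetId operation expects when it exchanges a user pool ID token for an identity ID (boto3 exposes GetId as `get_id` on the `cognito-identity` client). The field names, result size, and example IDs are assumptions.

```python
def build_logins_map(region: str, user_pool_id: str, id_token: str) -> dict:
    """Logins argument for Cognito Identity GetId with a user pool ID token."""
    return {f"cognito-idp.{region}.amazonaws.com/{user_pool_id}": id_token}


def build_search_query(identity_id: str, search_key: str, size: int = 20) -> dict:
    """OpenSearch query: only this user's images whose labels match the search key."""
    return {
        "size": size,
        "query": {
            "bool": {
                # filter: exact match without scoring; restricts results to the caller.
                "filter": [{"term": {"identityId.keyword": identity_id}}],
                # must: analyzed full-text match against the Rekognition labels.
                "must": [{"match": {"labels": search_key}}],
            }
        },
    }


print(build_logins_map("us-east-1", "us-east-1_EXAMPLE", "<ID token>"))
print(build_search_query("us-east-1:abc123", "dog"))
```

Placing the identity check in `filter` rather than `must` keeps it from affecting relevance scores. It only narrows the candidate set.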
Amazon Rekognition
No additional steps are required to configure Amazon Rekognition.
Amazon API Gateway
Amazon API Gateway acts as the entry point to the search functionality. The setup script imports a Swagger JSON document to create the required APIs. For more information on the API Gateway/Lambda setup, see the API Gateway Proxy Setup documentation.
Amazon S3
Other than creating the bucket and setting its CORS policy, no additional steps are required.
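A Swagger definition for this kind of setup combines two API Gateway extensions: a Cognito user pool authorizer and an `aws_proxy` Lambda integration. The Python sketch below generates such a document. The `/search` path, the API title, and all ARNs are hypothetical, and the project’s actual Swagger file may be structured differently. The resulting JSON could be imported with API Gateway’s import REST API operation (`aws apigateway import-rest-api`).

```python
import json


def build_search_api(region: str, lambda_arn: str, user_pool_arn: str) -> dict:
    """Swagger 2.0 definition: POST /search, Cognito-authorized, proxied to Lambda."""
    integration_uri = (
        f"arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/"
        f"{lambda_arn}/invocations"
    )
    return {
        "swagger": "2.0",
        "info": {"title": "photo-search", "version": "1.0"},
        "paths": {
            "/search": {
                "post": {
                    # Requests must carry a valid Cognito ID token in the Authorization header.
                    "security": [{"CognitoAuthorizer": []}],
                    "responses": {"200": {"description": "Matching images"}},
                    # aws_proxy passes the whole request (headers, body, claims) to Lambda.
                    "x-amazon-apigateway-integration": {
                        "type": "aws_proxy",
                        "httpMethod": "POST",  # Lambda is always invoked with POST
                        "uri": integration_uri,
                    },
                }
            }
        },
        "securityDefinitions": {
            "CognitoAuthorizer": {
                "type": "apiKey",
                "name": "Authorization",
                "in": "header",
                "x-amazon-apigateway-authtype": "cognito_user_pools",
                "x-amazon-apigateway-authorizer": {
                    "type": "cognito_user_pools",
                    "providerARNs": [user_pool_arn],
                },
            }
        },
    }


spec = build_search_api(
    "us-east-1",
    "arn:aws:lambda:us-east-1:123456789012:function:photo-search",
    "arn:aws:cognito-idp:us-east-1:123456789012:userpool/us-east-1_EXAMPLE",
)
print(json.dumps(spec, indent=2))
```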
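The bucket’s CORS policy is what allows a browser app on another origin to upload and download images directly. This Python sketch builds a configuration in the shape that S3’s PutBucketCors operation accepts, for example through boto3’s `put_bucket_cors(Bucket=..., CORSConfiguration=...)`. The origin, methods, and cache duration are assumptions, not the values the setup script uses.

```python
import json


def build_cors_configuration(allowed_origin: str) -> dict:
    """CORS rules that let a browser app on `allowed_origin` use the bucket directly."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [allowed_origin],
                "AllowedMethods": ["GET", "PUT", "POST", "DELETE", "HEAD"],
                "AllowedHeaders": ["*"],
                # Exposing ETag lets browser code read it after an upload.
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight response
            }
        ]
    }


cors = build_cors_configuration("https://photos.example.com")
print(json.dumps(cors, indent=2))
```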
You’re just getting started with Rekognition! Consider looking at the setup script to get a better understanding of the steps. That will give you an idea of how to interact with AWS services in a more streamlined, programmatic way. Then look at the source code of the application. It’s written in Kotlin, a JVM language (created by the same team that developed IntelliJ IDEA). Finally, start playing around with this quickstart project to create a UI that interacts with the photo recognition system. You can follow these steps to get started.
If you have questions or suggestions, please leave your feedback in the comments.
About the Author
Vladimir Budilov is a Sr. Technical Account Manager, specializing in architecting resilient and flexible solutions in Mobile, Serverless, and NoSQL. In his spare time he finds innovative ways to convince his wife to go camping (without any luck yet).