AWS Machine Learning Blog
Visual search on AWS—Part 2: Deployment with AWS DeepLens
April 2023 Update: Starting January 31, 2024, you will no longer be able to access AWS DeepLens through the AWS Management Console, manage DeepLens devices, or access any projects you have created. To learn more, refer to these frequently asked questions about AWS DeepLens end of life.
In Part 1 of this blog post series, we examined use cases for visual search and how visual search works. Now we’ll extend the results of Part 1 from the digital world to the physical world using AWS DeepLens, a deep-learning-enabled video camera. Most current applications of visual search don’t involve direct interaction with the physical world in real time. Instead, visual search typically happens within the digital world of websites and apps that submit a static image for a visual search query.
AWS DeepLens allows us to directly interact with physical objects in real time. In this part of the blog post series, we’ll take the model we created in Part 1 and deploy it to an AWS DeepLens device. The device will interface with a backend API and web app to display a set of visual search matches for a real-world, physical item viewed by the AWS DeepLens device.
System architecture
The architecture is based around the goals of making the system:
- Inexpensive
- Easily and rapidly deployed
- Accessible by developers and others who are not deep learning experts
- Time-saving for model development, avoiding the substantial time and cost of gathering labeled training data and training new models from scratch.
As discussed in Part 1, Amazon SageMaker is used to create a model for generating feature vectors. This model “featurizes” (converts to feature vectors) a set of reference comparison images that will be compared against a query image. The model is created by modifying an existing convolutional neural network (CNN) model pretrained on the well-known ImageNet dataset, thereby avoiding the time and cost of training a new model from scratch. The same CNN model used for featurizing reference items is also deployed to an AWS DeepLens device.
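As a concrete illustration, here is a minimal sketch of how a pretrained CNN can be turned into a featurizer with MXNet Gluon (MXNet is one of the frameworks supported by AWS DeepLens). The choice of ResNet-50 v2 and the layer handling shown here are illustrative, not the exact code from Part 1:

```python
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Load a CNN pretrained on ImageNet; ResNet-50 v2 is used here for illustration.
net = vision.resnet50_v2(pretrained=True)

# net.features contains all layers up to (but not including) the final
# classification layer, so calling it returns a feature vector per image
# instead of ImageNet class probabilities.
featurizer = net.features

def featurize(image_batch):
    """Convert a preprocessed (N, 3, 224, 224) image batch to feature vectors."""
    return featurizer(image_batch).asnumpy()

# Quick check with a dummy batch; for ResNet-50 v2 the output shape is (1, 2048).
dummy = mx.nd.random.uniform(shape=(1, 3, 224, 224))
print(featurize(dummy).shape)
```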
Let’s examine how the system works. Consider the following architecture diagram:
When shown a real-world, physical item, an AWS DeepLens device generates a feature vector representing that item. The feature vector generated by the device is sent to the AWS Cloud using the AWS IoT Core service. An AWS IoT rule is used to direct the incoming feature vector from AWS DeepLens to a cloud-based AWS Lambda function. The feature vector is used by this search Lambda function to look up visually similar items by making a request to an Amazon SageMaker endpoint hosting an index of reference item feature vectors. This index was built using the Amazon SageMaker k-Nearest Neighbors (k-NN) algorithm, as discussed in Part 1. The search Lambda function returns the top visually similar reference item matches and related metadata (product titles, image URLs, etc.), which are then consumed by a web app via a separate API Lambda function fronted by Amazon API Gateway.
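To make the lookup step concrete, here is a minimal sketch of how the search Lambda function might call the k-NN endpoint. The endpoint name and the event field carrying the feature vector are illustrative assumptions, not the project's exact values:

```python
import json
import boto3

sagemaker_runtime = boto3.client('sagemaker-runtime')
KNN_ENDPOINT = 'visual-search-knn'  # hypothetical endpoint name

def lambda_handler(event, context):
    # The AWS IoT rule delivers the feature vector published by the DeepLens device.
    feature_vector = event['feature_vector']  # illustrative field name

    # The built-in k-NN algorithm accepts CSV input; requesting verbose JSON output
    # returns the labels (reference item IDs) and distances of the nearest neighbors.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=KNN_ENDPOINT,
        ContentType='text/csv',
        Accept='application/json; verbose=true',
        Body=','.join(str(x) for x in feature_vector))

    predictions = json.loads(response['Body'].read())['predictions'][0]
    neighbor_ids = predictions['labels']    # IDs of the most similar reference items
    distances = predictions['distances']    # smaller distance = closer visual match
    # ...map the IDs to metadata and store the matches for the web app
    #    (see the sketches later in this post)...
    return {'ids': neighbor_ids, 'distances': distances}
```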
Various techniques are employed to make the search more performant. For example, retrieval of previously found matches is expedited by having the search Lambda function store the matches in memory in a LIFO queue backed by Amazon ElastiCache for Redis. The web app polls the API every few seconds for the latest matches, invoking the API Lambda function to read the latest matches stored in the LIFO queue. Using ElastiCache for Redis also decouples the web app from the rest of the architecture, so the different components can be separately modified, evolved, and scaled. For example, the AWS DeepLens device can produce feature vectors at a different rate than the rate at which the web app consumes the resulting visually similar matches, and the web app is always assured of fetching the most recent (rather than stale) matches.
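A minimal sketch of the write side of this pattern, assuming the redis-py client and hypothetical endpoint and key names, looks like the following; the newest match set always sits at the head of the list:

```python
import json
import redis

# Hypothetical ElastiCache for Redis endpoint and key name.
redis_client = redis.StrictRedis(host='visual-search.xxxxxx.cache.amazonaws.com',
                                 port=6379)
MATCHES_KEY = 'latest_matches'

def push_matches(matches):
    """Called by the search Lambda function: push the newest match set to the
    head of the list and trim it so only recent entries are retained."""
    redis_client.lpush(MATCHES_KEY, json.dumps(matches))
    redis_client.ltrim(MATCHES_KEY, 0, 9)  # keep only the 10 most recent match sets
```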
AWS DeepLens project
You’ll need to create an AWS DeepLens project with (1) the featurizer model prepared in Part 1, and (2) a Lambda function to run on the DeepLens device. For complete step-by-step instructions, please refer to the README of this project’s GitHub repository at github.com/awslabs/visual-search. Here is a high-level overview of what you need to do to combine the model and Lambda function into a DeepLens project:
- Put the featurizer model in an Amazon S3 bucket. This is necessary so the AWS DeepLens service can deploy the model to your device as part of a DeepLens project.
- Create and publish a Lambda function that will generate feature vectors using the model. Publishing the function is necessary to enable it to be deployed to the AWS DeepLens device as part of your DeepLens project. (A short sketch of this step and the previous one follows this list.)
- Finally, you’ll create an AWS DeepLens project that wraps your model and Lambda function, and then deploy the project to your DeepLens device.
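For the first two steps, a minimal boto3 sketch might look like the following; the bucket name, artifact file names, and function name are illustrative assumptions rather than the project's exact values (see the README for the precise requirements):

```python
import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

# 1. Upload the featurizer model artifacts from Part 1 to S3. The DeepLens
#    service's default permissions typically expect a bucket whose name
#    starts with "deeplens".
BUCKET = 'deeplens-visual-search-artifacts'  # hypothetical bucket name
for artifact in ['featurizer-symbol.json', 'featurizer-0000.params']:  # illustrative file names
    s3.upload_file(artifact, BUCKET, artifact)

# 2. Publish a version of the device Lambda function so it can be attached
#    to a DeepLens project.
lambda_client.publish_version(FunctionName='deeplens-visual-search')  # hypothetical function name
```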
The Lambda function deployed to the device loads the CNN model, which is optimized to run on the AWS DeepLens GPU by the DeepLens Model Optimizer API. An event loop in the Lambda function loads video frames generated by the AWS DeepLens camera. To obtain a feature vector for an item shown in a video frame, it is necessary to invoke the doInference method of the DeepLens awscam.Model API.
Although Lambda functions for AWS DeepLens projects are often coded to generate inferences such as the class (car, boat, etc.) of an item, with a straightforward code modification they can generate and return feature vectors as required for visual search. In the code snippet that follows, take a look at the code within the while loop. After the AWS DeepLens device loads a video frame and resizes it to the shape required by the CNN model, the feature vector is retrieved simply as the value of the first key-value pair returned by the doInference API call.
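The following is a condensed sketch of such an event loop (the complete function is in the project's GitHub repository); the model path, topic name, and input size shown here are illustrative assumptions:

```python
import json
import awscam
import cv2
import greengrasssdk

# Publish results to AWS IoT Core through the Greengrass SDK.
iot_client = greengrasssdk.client('iot-data')
IOT_TOPIC = 'deeplens/featurizer/output'  # illustrative topic name

# Load the optimized featurizer model deployed with the DeepLens project.
MODEL_PATH = '/opt/awscam/artifacts/featurizer.xml'  # illustrative artifact path
model = awscam.Model(MODEL_PATH, {'GPU': 1})
INPUT_SIZE = 224  # input height/width expected by the CNN

while True:
    ret, frame = awscam.getLastFrame()  # grab the latest camera frame
    if not ret:
        continue
    resized = cv2.resize(frame, (INPUT_SIZE, INPUT_SIZE))
    inference = model.doInference(resized)
    # The feature vector is the value of the first key-value pair returned.
    feature_vector = list(inference.values())[0]
    if hasattr(feature_vector, 'tolist'):
        feature_vector = feature_vector.tolist()  # make it JSON-serializable
    iot_client.publish(topic=IOT_TOPIC,
                       payload=json.dumps({'feature_vector': feature_vector}))
```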
After the AWS DeepLens project is deployed to your DeepLens device, you should be able to confirm that the device is outputting feature vectors. It is a good idea to do this before moving on to deploy the other components of the project.
To confirm proper device output, go to the AWS DeepLens console, then in the Devices tab find the listing for your DeepLens device. Choose your device name, and then go to the middle section of the next page to find the Device Details section. Find the MQTT topic for your device, copy the topic string, and choose the blue info link next to it. On the Inference output action page, choose the AWS IoT console blue link at the bottom. This takes you to the AWS Greengrass console, where you can subscribe to that topic by pasting the topic string into the Subscription topic text box, as shown in the following screenshot. Assuming the visual search AWS DeepLens project has been deployed to your device, and the device is powered on and has network connectivity, you should see feature vectors appear a couple of minutes after project deployment is complete:
Deploying other components
To deploy visual search in a real-world scenario outside a Jupyter notebook, at a minimum we need: (1) an index for looking up visually similar matches, and (2) a data store of reference item metadata. In Part 1 of this blog post series, we built the index using Amazon SageMaker. We need the second component because the k-NN index only returns the IDs of the matches. The metadata data store allows us to map image IDs returned by index lookups to image URLs, product titles, and other metadata for displaying visually similar match results. In addition to the components we discussed, we also need some supporting infrastructure.
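Before turning to the infrastructure, here is a minimal sketch of that ID-to-metadata mapping, assuming a hypothetical DynamoDB table, key, and attribute names:

```python
import boto3

dynamodb = boto3.resource('dynamodb')
metadata_table = dynamodb.Table('visual-search-metadata')  # hypothetical table name

def get_match_metadata(item_ids):
    """Map the item IDs returned by the k-NN index to displayable metadata."""
    matches = []
    for item_id in item_ids:
        result = metadata_table.get_item(Key={'item_id': str(item_id)})  # hypothetical key name
        item = result.get('Item')
        if item:
            matches.append({
                'title': item.get('title'),          # hypothetical attribute names
                'image_url': item.get('image_url'),
            })
    return matches
```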
Complete, detailed step-by-step instructions are posted on GitHub at github.com/awslabs/visual-search. Here is a high-level overview of the infrastructure:
- Data stores (two): We’ll use an Amazon DynamoDB table to store the reference item metadata we just discussed. The second data store is Amazon ElastiCache for Redis, which stores the latest matches as described earlier.
- Lambda functions (two): We’ve discussed the search Lambda function, which primarily invokes an Amazon SageMaker endpoint that hosts the k-NN index. The related code is at github.com/awslabs/visual-search/tree/master/search. A second Lambda function implements the business logic of a simple RESTful API that provides access to matches; a sketch of this handler follows the list. Its code is at github.com/awslabs/visual-search/tree/master/API.
- RESTful API and web app: Besides the aforementioned business logic Lambda function, the RESTful API also requires an API created with Amazon API Gateway. The RESTful API is accessed by a web app, which displays visually similar matches for a query item. Code for the web app is available at github.com/awslabs/visual-search/tree/master/web-app.
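For reference, here is a minimal sketch of that API Lambda handler, assuming API Gateway Lambda proxy integration and the Redis list from the earlier sketch (the environment variable and key name are illustrative):

```python
import json
import os
import redis

# The ElastiCache for Redis endpoint is assumed to be supplied via an environment variable.
redis_client = redis.StrictRedis(host=os.environ['REDIS_HOST'], port=6379)
MATCHES_KEY = 'latest_matches'  # same hypothetical key used by the search Lambda

def lambda_handler(event, context):
    """Return the most recent match set for the web app to display."""
    latest = redis_client.lrange(MATCHES_KEY, 0, 0)  # head of the list = newest entry
    matches = json.loads(latest[0]) if latest else []
    return {
        'statusCode': 200,
        'headers': {'Access-Control-Allow-Origin': '*'},  # let the web app call the API
        'body': json.dumps({'matches': matches}),
    }
```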
After you’ve set up this infrastructure, you’re ready to try out visual search with DeepLens!
Conclusion and extensions
Visual search technology has the potential to be transformative in regard to how people interact with computing devices. As a result, visual search is an active area of research. The visual search solution described in this blog post series, while complete in and of itself, can be extended and modified in various ways.
For example, you could try out different CNN architectures with different numbers of layers, a different type of index for storing feature vectors, or even fine-tuning the CNN for your specific use case. As in other areas of machine learning, there are tradeoffs between accuracy and computation time, and you can experiment to find the tradeoff that makes the most sense for your project. Another possible extension is to use camera hardware other than an AWS DeepLens device. This is possible with AWS Greengrass ML Inference, which enables you to deploy and serve models locally on various kinds of connected devices.
About the Author
Brent Rabowsky focuses on data science at AWS, and leverages his expertise to help AWS customers with their own data science projects.