AWS AI Blog

AWS DeepLens Extensions: Build Your Own Project

AWS DeepLens provides a great opportunity to learn new technologies, such as deep learning and the Internet of Things (IoT), as well as to build innovative systems that can solve real-world problems. The device and service come with a set of predefined projects that make it easy to hit the ground running. They are designed as an open platform that allows novice and experienced developers alike to build (and share) new and exciting projects.

In this blog post, you will go through the process of building your own project, including the following steps:

  • Train a deep learning model (using Amazon SageMaker)
  • Optimize the trained model to run on the AWS DeepLens edge device
  • Develop an AWS Lambda function to load the model and use it to run inference on the video stream
  • Deploy the AWS Lambda function to the AWS DeepLens device using AWS Greengrass
  • Wire the edge AWS Lambda function to the cloud to send commands and receive inference output
  • Profit

Train a deep learning model (using Amazon SageMaker)

Amazon SageMaker is a new service that takes on the heavy lifting of data science. It captures many years of experience from Amazon data scientists across many aspects of Amazon.com's business, from recommendation engines to Alexa, Amazon Go, Amazon Robotics, and countless other machine-learning-based systems.

The full process of designing and building a good machine learning model is beyond the scope of this blog post, although it is extremely interesting. In fact, once you get productive with the flow of deploying deep learning models to the DeepLens device and wiring it back to profit from its output, you will find yourself spending more and more time building new models to solve new problems in the real world.

A good starting point for newcomers to machine learning as well as expert data scientists is a set of notebooks that are in the Jupyter notebook that is available when you launch a Notebook Instance in Amazon SageMaker. For example, here is a notebook that shows both the process of transfer learning as well as the flow of using the Amazon SageMaker SDK to build, train, and deploy the fine-tuned deep learning model for inference endpoint hosting.

In this blog post you will focus on saving the model artifacts in Amazon S3 to enable the next steps in the process.
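As a point of orientation, here is a minimal sketch of where a SageMaker training job leaves its model artifacts in S3. The bucket and job name below are hypothetical; by convention, SageMaker writes the artifacts under the job's output path as model.tar.gz.

```python
# Sketch: where a SageMaker training job leaves its model artifacts in S3.
# The bucket and job name are hypothetical placeholders; by convention,
# SageMaker writes artifacts to <output_path>/<job_name>/output/model.tar.gz.
bucket = "my-deeplens-models"                 # hypothetical bucket name
job_name = "image-classification-2018-01-15"  # hypothetical training job name

artifact_uri = "s3://{}/{}/output/model.tar.gz".format(bucket, job_name)
print(artifact_uri)
```

This is the location you will point to when importing the model in the next steps.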

Here is an example of a model that was trained using a training job in SageMaker, including the output directory and hyper-parameters that were used in its training:

The hyperparameters help you understand how to use the model. For example, based on the image_shape (3,224,224), we know that the model handles color images (3 = RGB channels) and that you need to resize input images to 224x224 pixels. They can also help future training jobs produce more accurate models, for example, by running more passes over the training data (epochs > 2), changing the learning_rate, or increasing num_training_samples.
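To make this concrete, here is a small sketch of such a hyperparameter set and how you could derive the expected input size from image_shape. The dict layout and the learning_rate/num_training_samples values are illustrative, not taken from a real training job:

```python
# Illustrative hyper-parameter set; image_shape and epochs match the values
# discussed above, learning_rate and num_training_samples are hypothetical.
hyperparameters = {
    "image_shape": "3,224,224",   # channels, height, width
    "epochs": 2,
    "learning_rate": 0.01,        # hypothetical value
    "num_training_samples": 300,  # hypothetical value
}

# Derive the input dimensions the model expects from image_shape
channels, height, width = (int(x) for x in hyperparameters["image_shape"].split(","))
print(channels, height, width)
```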

Optimize the trained model to run on the AWS DeepLens edge device

Hardware optimization of a model for the DeepLens device is an important example of this step in any real-life deep learning system. In the previous steps you had the almost unlimited resources of the cloud for training the model, but when you want to use the model for inference, you face a new set of business constraints, mainly around scale. In the cloud you can easily get a large cluster of GPU-based instances, and you can run multiple experiments over many hours of training. But when you want to deploy the model to an edge device, you need to think about the specifications of that device in terms of chipset (CPU/GPU), memory, network bandwidth, and stability. Your DeepLens device might be powerful enough to run multiple models on a high-resolution video stream at a high frame rate, but many times you will want to run on a lower-spec device. Therefore, Intel developed a library that takes the artifacts of an MXNet model and optimizes them for the Intel chipset in your DeepLens device. This library is executed automatically for you when you deploy the model through the DeepLens console after you import the model from the previous step.

After you finish training the model, you can point to the training job ID when you import the model to DeepLens. (Note that both Amazon SageMaker and DeepLens need to be in the same AWS Region for the import.)

Develop an AWS Lambda function to load the model and use it to run inference on the video stream

The AWS Greengrass Core that runs on the IoT device (the AWS DeepLens camera in our case) is able to run AWS Lambda Python functions that are deployed to it. In a future post we will explore the steps to create a Greengrass group and core and to prepare and register a device; you can also check the Greengrass documentation for a getting-started guide. AWS DeepLens automates most of these steps, so here we can focus on developing a Lambda function to deploy to a registered and configured device.

For each Lambda function, follow these steps:

  • Load the model
  • Capture a frame
  • Run model inference on the frame
  • Parse the results of the inference
  • Publish the results to an MQTT topic

Let’s look at some examples of each of these steps.

Load the model

The AWS DeepLens device comes preinstalled with a helper library, awscam. This library makes it easy to focus on the Lambda function logic by wrapping the most common steps, such as loading a model. As you can see in the following example, you only need to provide the path to the model XML file and the context of the model (GPU/CPU), and the library does the loading and binding of the model automatically:

import awscam
modelPath = "/opt/awscam/artifacts/mxnet_deploy_CaltechTransfer_224_FP16_FUSED.xml"
# Load model to GPU (use {"GPU": 0} for CPU)
mcfg = {"GPU": 1}
model = awscam.Model(modelPath, mcfg)

Note that this step should be defined outside the inner inference function (for example, greengrass_infer_image_run below). It should run only once, because it can take a few seconds to load the model from disk into memory and onto the GPU.

Capture a frame

OpenCV (cv2) is used to manipulate the images before (resizing) and after (drawing boxes and labels) running the deep learning models. Since each model is trained on a different input specification, the first step is to resize the captured frame to the right dimensions:

import cv2
input_width = 224
input_height = 224
ret, frame = awscam.getLastFrame()
# Resize frame to fit model input requirement
frameResize = cv2.resize(frame, (input_width, input_height))

Run model inference on the frame

The helper library awscam wraps the predict command in a simple doInference function, and makes this part of the Lambda function concise:

# Run model inference on the resized frame
inferOutput = model.doInference(frameResize)

Parse the results of the inference

The helper library supports a few classical computer vision problems: “classification” for object classification, that is, providing labels; “ssd” (“single shot multibox detector”) for object detection and localization, which provides labels and bounding boxes for the objects; and “segmentation”, which segments the image into regions and provides pixel-level output (for example, for style transfer). The following example is for the common “ssd” type of model:

modelType = "ssd"
parsed_results = model.parseResult(modelType, inferOutput)['ssd']
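To give a feel for what you get back, here is a hedged sketch of a parsed “ssd” result and a typical confidence filter. The exact field names and values below are assumptions for illustration, not the guaranteed awscam output format:

```python
# Hypothetical parsed output of an "ssd" model: each detection carries a
# label index, a confidence score ("prob"), and normalized box corners.
parsed_results = [
    {"label": 0, "prob": 0.91, "xmin": 0.10, "ymin": 0.20, "xmax": 0.45, "ymax": 0.80},
    {"label": 1, "prob": 0.12, "xmin": 0.50, "ymin": 0.10, "xmax": 0.60, "ymax": 0.30},
]

max_threshold = 0.5  # keep only confident detections
detections = [obj for obj in parsed_results if obj["prob"] > max_threshold]
print(len(detections))
```

Filtering on the confidence score this way is what the publishing step in the next section relies on.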

Publish the results to an MQTT topic

The last part is to send out the output of the model. This logic depends on the type of the model and the type of problem you are trying to solve. A simple output is the labels in the image, such as Dog/Cat or Hotdog/Not Hotdog, as shown in the following example. Other types of output include cropped faces from a face detection model to be sent on to a face recognition model, bounding boxes around objects in a video stream, or a neural style transfer of the images/video. The simplest way to send the output from the device is over MQTT using the Greengrass client, as shown in the following code. This channel can support textual messages, but also images (with bounding boxes and labels, for example) after textual encoding.

import greengrasssdk

# Create a Greengrass Core SDK client
client = greengrasssdk.client('iot-data')
iotTopic = 'iot-data'
# outMap maps label indexes to human-readable names;
# max_threshold filters out low-confidence detections
for obj in parsed_results:
    if obj['prob'] > max_threshold:
        label = '{{"label":"{}", "prob":"{:.2f}%"}}'.format(outMap[obj['label']], obj['prob']*100)
        client.publish(topic=iotTopic, payload=label)
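Building the JSON payload by hand with string formatting is easy to get wrong; a safer sketch uses json.dumps. The outMap and detection values here are hypothetical stand-ins for the names your own model produces:

```python
import json

outMap = {0: "Dog", 1: "Cat"}      # hypothetical label-index-to-name map
obj = {"label": 0, "prob": 0.91}   # one hypothetical parsed detection

payload = json.dumps({
    "label": outMap[obj["label"]],
    "prob": "{:.2f}%".format(obj["prob"] * 100),
})
# client.publish(topic=iotTopic, payload=payload)  # publish as shown above
```

json.dumps guarantees the braces and quoting are balanced, so downstream consumers can always parse the message.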

Deploy the AWS Lambda function to the DeepLens device using AWS Greengrass

Greengrass can run your Lambda function in two main modes: pinned and on-demand. For AWS DeepLens models, the recommended way is to use the pinned option, because the startup of the Lambda function can be long, especially when the deep learning models are large (hundreds of MB or even a few GB). You can control how often and when the inference is triggered in a few modes:

  • Highest Frame Rate – Run the function in an infinite loop without any “sleep” between frames. Depending on the speed of the model inference, preprocessing, and postprocessing, you can get to a frame rate of 10-30 FPS.
  • Specific Frame Rate – Run the function in an infinite loop with a predefined “sleep” between frames. Some tasks, such as face detection, can run at a rate of 1-5 frames per second and still provide the required functionality of detecting all faces in a region. You can use a Timer to control the rate of your function’s inference:
    from threading import Timer
    def greengrass_infer_image_run():
        # Read an image
        # Preprocessing
        # Run model inference on the image
        # Parse results
        # Output results
        
        # Asynchronously schedule this function to be run again in 1/2 a second
        Timer(0.5, greengrass_infer_image_run).start()
  • On-Demand – Run the function whenever you want to trigger it, either manually or from a different event. You can do that even when the function is pinned, using the event handler. The following example shows how to trigger the inference on every event, but you can also control the function further (switching the model or model mode, for example) by parsing the parameters of the event in the handler.
    def greengrass_infer_image_run():
        # Read an image
        # Preprocessing
        # Run model inference on the image
        # Parse results
        # Output results
        
    def lambda_handler(event, context):
        client.publish(topic=iotTopic, payload="About to call image inference function")
        greengrass_infer_image_run()
        return
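The Specific Frame Rate pattern above can be sketched as a complete, runnable snippet. The interval, iteration cap, and counter below are illustrative additions so the example terminates; a real pinned Lambda function would reschedule itself indefinitely:

```python
from threading import Timer, Event

INTERVAL = 0.05    # seconds between runs (the text uses 0.5; shortened here)
done = Event()
state = {"frames": 0}

def greengrass_infer_image_run():
    # capture, preprocess, infer, parse, and publish would go here
    state["frames"] += 1
    if state["frames"] < 5:
        # asynchronously schedule this function to run again
        Timer(INTERVAL, greengrass_infer_image_run).start()
    else:
        done.set()   # demo-only stop condition; a pinned function never stops

greengrass_infer_image_run()
done.wait(timeout=5)
print(state["frames"])
```

Because each Timer fires on its own thread, the sleep interval does not block the Greengrass runtime between frames.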

A simple way to get a Greengrass skeleton for your Lambda function is to use one of the Greengrass blueprints in the AWS Lambda console, preferably greengrassHelloWorld, because it already includes the Greengrass client library in its package. Create a Lambda function using this blueprint, replace the Python code of the function with your own, and publish the newly created Lambda function. Now you can add it to your project and deploy it to the device through the AWS DeepLens console.

Wire the local Lambda function to the cloud to send commands and receive inference output

As you saw earlier, the Lambda function is able to write out its output using IoT topics over the MQTT protocol. The default output topic that is used with the built-in projects is:

iotTopic = '$aws/things/{}/infer'.format(os.environ['AWS_IOT_THING_NAME'])

You can find this in the AWS DeepLens console or the AWS IoT console. You can choose to use the same format or any other topic name in your Lambda functions, such as iotTopic = 'iot-data' in the previous examples.
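For local testing, a small sketch can build the same topic name with a fallback when the Greengrass environment variable is not set. The default thing name here is a hypothetical placeholder:

```python
import os

# On the device, the Greengrass environment provides AWS_IOT_THING_NAME;
# "deeplens_example" is a hypothetical fallback for running off-device.
thing_name = os.environ.get("AWS_IOT_THING_NAME", "deeplens_example")
iotTopic = '$aws/things/{}/infer'.format(thing_name)
print(iotTopic)
```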

You also saw earlier that you can use lambda_handler to trigger the Lambda function in On-Demand mode. To allow that, you need to set up a subscription between the AWS IoT cloud and your Lambda function using the IoT console. For example, here are inbound and outbound subscriptions to a Lambda function (ImageInferenceTest version 19):

In this example, the edge Lambda function listens to the topic ‘trigger-image-inference’ and will trigger the inference every time an event is published to this topic. The second subscription lets you see output messages from the edge Lambda function and react to them on the cloud side. For example, you can use the AWS IoT rules engine to filter specific messages (“face detected,” for example) and send them to other cloud-side Lambda functions, Amazon Elasticsearch Service, Amazon Kinesis, and so on. Don’t forget to deploy the subscriptions (under “Actions”) to enable them on the DeepLens device as well.
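As one illustration of such cloud-side filtering, here is a hedged sketch of an AWS IoT rule that forwards only “face” messages to a cloud Lambda function. The topic, rule name, label value, and function ARN are all hypothetical; only the payload shape follows the IoT create_topic_rule API:

```python
# Hypothetical IoT rule: forward only messages whose "label" field is "face"
# from the 'iot-data' topic to a cloud-side Lambda function. The rule name,
# topic, and function ARN are placeholders for illustration.
rule_payload = {
    "sql": "SELECT * FROM 'iot-data' WHERE label = 'face'",
    "actions": [
        {"lambda": {"functionArn":
            "arn:aws:lambda:us-east-1:123456789012:function:OnFaceDetected"}}
    ],
    "ruleDisabled": False,
}

# To actually create the rule (requires AWS credentials):
# import boto3
# boto3.client("iot").create_topic_rule(
#     ruleName="ForwardFaceDetections", topicRulePayload=rule_payload)
```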

We also recommend that you enable Amazon CloudWatch Logs for the Greengrass group of your device, so you can see the logs of the Lambda functions and the Greengrass core info (or debug) logs. You can find this option under the “Settings” section of the Greengrass group console page.

Conclusion

In this blog post you saw how to start extending the open project environment that comes with AWS DeepLens. In future posts you will see more detailed and specific examples built on this flow, using the same structure and building blocks. The following diagram shows the various steps that you can follow to extend existing projects (better models or different Lambda functions) or to create a completely new project from scratch.

If you have any questions, please leave them in the comments.


Additional Reading

Learn how to extend AWS DeepLens to send SMS Notifications with AWS Lambda.


About the Author

Guy Ernest is a principal solutions architect in Amazon AI. He has the exciting opportunity to help shape and deliver on a strategy to build mind share and broad use of Amazon’s cloud computing platform for AI, machine learning and deep learning use cases. In his spare time, he enjoys spending time with his wife and family, gathering embarrassing stories, to share in talks about Amazon and the future of AI.