Building and deploying an object detection computer vision application at the edge with AWS Panorama

Computer vision (CV) is sought after technology among companies looking to take advantage of machine learning (ML) to improve their business processes. Enterprises have access to large amounts of video assets from their existing cameras, but the data remains largely untapped without the right tools to gain insights from it. CV provides the tools to unlock opportunities with this data, so you can automate processes that typically require visual inspection, such as evaluating manufacturing quality or identifying bottlenecks in industrial processes. You can take advantage of CV models running in the cloud to automate these inspection tasks, but there are circumstances when relying exclusively on the cloud isn’t optimal due to latency requirements or intermittent connectivity that make a round trip to the cloud infeasible.

AWS Panorama enables you to bring CV to on-premises cameras and make predictions locally with high accuracy and low latency. On the AWS Panorama console, you can easily bring custom trained models to the edge and build applications that integrate with custom business logic. You can then deploy these applications on the AWS Panorama Appliance, which auto-discovers existing IP cameras and runs the applications on video streams to make real-time predictions. You can easily integrate the inference results with other AWS services such as Amazon QuickSight to derive ML-powered business intelligence (BI) or route the results to your on-premises systems to trigger an immediate action.

In this post, we look at how you can use AWS Panorama to build and deploy a parking lot car counter application.

Parking lot car counter application

Parking facilities, like the one in the image below, need to know how many cars are parked in a given facility at any point of time, to assess vacancy and intake more customers. You also want to keep track of the number of cars that enter and exit your facility during any given time. You can use this information to improve operations, such as adding more parking payment centers, optimizing price, directing cars to different floors, and more. Parking center owners typically operate more than one facility and are looking for real-time aggregate details of vacancy in order to direct traffic to less-populated facilities and offer real-time discounts.

To achieve these goals, parking centers sometimes manually count the cars to provide a tally. This inspection can be error prone and isn’t optimal for capturing real-time data. Some parking facilities install sensors that give the number of cars in a particular lot, but these sensors are typically not integrated with analytics systems to derive actionable insights.

With the AWS Panorama Appliance, you can get a real-time count of number of cars, collect metrics across sites, and correlate them to improve your operations. Let’s see how we can solve this once manual (and expensive) problem using CV at the edge. We go through the details of the trained model, the business logic code, and walk through the steps to create and deploy an application on your AWS Panorama Appliance Developer Kit so you can view the inferences on a connected HDMI screen.

Computer vision model

A CV model helps us extract useful information from images and video frames. We can detect and localize objects in a scene, and identity and classify images and action recognition. You can choose from a variety of frameworks such as TensorFlow, MXNet, and PyTorch to build your CV models, or you can choose from a variety of pre-trained models available from AWS or from third parties such as ISVs.

For this example, we use a pre-trained GluonCV model downloaded from the GluonCV model zoo.

The model we use is the ssd_512_resnet50_v1_voc model. It’s trained on the very popular PASCAL VOC dataset. It has 20 classes of objects annotated and labeled for a model to be trained on. The following code shows the classes and their indexes.

voc_classes = {
	'aeroplane'		: 0,
	'bicycle'		: 1,
	'bird'			: 2,
	'boat'			: 3,
	'bottle'		: 4,
	'bus'			: 5,
	'car'			: 6,
	'cat'			: 7,
	'chair'			: 8,
	'cow'			: 9,
	'diningtable'	: 10,
	'dog'			: 11,
	'horse'			: 12,
	'motorbike'		: 13,
	'person'		: 14,
	'pottedplant'	: 15,
	'sheep'			: 16,
	'sofa'			: 17,
	'train'			: 18,
	'tvmonitor'		: 19
}

For our use case, we’re detecting and counting cars. Because we’re talking about cars, we use class 6 as the index in our business logic later in this post.

Our input image shape is [1, 3, 512, 512]. These are the dimensions of the input image the model expects to be given:

Batch size – 1
Number of channels – 3
Width and height of the input image – 512, 512

Uploading the model artifacts

We need to upload the model artifacts to an Amazon Simple Storage Service (Amazon S3) bucket. The bucket name should have aws-panorama- in the beginning of the name. After downloading the model artifacts, we upload the ssd_512_resnet50_v1_voc.tar.gz file to the S3 bucket. To create your bucket, complete the following steps:

Download the model artifacts.
On the Amazon S3 console, choose Create bucket.
For Bucket name, enter a name starting with aws-panorama-.

Choose Create bucket.

You can view the object details in the Object overview section. The model URI is s3://aws-panorama-models-bucket/ssd_512_resnet50_v1_voc.tar.gz.

The business logic code

After we upload the model artifacts to an S3 bucket, let’s turn our attention to the business logic code. For more information about the sample developer code, see Sample application code. For a comparative example of code samples, see AWS Panorama People Counter Example on GitHub.

Before we look at the full code, let’s look at a skeleton of the business logic code we use:

### Lambda skeleton

class car_counter(object):
    def interface(self):
        # defines the parameters that interface with other services from Panorama
        return

    def init(self, parameters, inputs, outputs):
        # defines the attributes such as arrays and model objects that will be used in the application
        return

    def entry(self, inputs, outputs):
        # defines the application logic responsible for predicting using the inputs and handles what to do
        # with the outputs
        return

The business logic code and AWS Lambda function expect to have at least the interface method, init method, and the entry method.

Let’s go through the python business logic code next.

import panoramasdk
import cv2
import numpy as np
import time
import boto3

# Global Variables 

HEIGHT = 512
WIDTH = 512

class car_counter(panoramasdk.base):
    
    def interface(self):
        return {
                "parameters":
                (
                    ("float", "threshold", "Detection threshold", 0.10),
                    ("model", "car_counter", "Model for car counting", "ssd_512_resnet50_v1_voc"), 
                    ("int", "batch_size", "Model batch size", 1),
                    ("float", "car_index", "car index based on dataset used", 6),
                ),
                "inputs":
                (
                    ("media[]", "video_in", "Camera input stream"),
                ),
                "outputs":
                (
                    ("media[video_in]", "video_out", "Camera output stream"),
                    
                ) 
            }
    
            
    def init(self, parameters, inputs, outputs):  
        try:  
            
            print('Loading Model')
            self.model = panoramasdk.model()
            self.model.open(parameters.car_counter, 1)
            print('Model Loaded')
            
            # Detection probability threshold.
            self.threshold = parameters.threshold
            # Frame Number Initialization
            self.frame_num = 0
            # Number of cars
            self.number_cars = 0
            # Bounding Box Colors
            self.colours = np.random.rand(32, 3)
            # Car Index for Model from parameters
            self.car_index = parameters.car_index
            # Set threshold for model from parameters 
            self.threshold = parameters.threshold
                        
            class_info = self.model.get_output(0)
            prob_info = self.model.get_output(1)
            rect_info = self.model.get_output(2)

            self.class_array = np.empty(class_info.get_dims(), dtype=class_info.get_type())
            self.prob_array = np.empty(prob_info.get_dims(), dtype=prob_info.get_type())
            self.rect_array = np.empty(rect_info.get_dims(), dtype=rect_info.get_type())

            return True
        
        except Exception as e:
            print("Exception: {}".format(e))
            return False

    def preprocess(self, img, size):
        
        resized = cv2.resize(img, (size, size))
        mean = [0.485, 0.456, 0.406]  # RGB
        std = [0.229, 0.224, 0.225]  # RGB
        
        # converting array of ints to floats
        img = resized.astype(np.float32) / 255. 
        img_a = img[:, :, 0]
        img_b = img[:, :, 1]
        img_c = img[:, :, 2]
        
        # Extracting single channels from 3 channel image
        # The above code could also be replaced with cv2.split(img)
        # normalizing per channel data:
        
        img_a = (img_a - mean[0]) / std[0]
        img_b = (img_b - mean[1]) / std[1]
        img_c = (img_c - mean[2]) / std[2]
        
        # putting the 3 channels back together:
        x1 = [[[], [], []]]
        x1[0][0] = img_a
        x1[0][1] = img_b
        x1[0][2] = img_c
        x1 = np.asarray(x1)
        
        return x1
    
    def get_number_cars(self, class_data, prob_data):
        
        # get indices of car detections in class data
        car_indices = [i for i in range(len(class_data)) if int(class_data[i]) == self.car_index]
        # use these indices to filter out anything that is less than self.threshold
        prob_car_indices = [i for i in car_indices if prob_data[i] >= self.threshold]
        return prob_car_indices

    
    def entry(self, inputs, outputs):        
        for i in range(len(inputs.video_in)):
            stream = inputs.video_in[i]
            car_image = stream.image

            # Pre Process Frame
            x1 = self.preprocess(car_image, 512)
                                    
            # Do inference on the new frame.
            
            self.model.batch(0, x1)        
            self.model.flush()
            
            # Get the results.            
            resultBatchSet = self.model.get_result()
            class_batch = resultBatchSet.get(0)
            prob_batch = resultBatchSet.get(1)
            rect_batch = resultBatchSet.get(2)

            class_batch.get(0, self.class_array)
            prob_batch.get(1, self.prob_array)
            rect_batch.get(2, self.rect_array)

            class_data = self.class_array[0]
            prob_data = self.prob_array[0]
            rect_data = self.rect_array[0]
            
            
            # Get Indices of classes that correspond to Cars
            car_indices = self.get_number_cars(class_data, prob_data)
            
            try:
                self.number_cars = len(car_indices)
            except:
                self.number_cars = 0
            
            # Visualize with Opencv or stream.(media) 
            
            # Draw Bounding boxes on HDMI output
            if self.number_cars > 0:
                for index in car_indices:
                    
                    left = np.clip(rect_data[index][0] / np.float(HEIGHT), 0, 1)
                    top = np.clip(rect_data[index][1] / np.float(WIDTH), 0, 1)
                    right = np.clip(rect_data[index][2] / np.float(HEIGHT), 0, 1)
                    bottom = np.clip(rect_data[index][3] / np.float(WIDTH), 0, 1)
                    
                    stream.add_rect(left, top, right, bottom)
                    stream.add_label(str(prob_data[index][0]), right, bottom) 
                    
            stream.add_label('Number of Cars : {}'.format(self.number_cars), 0.8, 0.05)
        
            self.model.release_result(resultBatchSet)            
            outputs.video_out[i] = stream
        return True


def main():
        
    car_counter().run()
main()

For a full explanation of the code and the methods used, see the AWS Panorama Developer Guide.

The code has the following notable features:

car_index – 6
model_used – ssd_512_resnet50_v1_voc (parameters.car_counter)
add_label – Adds text to the HDMI output
add_rect – Adds bounding boxes around the object of interest
Image – Gets the NumPy array of the frame read from the camera

Now that we have the code ready, we need to create a Lambda function with the preceding code.

On the Lambda console, choose Functions.
Choose Create function.
For Function name, enter a name.
Choose Create function.

Rename the Python file to car_counter.py.

Change the handler to car_counter_main.

In the Basic settings section, confirm that the memory is 2048 MB and the timeout is 2 minutes.

On the Actions menu, choose Publish new version.

We’re now ready to create our application and deploy to the device. We use the model we uploaded and the Lambda function we created in the subsequent steps.

Creating the application

To create your application, complete the following steps:

On the AWS Panorama console, choose My applications.
Choose Create application.

Choose Begin creation.

For Name, enter car_counter.
For Description, enter an optional description.
Choose Next.

Click Choose model.

For Model artifact path, enter the model S3 URI.
For Model name¸ enter the same name that you used in the business logic code.
In the Input configuration section, choose Add input.
For Input name, enter the input Tensor name (for this post, data).
For Shape, enter the frame shape (for this post, 1, 3, 512, 512).

Choose Next.
Under Lambda functions, select your function (CarCounter).

Choose Next.
Choose Proceed to deployment.

Deploying your application

To deploy your new application, complete the following steps:

Choose Choose appliance.

Choose the appliance you created.
Choose Choose camera streams.

Select your camera stream.

Choose Deploy.

Checking the output

After we deploy the application, we can check the output HDMI output or use Amazon CloudWatch Logs. For more information, see Setting up the AWS Panorama Appliance Developer Kit or Viewing AWS Panorama event logs in CloudWatch Logs, respectively.

If we have an HDMI output connected to the device, we should see the output from the device on the HDMI screen, as in the following screenshot.

And that’s it. We have successfully deployed a car counting use case to the AWS Panorama Appliance.

Extending the solution

We can do so much more with this application and extend it to other parking-related use cases, such as the following:

Parking lot routing – Where are the vacant parking spots?
Parking lot monitoring – Are cars parked in appropriate spots? Are they too close to each other?

You can integrate these use cases with other AWS services like QuickSight, S3 buckets, and MQTT, just to name a few, and get real-time inference data for monitoring cars in a parking lot.

You can adapt this example and build other object detection applications for your use case. We will also continue to share more examples with you so you can build, develop, and test with the AWS Panorama Appliance Developer Kit.

Conclusion

The applications of computer vision at the edge are only now being imagined and built out. As a data scientist, I’m very excited to be innovating in lockstep with AWS Panorama customers to help you ideate and build CV models that are uniquely tailored to solve your problems.

And we’re just scratching the surface of what’s possible with CV at the edge and the AWS Panorama ecosystem.

Resources

For more information about using AWS Panorama, see the following resources:

GitHub examples – Introduction to AWS Panorama
Sending output to an S3 bucket – AWS SDK for Python (Boto3)
Sending MQTT messages – Using the AWS IoT MQTT topic
Setting up the AWS Panorama Appliance Developer Kit – Register and configure the developer kit
Setting up a camera stream – Add a camera stream

About the Author

Surya Kari is a Data Scientist who works on AI devices within AWS. His interests lie in computer vision and autonomous systems.

AWS Machine Learning Blog

Building and deploying an object detection computer vision application at the edge with AWS Panorama

Parking lot car counter application

Computer vision model

Uploading the model artifacts

The business logic code

Creating the application

Deploying your application

Checking the output

Extending the solution

Conclusion

Resources

About the Author

Resources

Blog Topics

Follow