Optimize workforce in your store using Amazon Rekognition

April 2023 Update: Starting January 31, 2024, you will no longer be able to access AWS DeepLens through the AWS management console, manage DeepLens devices, or access any projects you have created. To learn more, refer to these frequently asked questions about AWS DeepLens end of life.

In this post, we show you how to use Amazon Rekognition and AWS DeepLens to detect and analyze occupancy in a retail business to optimize workforce utilization. Retailers often need to make personnel decisions to improve the in-store customer experience. Having too few or too many employees working can hurt the business by degrading the customer experience and reducing store revenue. When store traffic outpaces staffing, the result is long checkout lines and limited interaction with customers, which creates a poor customer experience. The opposite is also true: having too many employees during periods of low traffic wastes operating costs. And when a customer is already in line with a product, losing that sale to slow service is especially costly.

Currently, the most efficient, economical, and accurate method to count the flow of people in a store is a tally counter: a small handheld mechanical device with a manual clicker. An employee stationed at the front door pushes the clicker every time a person enters the store. The device itself is cheap and simple to use, but it has an obvious downside: it increases labor costs, because a staff member must be dedicated to keeping count whenever the store is open. On top of this, if the retailer wants to analyze the data to measure customer flow, the counts must be entered manually into third-party software, which adds cost and reduces ease of use. There are many other solutions on the market for retailers, but all alternatives to the tally counter require dedicated hardware to be installed in the store, which is costly and time-consuming.

Our solution performs this count with AWS DeepLens. Later in the post, we also show you how retailers can analyze the flow of people in the store at any given time by simply adding computer vision (CV) to existing cameras in the store, with no extra dedicated hardware required. The proposed solution automates the counting process, allows data to be collected and analyzed, and decreases operational cost.

You can use the same technology demonstrated in this post to create additional business analytics tools, such as heatmaps, to determine the distribution of popular products in your store.

Additionally, knowing the high and low occupancy hours allows the retailer to calculate their corresponding sales conversion rate (sales conversion rate = people making a purchase/number of visitors). This process is done automatically and the data is gathered in a database for the retailer to analyze.
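As a worked illustration of the formula above, the following sketch computes the conversion rate for a few hours of the day. The visitor and purchase figures are made-up sample values, and conversion_rate is a hypothetical helper name rather than part of the solution code.

```python
def conversion_rate(purchases, visitors):
    """Return purchases / visitors, or 0.0 when no visitors were counted."""
    if visitors <= 0:
        return 0.0
    return purchases / visitors


# Made-up sample data: hour of day -> (visitors counted, purchases recorded)
hourly = {
    10: (40, 10),
    13: (120, 18),
    18: (80, 24),
}

# Conversion rate per hour, keyed by hour of day
rates = {
    hour: conversion_rate(purchases, visitors)
    for hour, (visitors, purchases) in hourly.items()
}

for hour, rate in sorted(rates.items()):
    print(f"{hour:02d}:00  {rate:.0%}")
```

In this sample, the busiest hour (13:00) has the lowest conversion rate, which is the kind of signal that might indicate understaffing.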

Amazon Rekognition is a fully managed service that provides CV capabilities for analyzing images and video at scale, using deep learning technology without requiring machine learning (ML) expertise. AWS DeepLens is a deep-learning enabled video camera.

Solution overview

To help you track, detect, and analyze the flow of customers in a store, we use ML and serverless technologies through AWS services (see the following architectural diagram).

We initially deploy an AWS Lambda function to AWS DeepLens. This function is responsible for sending frames to an Amazon Simple Storage Service (Amazon S3) bucket. When a frame is put into the S3 bucket, an S3 bucket event triggers a second Lambda function. This second function analyzes the frame using Amazon Rekognition and counts the number of people in the frame. It puts this number and the timestamp into an Amazon DynamoDB table for further analytics use. We look into each component in more detail later in this post.

We start by walking you through setting up this system using an AWS DeepLens camera, but you can also integrate the solution with existing IP cameras, and we discuss how to modify our solution to make that happen.

Prerequisites

You need to set up a few prerequisites before running the solution code in AWS Lambda:

Create an S3 bucket. Record the bucket name; you need to input this information later in the function code.
Create a DynamoDB table. Give your table a name, enter TimeStamp in the partition key box, and set the data type to String.
Create an AWS Identity and Access Management (IAM) policy with the least privilege to access Amazon S3, Amazon Rekognition, and Amazon DynamoDB. Use the following JSON code for your policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "rekognition:DetectLabels",
                "s3:GetObject",
                "dynamodb:PutItem"
            ],
            "Resource": "*"
        }
    ]
}

Remember the policy name; you use it in the next step.

Create a role for the Lambda function and attach the policy you created in the previous step to grant the function the proper permissions. After you create the role, record the role name. You need to attach this role later when you create the Lambda function.
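The policy shown above grants its three actions on all resources ("Resource": "*"). If you want to narrow it further, the following sketch builds a variant that scopes the Amazon S3 and DynamoDB statements to a single bucket and table. The function name build_scoped_policy, the statement IDs, and the Region and account ID values are placeholders chosen for illustration. The rekognition:DetectLabels action doesn't operate on a resource ARN, so its statement keeps "*".

```python
import json


def build_scoped_policy(bucket, table, region, account_id):
    """Build an IAM policy document limited to one bucket and one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadFrames",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Object-level ARN: every object in the bucket
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "WriteCounts",
                "Effect": "Allow",
                "Action": "dynamodb:PutItem",
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table}",
            },
            {
                # DetectLabels takes no resource ARN, so "*" stays here.
                "Sid": "DetectPeople",
                "Effect": "Allow",
                "Action": "rekognition:DetectLabels",
                "Resource": "*",
            },
        ],
    }


# Placeholder values; replace them with your own names, Region, and account ID.
policy = build_scoped_policy(
    "your_s3_bucket_name", "your_DynamoDB_Table_name", "us-east-1", "123456789012"
)
print(json.dumps(policy, indent=4))
```

You can paste the printed JSON into the policy editor in place of the broader policy.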

Let’s get started.

Create an AWS DeepLens project

The first step of the project is adding the logic necessary to send frames from the camera encoder into Amazon S3. This simulates the ability that IP cameras have to adjust the intra frame period on the stream of the camera itself.

On the AWS DeepLens console, register your device.

Now we need to deploy the model onto the camera itself.

In the navigation pane, under Resources, choose Projects.
Choose Create project.
Select Use a project template.
Choose the pre-built object detection template.

This template uses the object detection model, which can detect real-life entities in a picture.

Choose Finish.

Modify the Lambda function

In this next step, you update the code of your Lambda function.

When the project is ready, choose the name of the project to see the details.

You can see the name, description, ARN, and the associated Lambda function and model for this project.

Choose the function to modify it.

A new window opens to redirect you to the Lambda console.

Verify that you’re on the $LATEST version; then you can modify the code.
Enter the following code into your function:

## This function is deployed to AWS DeepLens. It captures frames and sends them to an S3 bucket.
#*****************************************************
#                                                    *
# Copyright 2018 Amazon.com, Inc. or its affiliates. *
# All Rights Reserved.                               *
#                                                    *
#*****************************************************
""" A sample lambda for face detection"""
from threading import Thread, Event
import os
import json
import numpy as np
import awscam
import cv2
import greengrasssdk
import boto3
import time
import datetime

##Set up the parameters here
Bucket_name = 'your_s3_bucket_name' ##Your s3 bucket name. The frames from the camera will be stored here
frame_send_frequency=300 ##Interval in seconds between frames that DeepLens sends to S3

class LocalDisplay(Thread):
    """ Class for facilitating the local display of inference results
        (as images). The class is designed to run on its own thread. In
        particular the class dumps the inference results into a FIFO
        located in the tmp directory (which lambda has access to). The
        results can be rendered using mplayer by typing:
        mplayer -demuxer lavf -lavfdopts format=mjpeg:probesize=32 /tmp/results.mjpeg
    """
    def __init__(self, resolution):
        """ resolution - Desired resolution of the project stream """
        # Initialize the base class, so that the object can run on its own
        # thread.
        super(LocalDisplay, self).__init__()
        # List of valid resolutions
        RESOLUTION = {'1080p' : (1920, 1080), '720p' : (1280, 720), '480p' : (858, 480)}
        if resolution not in RESOLUTION:
            raise Exception("Invalid resolution")
        self.resolution = RESOLUTION[resolution]
        # Initialize the default image to be a white canvas. Clients
        # will update the image when ready.
        self.frame = cv2.imencode('.jpg', 255*np.ones([640, 480, 3]))[1]
        self.stop_request = Event()

    def run(self):
        """ Overridden method that continually dumps images to the desired
            FIFO file.
        """
        # Path to the FIFO file. The lambda only has permissions to the tmp
        # directory. Pointing to a FIFO file in another directory
        # will cause the lambda to crash.
        result_path = '/tmp/results.mjpeg'
        # Create the FIFO file if it doesn't exist.
        if not os.path.exists(result_path):
            os.mkfifo(result_path)
        # This call will block until a consumer is available
        with open(result_path, 'wb') as fifo_file:
            while not self.stop_request.is_set():
                try:
                    # Write the data to the FIFO file. This call will block
                    # meaning the code will come to a halt here until a consumer
                    # is available.
                    fifo_file.write(self.frame.tobytes())
                except IOError:
                    continue

    def set_frame_data(self, frame):
        """ Method updates the image data. This currently encodes the
            numpy array to jpg but can be modified to support other encodings.
            frame - Numpy array containing the image data of the next frame
                    in the project stream.
        """
        ret, jpeg = cv2.imencode('.jpg', cv2.resize(frame, self.resolution))
        if not ret:
            raise Exception('Failed to set frame data')
        self.frame = jpeg

    def join(self):
        self.stop_request.set()

def infinite_infer_run():
    """ Entry point of the lambda function"""
    try:
        # This face detection model is implemented as single shot detector (ssd).
        model_type = 'ssd'
        output_map = {1: 'face', 2: 'person'}
        # Create an IoT client for sending messages to the cloud.
        client = greengrasssdk.client('iot-data')
        iot_topic = '$aws/things/{}/infer'.format(os.environ['AWS_IOT_THING_NAME'])
        # Create a local display instance that will dump the image bytes to a FIFO
        # file that the image can be rendered locally.
        local_display = LocalDisplay('480p')
        local_display.start()
        # The sample projects come with optimized artifacts, hence only the artifact
        # path is required.
        model_path = '/opt/awscam/artifacts/mxnet_deploy_ssd_resnet50_300_FP16_FUSED.xml'
        # Load the model onto the GPU.
        client.publish(topic=iot_topic, payload='Loading face detection model')
        model = awscam.Model(model_path, {'GPU': 1})
        client.publish(topic=iot_topic, payload='Face detection model loaded')
        # Set the threshold for detection
        detection_threshold = 0.25
        # The height and width of the training set images
        input_height = 300
        input_width = 300
        # Create the S3 client once and reuse it for every upload.
        s3 = boto3.client('s3')
        # Do inference until the lambda is killed.
        while True:
            # Get a frame from the video stream
            ret, frame = awscam.getLastFrame()
            if not ret:
                raise Exception('Failed to get frame from the stream')

            try:
                encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 90]
                _, jpg_data = cv2.imencode('.jpg', frame, encode_param)
                timestamp = datetime.datetime.now()
                # Set up the name for each frame
                key = "frame/{}.jpg".format(timestamp)
                
                # Put frame into S3 bucket
                response = s3.put_object(Body=jpg_data.tobytes(),
                                         Bucket=Bucket_name,
                                         Key=key,
                                         ContentType='image/jpeg'
                                         )
                client.publish(topic=iot_topic, payload="Frame pushed to S3")
                time.sleep(frame_send_frequency)
            except Exception as e:
                msg = "Pushing to S3 failed: " + str(e)
                client.publish(topic=iot_topic, payload=msg)
      
    except Exception as ex:
        client.publish(topic=iot_topic, payload='Error in face detection lambda: {}'.format(ex))

infinite_infer_run()

In the preceding code, we use the awscam.getLastFrame() API request to get the last frame produced by the AWS DeepLens camera. Then we send each of those produced frames to Amazon S3.

We need to make sure we set two main parameters:

The name of the bucket that you created as a prerequisite. This parameter is on line 21 of the preceding code:

Bucket_name = 'your_s3_bucket_name' ##Your s3 bucket name

The frequency at which we want to receive frames from the camera. This is currently set to 300 in the example code and is measured in seconds, meaning that we receive one frame every 5 minutes from the camera. For your project, you can adjust the frequency depending on your needs. If you want to analyze frames more often, decrease frame_send_frequency. When changing this variable, we recommend that you select a value in the range of 1–840 seconds. Our recommendation takes into account the maximum runtime of a Lambda function, which is 15 minutes (900 seconds). To modify the frequency parameter, go to line 22 of the code:

frame_send_frequency=300
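To make the recommended range concrete, the following sketch checks a candidate value and shows how many frames per day it produces. The helper names and the 60-second margin used for the suggested timeout are assumptions made for illustration; only the 1–840 second range and the 15-minute Lambda limit come from this post.

```python
MIN_FREQUENCY_SECONDS = 1
MAX_FREQUENCY_SECONDS = 840        # recommended ceiling from this post
LAMBDA_MAX_TIMEOUT_SECONDS = 900   # 15-minute Lambda runtime limit


def validate_frequency(seconds):
    """Return seconds unchanged if it is inside the recommended range."""
    if not MIN_FREQUENCY_SECONDS <= seconds <= MAX_FREQUENCY_SECONDS:
        raise ValueError(
            "frame_send_frequency should be between "
            f"{MIN_FREQUENCY_SECONDS} and {MAX_FREQUENCY_SECONDS} seconds"
        )
    return seconds


def frames_per_day(seconds):
    """Number of frames sent in 24 hours at one frame every `seconds`."""
    return (24 * 60 * 60) // seconds


def suggested_timeout(seconds, margin=60):
    """A timeout above the send interval, capped at the Lambda limit.

    The 60-second margin is an assumption, not a value from this post.
    """
    return min(seconds + margin, LAMBDA_MAX_TIMEOUT_SECONDS)


frame_send_frequency = validate_frequency(300)
print(frames_per_day(frame_send_frequency))
print(suggested_timeout(frame_send_frequency))
```

Each frame later triggers one Amazon Rekognition call and one DynamoDB write, so the frames-per-day figure is also a rough guide to how much downstream processing a given setting generates.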

Choose Save to save your function.

Because you made a modification, you need to publish this new version, which is deployed into the camera.

On the Actions menu, choose Publish a new version.

A new window appears asking you to describe the version (this step is optional).

Choose Publish.

Now the modified code is the published $LATEST Lambda version.

Deploy the updated project into your AWS DeepLens device

To apply the changes we made in the code to the device, we need to deploy the project one more time. But before deploying, we need to update the project with the $LATEST version of our function.

On the AWS DeepLens console, in the navigation pane, under Resources, choose Projects.
Choose the project you created to see its details.

The Lambda function description still shows the outdated version of the function.

Choose Edit.

A new window opens.

Choose Function to expand the options.
Choose the newest version of your function.
For Timeout, modify the timeout according to the frame_send_frequency parameter you set previously.

For example, if your frame_send_frequency variable is set to 300 seconds, your Lambda function timeout setting should be greater than 300, so that the function doesn’t time out before sending the last frame.

Choose Save.

After saving your project, you’re redirected to the Projects section of the AWS DeepLens console.

Choose your project and choose Deploy to device.
Choose the device you want to deploy this project to.
Choose Review.

On the review page, you see a warning, which you can ignore.

Review the information and choose Deploy.

You’re redirected to your device detail pane, where you can see the progression and status of the deployment.

After the deployment is successful, you’re ready for the next section of the project. So far you created, modified, and deployed a project into your AWS DeepLens camera. Now the camera should be sending the frames to the S3 bucket at the frequency you set in the code. For this post, we set the frequency to 300 seconds, so you should receive one frame every 5 minutes in the S3 bucket. The next step is to analyze those video frames.
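If you want to confirm that frames are arriving, one option is to list the objects under the frame/ prefix that the device code uses for its keys. The following is a minimal sketch; list_frame_keys is a hypothetical helper, and it reads a single page of results (up to 1,000 keys), which should be enough for a quick check.

```python
def list_frame_keys(s3_client, bucket, prefix="frame/"):
    """Return the keys under `prefix`, oldest first.

    `s3_client` is expected to be a boto3 S3 client. Only the first page of
    results is read, so at most 1,000 keys are returned.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objects = response.get("Contents", [])  # key is absent when nothing matches
    objects = sorted(objects, key=lambda obj: obj["LastModified"])
    return [obj["Key"] for obj in objects]


# Usage (requires the boto3 package and AWS credentials):
#   import boto3
#   keys = list_frame_keys(boto3.client("s3"), "your_s3_bucket_name")
#   print(len(keys), "frames; latest:", keys[-1] if keys else None)
```

With the default setting of 300 seconds, a new key should appear roughly every 5 minutes.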

Analyze frames with Amazon Rekognition and send data to DynamoDB

To analyze the frames in the S3 bucket, we have to create another Lambda function.

Create your function and choose Python 3.8 as the runtime.
In the Permissions section, select Use an existing role.
Choose the role you created as a prerequisite.
Complete creating your function.

The creation process takes a few seconds to complete.

When the function is ready, on the lambda_function.py tab of the function code, enter the following:

## This function is triggered by an S3 put event. It counts how many people are in the frame stored in S3.
## The frame's key (which contains the capture time) and the number of people counted are sent to DynamoDB.
import json
import urllib.parse
import boto3
import time
import os

##Set up the parameters here
DBTable_name = 'your_DynamoDB_Table_name' ##Your DynamoDB table name. The number of people counted in each frame will be stored here

s3 = boto3.client('s3')
rekognition = boto3.client('rekognition')
dynamodb = boto3.client('dynamodb')


def lambda_handler(event, context):
    bucketName = event['Records'][0]['s3']['bucket']['name']
    imageName = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    
    # Get frame from S3 bucket
    response=s3.get_object(Bucket=bucketName, Key=imageName)
    
    # Call rekognition function to find person in the frame
    recognizeLabelResponse = rekognition.detect_labels(
        Image={
            'S3Object': {
                'Bucket': bucketName,
                'Name': imageName,
            }
        },
        MinConfidence=80,
    )
    
    threshold = 1
    time_stamp = str(imageName)
    
    # count number of people in the picture
    pplno = 0
    for item in recognizeLabelResponse['Labels']:
        if (item["Name"] == "Person") :
            for person in item["Instances"]:
                confidence = person["Confidence"]
                if(confidence > 95):
                    pplno = pplno+1
                    
    writetotable = dynamodb.put_item(
        TableName = DBTable_name,
        Item={
            'TimeStamp':{'S':time_stamp},
            'Customer_Number':{'N':str(pplno)}
            }
        )

The preceding code analyzes the frames in the S3 bucket, uses Amazon Rekognition to count the number of people, and creates a record in your DynamoDB table with two attributes: TimeStamp and the number of customers.
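To see the two core steps in isolation, the following sketch pulls the key-decoding and counting logic into small functions that you can run locally without AWS access. The function names and the sample event and response are hand-written for illustration and are trimmed to the fields the code reads. The sample key assumes the naming used by the device code, where the timestamp contains a space and colons that arrive URL-encoded in the S3 event.

```python
import urllib.parse


def bucket_and_key(event):
    """Extract the bucket name and the URL-decoded object key from an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"], encoding="utf-8")
    return bucket, key


def count_people(detect_labels_response, min_instance_confidence=95):
    """Count Person instances whose confidence exceeds the threshold."""
    count = 0
    for label in detect_labels_response["Labels"]:
        if label["Name"] == "Person":
            for instance in label.get("Instances", []):
                if instance["Confidence"] > min_instance_confidence:
                    count += 1
    return count


# Hand-written sample inputs, not real service output
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "your_s3_bucket_name"},
                "object": {"key": "frame/2023-04-01+12%3A00%3A00.123456.jpg"},
            }
        }
    ]
}
sample_response = {
    "Labels": [
        {
            "Name": "Person",
            "Confidence": 99.1,
            "Instances": [
                {"Confidence": 98.7},
                {"Confidence": 96.2},
                {"Confidence": 88.0},  # below the threshold, not counted
            ],
        },
        {"Name": "Shelf", "Confidence": 91.0, "Instances": []},
    ]
}

print(bucket_and_key(sample_event))
print(count_people(sample_response))
```

Keeping this logic in plain functions makes it possible to tune the confidence threshold against saved responses before deploying a change.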

At line 10, provide the name of your DynamoDB table that you set up as a prerequisite:

DBTable_name = 'your_DynamoDB_Table_name'

Choose Deploy to save the function.

We now need to set up an S3 event trigger to run this Lambda function.

On the Amazon S3 console, choose the bucket you created as a prerequisite.
Choose Properties.
In the Advanced settings section, choose Events.
Choose Add notification.
Enter a name for your event trigger and select All object create events.
For Send to, choose Lambda function.
For Lambda, enter the name of the function you just created.
Choose Save.
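The console steps above could also be scripted. As a minimal sketch, the following builds the notification configuration that the boto3 S3 client accepts in put_bucket_notification_configuration. The helper name, the placeholder function ARN, and the frame/ prefix filter are assumptions; the prefix filter is an optional addition that limits the trigger to the keys written by the device code.

```python
def build_notification_config(lambda_arn, prefix="frame/"):
    """Build an S3 notification configuration that invokes a Lambda function
    for every object created under `prefix`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],  # "All object create events"
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


# Placeholder ARN; replace it with the ARN of the function you created.
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:your_function_name"
)

# Usage (requires the boto3 package and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="your_s3_bucket_name", NotificationConfiguration=config
#   )
```

When you add the trigger on the console, the permission that lets Amazon S3 invoke the function is added for you; a scripted setup needs to grant that permission separately.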

Now you’re all set.

Enhance the solution

You can build on this solution in different ways to make it more suitable for your own business needs; we cover a few steps you can take in this section.

Connect existing IP cameras to AWS

You can integrate this solution with existing IP cameras. The only difference from the steps described in this post is the connection of the camera itself with the AWS Cloud versus using AWS DeepLens.

Amazon Kinesis Video Streams is a service that sends streaming videos to the AWS Cloud, which you can use for real-time video processing or batch-oriented video analytics. To implement this, you need to make sure that the IP camera meets all the hardware requirements to integrate with the AWS Cloud. For more information about connecting to Kinesis Video Streams, see the Quick Start Reference Deployment or Kinesis Video Streams Producer Libraries.

Bring computer vision to the edge for existing IP cameras

An alternative announced at re:Invent 2020 is AWS Panorama, a hardware appliance that allows you to add CV to cameras that weren’t originally built to handle embedded computer vision modules.

AWS Panorama is an ML Appliance and Software Development Kit (SDK) that allows you to bring CV on premises to make predictions locally with high accuracy and low latency. You can sign up for a preview.

Build a visualization dashboard

You can visualize the data you collect with Amazon QuickSight, a scalable, serverless, embeddable, ML-powered business intelligence (BI) service built for the cloud. You can use QuickSight to query all the data stored in your DynamoDB table either directly on the QuickSight console or embedded in your website or application. For more information, see How to perform advanced analytics and build visualizations of your Amazon DynamoDB data by using Amazon Athena.
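Before building a dashboard, it can help to check the shape of the data. The following sketch averages the stored counts by hour of day. It assumes items in the low-level DynamoDB format written by the function in this post, where TimeStamp holds the frame key (for example, frame/2023-04-01 12:00:00.123456.jpg) and Customer_Number holds the count. The helper names and sample items are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def timestamp_from_key(key):
    """Parse the capture time out of a key like 'frame/<timestamp>.jpg'."""
    name = key.rsplit("/", 1)[-1]
    if name.endswith(".jpg"):
        name = name[: -len(".jpg")]
    return datetime.fromisoformat(name)


def average_by_hour(items):
    """Return {hour of day: average number of people counted}."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of counts, frames]
    for item in items:
        hour = timestamp_from_key(item["TimeStamp"]["S"]).hour
        totals[hour][0] += int(item["Customer_Number"]["N"])
        totals[hour][1] += 1
    return {hour: total / frames for hour, (total, frames) in sorted(totals.items())}


# Made-up sample items in the format returned by a DynamoDB scan
sample_items = [
    {"TimeStamp": {"S": "frame/2023-04-01 12:00:00.123456.jpg"}, "Customer_Number": {"N": "4"}},
    {"TimeStamp": {"S": "frame/2023-04-01 12:05:00.654321.jpg"}, "Customer_Number": {"N": "6"}},
    {"TimeStamp": {"S": "frame/2023-04-01 18:10:00.000001.jpg"}, "Customer_Number": {"N": "9"}},
]

print(average_by_hour(sample_items))
```

The hours with the highest averages are candidates for additional staffing.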

Conclusion

This solution is just one example of the capabilities of Amazon Rekognition that you can use to support workforce optimization. Knowing the average number of customers allows you to unlock the full potential of your store by optimizing the number of employees working at any given time.

You can further develop the solution and add features such as heatmap analysis, which traces customer trajectories within the store so that you can position products in the most valuable areas of your space and increase sales.

No customer personal identity information is collected in this solution. Each video frame is stored in Amazon S3 only so that it can be counted. The sample code in this post doesn’t delete frames after counting, so add a cleanup step (for example, an S3 Lifecycle expiration rule) if frames must not be retained. You can encrypt your data with both server-side encryption and client-side encryption. The information extracted from the image is stored in Amazon DynamoDB, which is fully encrypted at rest using encryption keys stored in AWS Key Management Service (AWS KMS). You can also control user access to the data by using AWS Identity and Access Management (IAM).
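As a minimal sketch of keeping frames short-lived, the following builds an S3 Lifecycle configuration that expires objects under the frame/ prefix. The helper name, the rule ID, and the one-day retention period are assumptions for illustration; choose a period that matches your own retention requirements.

```python
def build_frame_expiration_rule(prefix="frame/", days=1):
    """Build a lifecycle configuration that expires objects under `prefix`
    after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-counted-frames",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


lifecycle = build_frame_expiration_rule()

# Usage (requires the boto3 package and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="your_s3_bucket_name", LifecycleConfiguration=lifecycle
#   )
```

Lifecycle expiration works in whole days, so frames that must be removed sooner would need an explicit delete after counting, along with the matching IAM permission.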

About the Authors

Kayla Jing is a Solutions Architect at Amazon Web Services based out of Seattle. She helps customers optimize architectures on AWS with a focus on Data Analytics and Machine Learning. Kayla holds a Master’s degree in Data Science.

Laura Reith is a Solutions Architect at Amazon Web Services. Before AWS, she worked as a Solutions Architect in Taiwan focusing on physical security and retail analytics. Laura has a Master’s degree in Electrical Engineering and Computer Science.