Training the Amazon SageMaker object detection model and running it on AWS IoT Greengrass – Part 1 of 3: Preparing training data

Post by Angela Wang and Tanner McRae, Engineers on the AWS Solutions Architecture R&D and Innovation team

Running computer vision algorithms at the edge unlocks many industry use cases that have low or limited internet connectivity. By combining AWS services in the machine learning (ML) and Internet of Things (IoT) space, you can train a custom computer vision model and run it at the edge more easily than ever. The blog post Using and Retraining Image Classification Models with AWS IoT Greengrass shows an example of running image classification models on AWS IoT Greengrass. In computer vision, image classification tells you what types of objects are in the image. Object detection, in addition to identifying objects, also tells you where the objects are by producing bounding boxes that mark the location of each detected object.

Image with person moving 2 boxes, with bounding boxes surrounding each box. Image with person moving a box, with a bounding box highlight surrounding the box.

In industries such as manufacturing and supply chain, the capability of object detection to locate objects makes it applicable to a wider range of use cases compared to image classification alone. For example:

Track and count inventory
Identify barcodes or QR codes printed on top of boxes
Locate product defects for quality control or predictive maintenance

When running object detection models for use cases such as inventory tracking, it is common to have limited connectivity to the internet. Yet in order to count and track inventory accurately, it’s often necessary to process multiple frames per second per camera. This can be a lot of data to send over the internet if you run the ML inference in the cloud. Therefore, running the object detection model at the edge can often be a low-latency and reliable solution. Using the ML inference feature of AWS IoT Greengrass can simplify managing and deploying ML inference to the edge.

These use cases also tend to require training a specialized object detection model using custom data. This enables the model to recognize specific products in backgrounds that are unique to the customer’s business. Amazon SageMaker is built to accelerate the process by providing a built-in workflow for data labeling and a built-in object detection algorithm.

Even though tools like Amazon SageMaker and AWS IoT Greengrass do an excellent job in their own domains, there is still work to be done for an end-to-end solution:

How do you take a source video feed and turn it into labeled training data?
How do you make sure that you have the right collection of data before you pay people to label it?
How do you convert the model artifact produced by the Amazon SageMaker object detection algorithm into a format that you can deploy using AWS IoT Greengrass for inference?

In this multipart post series, we share the processes, scripts, and best practices that we used to address these questions while working on similar object detection projects with customers. The example of tracking two types of boxes going in and out of a room with an overhead camera is a simplified version of tracking objects on a factory floor.

Prerequisites

You are welcome to read the best practices described in this post. However, if you would like to try out each step yourself, make sure that you have the following in place:

An AWS account
An Amazon S3 bucket
A running Amazon SageMaker notebook instance
This GitHub repository cloned to the Amazon SageMaker notebook instance
If you collect your own data, a web camera connected to your laptop and objects on which to train the custom model

Note: Make sure that when creating the IAM role for the Jupyter notebook, the IAM role can access the greengrass-object-detection-blog bucket and the S3 bucket that you created for this project.

Architecture

In the next few posts, we walk through each part of the following architecture, as shown in the diagram:

Architecture diagram for the blog

This post takes a closer look at the data gathering for training your own object detection model using a web camera.

Gather video footage

To construct an environment similar to an inventory tracking use case on a factory floor, we installed a web camera on the ceiling of an office. Then we connected it to the computer through USB.

After the setup was complete, we took videos of ourselves carrying these boxes around under the webcam using this 00_get_video.py script. The script uses the OpenCV library, an open source BSD-licensed library of programming functions built for real-time computer vision. The library is cross-platform and contains a wealth of algorithms and video and image-processing functionalities. For this use case, we used OpenCV to capture videos from a web camera and to extract frames from the videos.

To use this script to gather your own footage, run the script on your laptop after installing Python dependencies. (Press q to stop the recording.)

pip install -r data-prep/requirements.txt
python data-prep/00_get_video.py -n <name-of-video> -c <camera-id>

After you record the videos, upload them to an S3 bucket using the aws s3 sync command.

Try to gather footage of as many of the scenarios that can happen in production as possible, so that your model is trained on varied environments and still performs well if conditions change. For example, vary the lighting, the room configuration, and the inventory movement patterns.

You can review the video gathering Python script here.

Instructions on following these steps in your AWS account

You can perform each of the following steps by running through this Jupyter notebook on your Amazon SageMaker notebook instance (with the exception of creating the Amazon SageMaker Ground Truth labeling job). You can use either the video that you collected yourself or the example video files that we provide (released under the CDLA Permissive license).

Extracting and uploading frames

Now that you have collected some videos, you must extract individual frames from them so they can be labeled and used for training. You can run the 01_video_to_frame_utils.py script for each of the videos you collected on an Amazon SageMaker notebook instance:

python data-prep/01_video_to_frame_utils.py --video_s3_bucket $VIDEO_S3_BUCKET --video_s3_key $VIDEO_S3_KEY --working_directory $WORKING_DIR --visualize_video True --visualize_sample_rate 1 -o $OUTPUT_S3_BUCKET

After you extract the frames, use aws s3 sync to upload them to S3:

aws s3 sync $WORKING_DIR/$folder_name s3://$OUTPUT_S3_BUCKET/frames/

Frame extraction tips

Tip 1: Prefix extracted frames with class annotation

During data labeling with SageMaker Ground Truth for object detection, the worker typically needs to select the class that the object belongs to in addition to drawing the bounding box. However, if you recorded a separate video for each product that you want to detect, you can include the name of the item in the name of the video file. Then during frame extraction, you can carry that name over to the image file names, which gives you a free class annotation. See the following diagram for an example.

example diagram showing extract frame file names with class name prefix
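As a rough illustration of this naming tip, the following sketch carries a video's file name over to its frame file names. The function names, the .mp4 extension, and the six-digit frame index are assumptions chosen to match the example file names shown later in this post. They are not taken from the 01_video_to_frame_utils.py script.

```python
from pathlib import Path


def frame_file_name(video_path: str, frame_index: int) -> str:
    """Build a frame file name that keeps the video's name as a prefix."""
    video_stem = Path(video_path).stem  # "videos/blue_box_1.mp4" -> "blue_box_1"
    # Zero-pad the index so that frames sort in recording order (assumed width: 6).
    return f"{video_stem}_{frame_index:06d}.jpg"


def video_prefix(frame_name: str) -> str:
    """Recover the video-name prefix by dropping the trailing frame index."""
    prefix, _, _frame_index = Path(frame_name).stem.rpartition("_")
    return prefix


print(frame_file_name("videos/blue_box_1.mp4", 23))  # blue_box_1_000023.jpg
print(video_prefix("blue_box_1_000023.jpg"))         # blue_box_1
```

Because the prefix survives in every frame name, a later step can read the class from the file name instead of asking a labeler to choose it.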

Tip 2: Review the contents of your extracted frames for image quality, personally identifiable information (PII), confidential data, and background-only images

Before you begin labeling your training data, make sure to review the contents of your extracted frames for image quality, PII, confidential data, or background-only images. Consider retaking the video, or apply additional filtering steps for empty frames or frames with PII. The frame extraction script (01_video_to_frame_utils.py) automatically generates a thumbnail view of the extracted frames that you can use for this review.

Build a labeling manifest file for Amazon SageMaker Ground Truth

To train an ML model, you need large, high-quality, labeled datasets. Labeling for thousands of images can become tedious and time consuming. Thankfully, Amazon SageMaker Ground Truth makes it easy to crowdsource this task. The Ground Truth service offers easy access to public and private human labelers for annotating datasets. It provides built-in workflows and interfaces for common labeling tasks, including drawing bounding boxes for object detection.

When creating a labeling job in Amazon SageMaker Ground Truth, you create a manifest file pointing to the locations of the input images stored in S3 that require annotation. Each line corresponds to a single image and is an independent JSON document.

Use the 02_generate_gt_manifest.py script to generate this manifest by specifying the S3 location of the frames you have extracted in the previous step:

python data-prep/02_generate_gt_manifest.py -b $S3_BUCKET -k $S3_KEY_PREFIX -d $WORKING_DIR -r $SAMPLING_RATE

After reviewing the content of the generated manifest file, upload it to an S3 bucket of your choosing. Use this location in the next step when creating the Ground Truth job. You can also join together the manifest files generated from multiple videos before uploading them to S3.
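To make the manifest format concrete, here is a minimal sketch that builds one JSON document per image and joins manifests from several videos. The function names and the image-extension filter are hypothetical, and the sketch takes a plain list of S3 keys rather than listing the bucket itself. It is not the logic of 02_generate_gt_manifest.py.

```python
import json


def build_manifest_lines(bucket, keys):
    """Return one JSON document per image key, each pointing at its S3 location."""
    lines = []
    for key in keys:
        if not key.lower().endswith((".jpg", ".jpeg", ".png")):
            continue  # skip anything that is not an image (assumed extensions)
        lines.append(json.dumps({"source-ref": f"s3://{bucket}/{key}"}))
    return lines


def join_manifests(*manifests):
    """Concatenate the lines of several manifests into one manifest body."""
    return "\n".join(line for manifest in manifests for line in manifest) + "\n"


blue = build_manifest_lines("my-bucket", ["frames/blue_box_1/blue_box_1_000023.jpg"])
yellow = build_manifest_lines("my-bucket", ["frames/yellow_box_1/yellow_box_1_000019.jpg"])
print(join_manifests(blue, yellow), end="")
```

Each line stays an independent JSON document, so joining manifests is plain concatenation with no enclosing array.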

SageMaker Ground Truth manifest generation tips

Tip 1: Append additional metadata to your SageMaker Ground Truth labeling manifest file

A cool feature of SageMaker Ground Truth is the ability to attach additional metadata to each input. This metadata is preserved in the labeled Ground Truth output as passthrough information. For example, you can append the “color” and “object” values, previously stored only in each image’s file name, to that image’s entry in the manifest.

{"source-ref": "s3://my-bucket/frames/blue_box_1/blue_box_1_000023.jpg", "color": "blue", "object": "box"}
{"source-ref": "s3://my-bucket/frames/blue_box_1/blue_box_1_000025.jpg", "color": "blue", "object": "box"}
{"source-ref": "s3://my-bucket/frames/yellow_box_1/yellow_box_1_000019.jpg", "color": "yellow", "object": "box"}
{"source-ref": "s3://my-bucket/frames/yellow_box_1/yellow_box_1_000020.jpg", "color": "yellow", "object": "box"}

After the image has been annotated by SageMaker Ground Truth workers, here’s what the output line looks like in the output manifest file:

{
 "source-ref":"s3://my-bucket/frames/blue_box_1/blue_box_1_000023.jpg",
 "color": "blue", 
 "object": "box",
 "bb":{
   "annotations":[{"class_id":0,"width":499,"top":134,"height":726,"left":0}],
   "image_size":[{"width":1280,"depth":3,"height":1080}]
 },
 "bb-metadata":{
   "job-name":"labeling-job/demo",
   "class-map":{"0":"storage box"},
   "human-annotated":"yes",
   "objects":[
     {"confidence":0.09}
   ],
   "creation-date":"2019-05-03T22:33:23.351336",
   "type":"groundtruth/object-detection"
 }
}

As you can see, the bounding box annotations labeled by SageMaker Ground Truth workers are added, but the additional metadata (the color and object key-value pairs) isn’t touched at all. In this example, this additional metadata helps generate class labels used in training.

This can also be useful when chaining labeling jobs. For example, you might run a classification job first to classify what’s in an image. You could take that output and pass it into a new bounding box job. That effectively chains the two jobs together into a single output that you can use to train an object detection model.
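Returning to the example output line above, the following sketch shows one way the passthrough metadata could be turned into a class label for training. The function name and the choice to join color and object with a space are assumptions made for illustration. "bb" is the label attribute name that appears in the example output.

```python
import json


def training_label(output_line, label_attribute="bb"):
    """Return (class_name, boxes) for one line of a labeled output manifest."""
    record = json.loads(output_line)
    # Build the class name from the passthrough metadata (assumed convention).
    class_name = f'{record["color"]} {record["object"]}'
    # Each box becomes a (left, top, width, height) tuple in pixels.
    boxes = [
        (a["left"], a["top"], a["width"], a["height"])
        for a in record[label_attribute]["annotations"]
    ]
    return class_name, boxes


line = json.dumps({
    "source-ref": "s3://my-bucket/frames/blue_box_1/blue_box_1_000023.jpg",
    "color": "blue",
    "object": "box",
    "bb": {"annotations": [{"class_id": 0, "width": 499, "top": 134, "height": 726, "left": 0}]},
})
print(training_label(line))  # ('blue box', [(0, 134, 499, 726)])
```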

To see example code for appending this metadata, find the following commented-out line in the manifest generation script and uncomment it:

# to see appending additional metadata in action, uncomment the following line if you are using the example data
obj = append_additional_metadata(obj, s3_object.key)
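For readers who want a feel for what such a helper might do, here is a hypothetical stand-in. It is not the repository's append_additional_metadata implementation. It assumes file names shaped like <color>_<object>_<video-number>_<frame-index>.jpg, as in the example manifest above.

```python
from pathlib import Path


def add_metadata_from_key(obj, s3_key):
    """Return a copy of a manifest entry with color and object parsed from the key.

    Assumes names like blue_box_1_000023.jpg; other names are left unchanged.
    """
    enriched = dict(obj)  # copy so that the caller's entry is not mutated
    parts = Path(s3_key).stem.split("_")
    if len(parts) < 4:
        return enriched  # the name does not follow the assumed convention
    enriched["color"] = parts[0]
    enriched["object"] = parts[1]
    return enriched


entry = {"source-ref": "s3://my-bucket/frames/blue_box_1/blue_box_1_000023.jpg"}
print(add_metadata_from_key(entry, "frames/blue_box_1/blue_box_1_000023.jpg"))
```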

Tip 2: Decide at what interval to sample your frames

Also consider a frame sampling rate when generating your labeling job manifest. Why should you sample your frames? Say that you have about 10 hours of footage taken at 30 FPS (frames per second). That’s a total of 10 x 60 x 60 x 30 = 1,080,000 images. It’s not only expensive and time consuming to label every frame, but you might also find that consecutive frames are nearly identical, especially if your video has a high frame rate.

For example, here are 16 consecutive frames from our 5 FPS footage:

16 consecutive frames of a video of person moving a box

And here are 16 frames that were sampled 1 out of every 10 frames from our footage:

16 frames of a video sampled 1 out of every 10 frames

The manifest generation script has a command line option so that you can specify the sampling frequency when generating the labeling manifest.
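The arithmetic and the sampling idea in this tip can be sketched in a few lines. The function names are hypothetical, and keeping every nth frame is only one simple sampling strategy. It is not necessarily how the manifest generation script implements its sampling option.

```python
def total_frames(hours, fps):
    """Number of frames in footage of the given length and frame rate."""
    return int(hours * 60 * 60 * fps)


def sample_every_nth(frames, n):
    """Keep 1 out of every n frames, starting with the first one."""
    if n < 1:
        raise ValueError("sampling interval must be at least 1")
    return frames[::n]


print(total_frames(10, 30))  # 1080000

frame_keys = [f"frames/blue_box_1/blue_box_1_{i:06d}.jpg" for i in range(160)]
sampled = sample_every_nth(frame_keys, 10)
print(len(sampled))  # 16
```

With an interval of 10, the 10 hours of 30 FPS footage from the example above would shrink from 1,080,000 images to 108,000 images to label.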

Before submitting a labeling job to SageMaker Ground Truth, review the content of the manifest you generated by using the 03_visualize_gt_labeling_manifest.py script:

python data-prep/03_visualize_gt_labeling_manifest.py -b $S3_BUCKET -k $S3_KEY_MANIFEST

Create a data labeling job in Amazon SageMaker Ground Truth

You can now move on to creating labeling jobs in Amazon SageMaker Ground Truth. In this post, we don’t cover each step in creating a labeling job in Ground Truth. It’s already covered in detail by this great post, Amazon SageMaker Ground Truth – Build Highly Accurate Datasets and Reduce Labeling Costs by up to 70%.

SageMaker Ground Truth job submission tip: Follow an iterative process in writing concise and clear instructions

We followed the recommended workflow from the Create high-quality instructions for Amazon SageMaker Ground Truth labeling jobs post to create the following instructions for the labeler:

SageMaker Ground Truth labeler portal screenshot that displays the image to label and instructions on how to label the image

Make sure you include bad examples in your instructions and remind labelers what not to do when labeling:

SageMaker Ground Truth labeler portal screenshot showing instructions on bad examples of labels

When you iterate on instructions for a SageMaker Ground Truth job, you can use either the console or the boto3 SDK. If it’s your first time using the service, we suggest that you use the SageMaker Ground Truth console. For iterating on the custom instructions, we found it easier to edit the HTML directly and work through a Jupyter notebook. Here’s the Jupyter notebook we used to create custom instructions for submitting labeling jobs to Ground Truth. Feel free to run it on your Amazon SageMaker notebook instance and modify the parameters to suit your project.

Reviewing labels from SageMaker Ground Truth

You can monitor the SageMaker Ground Truth console for the completion of your labeling jobs.

Ground truth console screenshot showing a list of completed labeling jobs

SageMaker Ground Truth result review tips:

Take advantage of the bounding box visualizations in the console to review the label quality and iterate on your labeling job configurations.
If you use multiple workers for labeling jobs, SageMaker Ground Truth automatically performs annotation consolidation to join together multiple workers’ output. Different confidence scores are assigned to each bounding box, depending on how many workers have labeled that same area. Experiment with filtering out labels with low confidence scores and visualize the results in your own scripts.
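As a starting point for the confidence experiment in the last tip, the following sketch drops bounding boxes whose confidence score falls below a threshold. It assumes the output manifest layout shown earlier in this post, and it assumes that the objects list in the metadata is in the same order as the annotations list. The function name and the threshold are hypothetical.

```python
import json


def filter_by_confidence(output_line, min_confidence, label_attribute="bb"):
    """Return the annotations whose confidence is at least min_confidence."""
    record = json.loads(output_line)
    annotations = record[label_attribute]["annotations"]
    # Assumption: objects[i] holds the confidence score for annotations[i].
    objects = record[f"{label_attribute}-metadata"]["objects"]
    return [
        annotation
        for annotation, meta in zip(annotations, objects)
        if meta["confidence"] >= min_confidence
    ]


line = json.dumps({
    "source-ref": "s3://my-bucket/frames/blue_box_1/blue_box_1_000023.jpg",
    "bb": {"annotations": [
        {"class_id": 0, "width": 499, "top": 134, "height": 726, "left": 0},
        {"class_id": 0, "width": 200, "top": 300, "height": 250, "left": 640},
    ]},
    "bb-metadata": {"objects": [{"confidence": 0.09}, {"confidence": 0.9}]},
})
print(filter_by_confidence(line, 0.5))
```

Trying a few thresholds and visualizing what remains is a quick way to see how much low-agreement labeling would otherwise reach your training set.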

Conclusion

In this post, we shared tips for preparing training data for a real-time object detection IoT use case. You can find all the code that we covered at this GitHub repository. To follow each step in your own account, run through the data prep tutorial Jupyter notebook on an Amazon SageMaker notebook instance.

In part 2 of this blog series, we show you how to take the output from Amazon SageMaker Ground Truth and use it for training your custom object detection model. We also go over the steps to convert the trained model so that it is ready to deploy to AWS IoT Greengrass.

The Internet of Things on AWS – Official Blog