Training the Amazon SageMaker object detection model and running it on AWS IoT Greengrass – Part 2 of 3: Training a custom object detection model

Post by Angela Wang and Tanner McRae, Engineers on the AWS Solutions Architecture R&D and Innovation team

This post is the second in a series on how to build and deploy a custom object detection model to the edge using Amazon SageMaker and AWS IoT Greengrass. In part 1 of this series, we walked through the training data preparation process: capturing video, extracting and selecting frames, and using Amazon SageMaker Ground Truth to label the images.

In this post, you prepare the output from the SageMaker Ground Truth labeling job for model training, submit a custom object detection training job, and finally convert the output model artifact so that you can run inference locally.

Here’s a reminder of the architecture you are building as a whole:

architecture diagram for the blog post

Following the steps in your AWS account

If you would like to try out each step yourself, you can run through the Jupyter notebooks in the GitHub project repository.

Completing part 1 of the series is not a prerequisite for following along in your account. If you have created your own Ground Truth job, you can use your own labeled data. If not, use the example labeled data that we provided (under the CDLA Permissive license). If you choose to use example data, make sure that you use the US East (Northern Virginia) Region.

Process SageMaker Ground Truth labeling job outputs for training

The output of the SageMaker Ground Truth bounding box labeling job is in a format called an augmented manifest file. The Amazon SageMaker built-in object detection training algorithm accepts this format as input.

There are a few advantages of using the augmented manifest format compared to other input formats supported by Amazon SageMaker:

Unlike the traditional approach of providing paths to the input images separately from their labels, the augmented manifest file combines both into one entry for each input image. This reduces the complexity of the algorithm code that matches each image with its labels. For a more detailed explanation, see Easily train models using datasets labeled by Amazon SageMaker Ground Truth.
When splitting your dataset into train, validation, and test sets, you don’t need to rearrange and re-upload image files to different Amazon S3 prefixes. After you upload your image files to S3, you never have to move them again. You just place pointers to these images in the augmented manifest files for training and validation. The train and validation split is covered later in this post.
When using an augmented manifest file, the training input images are loaded on to the training instance in Pipe mode. The input data is streamed directly to the training algorithm while it is running (as opposed to File mode, where all input files must be downloaded to disk before the training starts). This results in faster training performance and less disk resource utilization. For more information about the benefits of Pipe mode, see Accelerate model training using faster Pipe mode on Amazon SageMaker.
No format conversion is required if you are using Ground Truth to generate the data labels.

In this example project, even though the augmented manifest file format is supported as training input for Amazon SageMaker, you may still need a small amount of processing work:

1. Join together outputs from multiple labeling jobs: To be able to iterate on SageMaker Ground Truth jobs, you created several smaller labeling jobs for the dataset instead of a single large job containing the full dataset.
2. Filter out labels that did not meet the quality bar: Look through the labeled bounding boxes in the Ground Truth console and manually mark down any images that were not well labeled.
3. Inject the correct class labels: In part 1, you already prepended the name (class) of the object to the file names of the frames extracted from the video. You then added it as additional metadata in the SageMaker Ground Truth labeling manifest. Because of this, in the labeling job, you didn’t ask the labeler to pick the right class when it drew bounding boxes. This approach saved effort for the worker. However, it also means that the annotations in the Ground Truth output always have the same class_id of 0, because you only specified one class of objects when you submitted the labeling job. Here’s an example output from the completed labeling job:

{
 "source-ref": "s3://aws-greengrass-blog/frames/yellow_box_1/yellow_box_1_000022.jpg", 
 "color": "yellow", 
 "object": "box",
 "bb":{
   "annotations":[{"class_id":0,"width":499,"top":134,"height":726,"left":0}],
   "image_size":[{"width":1280,"depth":3,"height":1080}]
 },
 "bb-metadata":{
   "class-map":{"0":"storage box"},
   ...
 }
}

The code in this Jupyter notebook uses the additional metadata (the color and object key-value pairs in the output JSON) to inject the correct class labels into the bounding box annotations. For example, it assigns class ID 0 to the blue box and class ID 1 to the yellow box. It also updates the class-map field in the manifest. The resulting entry looks something like this:

{
 "source-ref": "s3://aws-greengrass-blog/frames/yellow_box_1/yellow_box_1_000022.jpg",
 "color": "yellow", 
 "object": "box",
 "bb":{
   "annotations":[{"class_id":1,"width":499,"top":134,"height":726,"left":0}],
   "image_size":[{"width":1280,"depth":3,"height":1080}]
 },
 "bb-metadata":{
   "class-map": {"0":"blue box", "1":"yellow box"},
   ...
 }
}
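
The three processing steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the function names, the CLASS_IDS mapping, and the rejected set of badly labeled images are assumptions made for this example. Only the field names (source-ref, color, object, bb, bb-metadata, class-map) come from the entries shown above.

```python
import json

# Assumed mapping from the "color" metadata to a class ID (blue box = 0, yellow box = 1).
CLASS_IDS = {"blue": 0, "yellow": 1}


def inject_class(entry, class_ids=CLASS_IDS, label_attr="bb"):
    """Return a copy of one manifest entry with class IDs taken from its metadata."""
    entry = json.loads(json.dumps(entry))  # cheap deep copy of plain JSON data
    class_id = class_ids[entry["color"]]
    for annotation in entry[label_attr]["annotations"]:
        annotation["class_id"] = class_id
    # Rebuild the class map so that it lists every class, not only "storage box".
    entry[label_attr + "-metadata"]["class-map"] = {
        str(cid): "{} {}".format(color, entry["object"])
        for color, cid in sorted(class_ids.items(), key=lambda kv: kv[1])
    }
    return entry


def process_manifests(manifest_lines_per_job, rejected=frozenset()):
    """Join several labeling job outputs, drop rejected images, and fix class IDs."""
    processed = []
    for lines in manifest_lines_per_job:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            if entry["source-ref"] in rejected:
                continue  # image was marked as badly labeled
            processed.append(inject_class(entry))
    return processed
```

Here, manifest_lines_per_job is a list with one list of JSON lines per labeling job, and rejected holds the S3 URIs of the images you decided to exclude.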

4. Split the data into train and validation sets

Amazon SageMaker requires two datasets during training: a train and a validation dataset. The training set consists of the images and annotations used to actually train the model. The validation set is not used for training but to validate that the model can generalize (it can accurately make predictions on previously unseen data). It’s also used to compare accuracy between different trained models during hyperparameter tuning.
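
Because each line of an augmented manifest describes one image, a split can be as simple as shuffling the lines and writing them to two files. The following is a minimal sketch under that assumption. The 80/20 ratio, the fixed seed, and the function names are illustrative choices, not values taken from the notebook.

```python
import random


def split_manifest(lines, validation_fraction=0.2, seed=42):
    """Shuffle manifest lines and split them into (train, validation) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def write_manifest(lines, path):
    """Write one JSON entry per line, which is the layout of an augmented manifest."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line.rstrip("\n") + "\n")
```

The image files themselves stay where they are in S3. Only the two manifest files differ.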

Why split the data this way? The process of model training is essentially reducing loss (how far off the model’s prediction is compared to the training dataset) in each training iteration. However, minimizing the loss runs the risk of making the model too tailored to the particularities of the training data (overfitting). For an illustration of this, use the following scatter plot of 2D points as an example training dataset:

scatter plot of points that follows a linear pattern

A model represented by the red line in the preceding plot does a passable job describing the underlying pattern of the data points with a relatively small loss. For a counterexample, see the following model represented in the blue line:

Scatter plot example of overfitting

The function represented in blue technically has a smaller loss than the red model but does a worse job describing the pattern of the dataset. It would almost certainly perform worse on previously unseen data. Using a validation dataset is therefore necessary to evaluate the performance of the model against a dataset not used during training.
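
To make the argument concrete, here is a small, self-contained sketch that compares training loss and validation loss for two models on synthetic data. It is only an illustration: the data, the noise level, and the "memorizer" model (a stand-in for an overfit curve like the blue one) are assumptions made for this example and have nothing to do with the object detection model itself.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return lambda x: a * x + (mean_y - a * mean_x)


def fit_memorizer(points):
    """An overfit model: predict the y value of the nearest training point."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


rng = random.Random(0)


def sample(n):
    """Draw noisy points around the line y = 2x + 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 2)) for x in xs]


train, validation = sample(30), sample(30)
for name, model in [("line", fit_line(train)), ("memorizer", fit_memorizer(train))]:
    print(name, mean_squared_error(model, train), mean_squared_error(model, validation))
```

The memorizer always reaches a training loss of zero, which no straight line can beat on noisy data. On most random draws its validation loss is noticeably higher than the line’s, and that gap is exactly what a validation set is there to reveal.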

Data processing code and walkthrough

Review the code and follow the data processing steps by running this Jupyter notebook on the project GitHub repo in an Amazon SageMaker notebook instance.

Other considerations before starting your Amazon SageMaker training job

There are two other issues to consider before starting the training job: transfer learning and data augmentation.

Transfer learning

For deep learning–based computer vision algorithms to perform well, you must have a massive amount of training data. The popular COCO dataset, for example, has more than 200,000 labeled images. When you have only a few hundred to a thousand labeled images, the best way to achieve accurate results is through transfer learning. The built-in Amazon SageMaker object detection algorithm makes transfer learning trivial: it initializes the weights of the neural network using parameters from a pretrained model. To see how to enable it, look at the use_pretrained_model hyperparameter later in this post.

Data augmentation

To achieve higher accuracy, it’s also common to use data augmentation in addition to transfer learning. It’s the fastest and cheapest way to multiply the amount of training data you have, by generating new data based on your existing training data. For a detailed introduction to various data augmentation techniques, see Data Augmentation | How to use Deep Learning when you have Limited Data.

For this example, we wrote a simple script that performs some rudimentary augmentation: flipping the image and bounding box labels by x-axis and y-axis, and rotating 90 degrees clockwise and counterclockwise:

example data augmentation

You now have five times as much training data (each original image plus four transformed copies) after running a simple script. We found that even simple data augmentation like this can make a significant difference in accuracy. The validation metric used in this project, mAP (mean average precision), saw a 21.5% increase after adding the augmented data to the training dataset. All other hyperparameters remained the same.
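
The label side of these transformations is simple arithmetic on the bounding box fields. The following sketch is not the script we used. It is a minimal illustration that assumes boxes in the Ground Truth layout (left, top, width, height in pixels, with the origin at the top-left corner), and the function names are made up for this example. The image itself has to be flipped or rotated in the same way with an image library of your choice.

```python
def flip_horizontal(box, image_width):
    """Mirror a box left to right."""
    return dict(box, left=image_width - box["left"] - box["width"])


def flip_vertical(box, image_height):
    """Mirror a box top to bottom."""
    return dict(box, top=image_height - box["top"] - box["height"])


def rotate_90_clockwise(box, image_height):
    """Rotate a box with the image by 90 degrees clockwise.

    The rotated image is image_height pixels wide, so width and height swap.
    """
    return dict(
        box,
        left=image_height - box["top"] - box["height"],
        top=box["left"],
        width=box["height"],
        height=box["width"],
    )


def rotate_90_counterclockwise(box, image_width):
    """Rotate a box with the image by 90 degrees counterclockwise."""
    return dict(
        box,
        left=box["top"],
        top=image_width - box["left"] - box["width"],
        width=box["height"],
        height=box["width"],
    )
```

For the rotated copies, remember to also swap width and height in the image_size field of the manifest entry.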

This approach of augmenting the data before training is referred to as offline augmentation. Some deep learning frameworks, such as Gluon, also support online augmentation, which applies augmentation as the data is fed to train the model. To take advantage of that, you can build a custom Docker container with your training code and use the bring-your-own-algorithm functionality of Amazon SageMaker.

Submit a training job using the built-in object detection algorithm

Now you are ready to start training jobs. Using the Amazon SageMaker boto3 SDK, you can define train and validation inputs using the following code. The values of the attribute_names parameter must match those in your augmented manifest file:

The JSON attribute name for the S3 URI of the input image
The JSON attribute name for the bounding box annotations

s3_train_data = "s3://{}/{}/training-manifest/train.manifest".format(bucket, prefix)
s3_validation_data = "s3://{}/{}/training-manifest/validation.manifest".format(bucket, prefix)

train_input = {
    "ChannelName": "train",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "AugmentedManifestFile",  
            "S3Uri": s3_train_data,
            "S3DataDistributionType": "FullyReplicated",
            # This must correspond to the JSON field names in your augmented manifest.
            "AttributeNames": ['source-ref', 'bb']
        }
    },
    "ContentType": "application/x-recordio",
    "RecordWrapperType": "RecordIO",
    "CompressionType": "None"
}

validation_input = {
    "ChannelName": "validation",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "AugmentedManifestFile",  
            "S3Uri": s3_validation_data,
            "S3DataDistributionType": "FullyReplicated",
            #  This must correspond to the JSON field names in your augmented manifest.
            "AttributeNames": ['source-ref', 'bb']
        }
    },
    "ContentType": "application/x-recordio",
    "RecordWrapperType": "RecordIO",
    "CompressionType": "None"
}

s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)

Next, set the hyperparameters. You can find documentation for all the supported hyperparameters in the Amazon SageMaker documentation. There are a few things worth highlighting:

base_network – This is the base feature extractor network as part of the SSD model. Currently the Amazon SageMaker built-in object detection algorithm supports either ResNet-50 or VGG-16. We chose ResNet because it’s more lightweight and thus faster, given that you run the inference at the edge on AWS IoT Greengrass. To train the model with a base network even more lightweight, such as MobileNet and ShuffleNet, you can define a custom algorithm using frameworks such as Gluon, Keras, PyTorch, etc.
use_pretrained_model – This enables/disables transfer learning by initializing the weights of the neural network using parameters from a pre-trained model.
num_classes – This is the number of classes of objects you are trying to predict. In this example, you have only two classes: “blue storage box” and “yellow storage box.” During transfer learning, the original output neural net layer is replaced by a new output layer with a number of nodes equal to num_classes.
image_shape – This is the height and width of the image being passed into the model. Conveniently, Amazon SageMaker reshapes the image on the fly during training, so you don’t have to resize the images beforehand.
mini_batch_size – This is the number of inputs used for each round of forward and backward pass. Larger batch sizes usually allow the algorithm to converge faster. However, it’s more computationally resource intensive to run each batch. Along with the learning_rate, these are two hyperparameters that you should consider tuning to achieve higher accuracy.
learning_rate – This defines the initial learning rate. Configure it along with lr_scheduler_factor and lr_scheduler_step to gradually reduce the learning rate as your training progresses. Because you are using transfer learning with pretrained parameters, keep the initial learning rate relatively small. Otherwise, the weights get updated in increments that are too large and you obtain unusable results.
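
To see how learning_rate, lr_scheduler_step, and lr_scheduler_factor interact, here is a small sketch of a step schedule using the values from this post’s configuration (0.001, "10,20", and 0.25). It is an illustration only. In particular, whether the built-in algorithm applies a reduction exactly at the listed epoch or one epoch later is an assumption here, so check the algorithm documentation if the boundary matters to you.

```python
def learning_rate_at(epoch, base_lr=0.001, steps=(10, 20), factor=0.25):
    """Learning rate for a zero-based epoch under a step schedule.

    Each step epoch that has been reached multiplies the rate by `factor`.
    """
    reductions = sum(1 for step in steps if epoch >= step)
    return base_lr * factor ** reductions


for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, learning_rate_at(epoch))
```

With these values, the rate is divided by four after each step: it starts at 0.001, then drops to 0.00025, and finishes the 30 epochs at 0.0000625.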

hyperparams = { 
            "base_network": 'resnet-50',
            "use_pretrained_model": "1",
            "num_classes": "2",   
            "mini_batch_size": "30",
            "epochs": "30",
            "learning_rate": "0.001",
            "lr_scheduler_step": "10,20",
            "lr_scheduler_factor": "0.25",
            "optimizer": "sgd",
            "momentum": "0.9",
            "weight_decay": "0.0005",
            "overlap_threshold": "0.5",
            "nms_threshold": "0.45",
            "image_shape": "512",
            "label_width": "150",
            "num_training_samples": str(num_training_samples)
 }
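
The num_training_samples value is the number of training examples, which for an augmented manifest is the number of entries in the training manifest. One way to obtain it, assuming you still have a local copy of the manifest (the file path and helper name here are hypothetical), is to count its non-empty lines:

```python
def count_manifest_entries(path):
    """Count non-empty lines (one per labeled image) in an augmented manifest."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


# Example usage, assuming a local copy of the training manifest:
# num_training_samples = count_manifest_entries("train.manifest")
```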

Next, for training parameters, specify the following:

The container image for the built-in object detection algorithm
The compute resource type and size (we recommend picking a GPU instance in the P3 or P2 instance family)
The input mode—because you are using the AugmentedManifestFile input file type, you must specify Pipe mode (see Augmented Manifest File)

You can find documentation about the other parameters in the CreateTrainingJob API reference.

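import boto3
import sagemaker

# get_image_uri is the SageMaker Python SDK v1 helper; SDK v2 replaces it with sagemaker.image_uris.retrieve.
training_image = sagemaker.amazon.amazon_estimator.get_image_uri(boto3.Session().region_name, 'object-detection', repo_version='latest')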

training_params = \
    {
        "AlgorithmSpecification": {
            "TrainingImage": training_image,
            "TrainingInputMode": "Pipe"
        },
        "RoleArn": role,
        "OutputDataConfig": {
            "S3OutputPath": s3_output_location
        },
        "ResourceConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.p3.8xlarge",
            "VolumeSizeInGB": 200
        },
        "TrainingJobName": model_job_name,
        "HyperParameters": hyperparams,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": 86400
        },
        "InputDataConfig": [
            train_input,
            validation_input
        ]
    }

Finally, kick off the training job with the preceding configurations:

client = boto3.client(service_name='sagemaker')
client.create_training_job(**training_params)
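
The create_training_job call returns as soon as the job is submitted, so you typically poll for its status afterwards. The following is a minimal sketch of such a loop. The helper name, the polling interval, and the injectable sleep function are assumptions made for this example, and the helper only relies on the describe_training_job call and its TrainingJobStatus field.

```python
import time

# Statuses after which the job no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(client, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll describe_training_job until the job reaches a terminal status."""
    while True:
        description = client.describe_training_job(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)


# Example usage with the client and job name defined above:
# final_status = wait_for_training_job(client, model_job_name)
```

Passing the client in as a parameter keeps the helper easy to try out with a stub before you point it at a real training job.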

Amazon SageMaker training Jupyter notebook code and walkthrough

Review the full code and follow the training job submission by running this Jupyter notebook on the project GitHub repo in an Amazon SageMaker notebook instance.

Model training tip: Evaluate progress by visualizing the trend of the validation metric

During training, the built-in object detection algorithm reports accuracy metrics on the validation dataset after each training epoch. Find links to the training log and Amazon CloudWatch metrics graph from the Amazon SageMaker console. For example, the mAP (mean average precision) for the training job with the preceding configuration follows:

screenshot of cloudwatch console showing the changes in the mAP metric between training epochs

Visualizing the trend of the mAP validation metric helps you evaluate your hyperparameter choices. For example, if the mAP keeps rising without plateauing, maybe you should train for more epochs. If it shows a significant drop, maybe you should reduce your learning rate.

To improve your hyperparameter choices and achieve better results, look into using the automatic hyperparameter tuning feature of Amazon SageMaker. See the Automatic Model Tuning documentation.

Run local inference using the trained model on an Amazon SageMaker notebook instance

To validate the trained model further and make predictions using the trained model, there are three options:

Deploy it to an endpoint using Amazon SageMaker Hosting Services to get one inference at a time in real time
Use Amazon SageMaker Batch Transform to apply a one-off batch inference job on an entire dataset
Download the model artifacts to an Amazon SageMaker notebook instance and test running inference locally

Because the goal is to eventually run this prediction at the edge, we went with the third option: download the model to an Amazon SageMaker notebook instance and run inference locally. To verify that the model makes inferences as expected, test local inference on an Amazon SageMaker notebook instance before trying it on your edge device. Make sure that the instance has all the MXNet and CUDA dependencies properly configured.

The trained model parameters, along with the network definition, are stored in a tar.gz file in the output path for the training job. Download and unzip it:

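import boto3
sagemaker_client = boto3.client('sagemaker')  # JOB_ID below is the name of your training job
MODEL_ARTIFACT = sagemaker_client.describe_training_job(TrainingJobName=JOB_ID)['ModelArtifacts']['S3ModelArtifacts']
!aws s3 cp $MODEL_ARTIFACT .
!tar -xvzf model.tar.gz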

Upon unzipping the model, you should find three files in your directory:

model_algo_1-symbol.json   <-- neural network definition 
hyperparams.json           <-- hyper parameters  
model_algo_1-0000.params   <-- trained weights for the neural network

Convert the trained model artifact to a deployable model artifact

The model output produced by the built-in object detection model leaves the loss layer in place and does not include an NMS (non-max suppression) layer. To make it ready for inference on the machine, remove the loss layer and add the NMS layer. Use a script from this GitHub repo.

git clone https://github.com/zhreshold/mxnet-ssd.git

You must run the deploy.py script to convert a trained model to a deployable model. An example follows:

python /home/ec2-user/SageMaker/mxnet-ssd/deploy.py --network resnet50 --num-class 2 --nms .45 --data-shape 512 --prefix model_algo_1

When running this script, make sure that the command line options you pass in exactly match the hyperparameters of your training job. If you’re unsure, refer to the hyperparams.json file in your unpacked model artifacts to confirm. To run this script successfully, you must use Python 2.
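
One way to avoid a mismatch is to derive the options from hyperparams.json instead of typing them by hand. The sketch below assumes that hyperparams.json is a flat JSON object that uses the same keys as the training hyperparameters (base_network, num_classes, nms_threshold, image_shape). Verify this against your own file before relying on it. The helper name is made up for this example.

```python
import json


def deploy_args_from_hyperparams(path="hyperparams.json", prefix="model_algo_1"):
    """Build the deploy.py command line options from the training hyperparameters."""
    with open(path) as f:
        hp = json.load(f)
    return [
        # The training job calls the network "resnet-50"; the example command uses "resnet50".
        "--network", hp["base_network"].replace("-", ""),
        "--num-class", str(hp["num_classes"]),
        "--nms", str(hp["nms_threshold"]),
        "--data-shape", str(hp["image_shape"]),
        "--prefix", prefix,
    ]


# Example usage: print the options to append to the deploy.py command.
# print(" ".join(deploy_args_from_hyperparams()))
```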

With the model artifacts properly converted, you can now load the updated model artifacts into MXNet:

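import mxnet as mx

ctx = mx.cpu()  # assumption: switch to mx.gpu(0) if a GPU is available
input_shapes = [('data', (1, 3, 512, 512))]  # assumption: one RGB image, matching image_shape 512

param_path = 'deploy_model_algo_1'  # prefix of the converted (deployable) artifacts
sym, arg_params, aux_params = mx.model.load_checkpoint(param_path, 0)
mod = mx.mod.Module(symbol=sym, label_names=[], context=ctx)
mod.bind(for_training=False, data_shapes=input_shapes)
mod.set_params(arg_params, aux_params)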

And then, make some inferences with the test images.
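
After a forward pass, the raw detections still need to be filtered and scaled before you can draw them on a frame. The following sketch shows that post-processing step only. It assumes that each detection row has the form [class_id, score, xmin, ymin, xmax, ymax], with coordinates normalized to the range 0 to 1 and a class_id of -1 for suppressed entries. That matches our understanding of the SSD detection output, but confirm it against your own model. The function name and the 0.5 score threshold are illustrative.

```python
def to_pixel_boxes(detections, image_width, image_height, min_score=0.5):
    """Keep confident detections and convert them to pixel bounding boxes."""
    boxes = []
    for class_id, score, xmin, ymin, xmax, ymax in detections:
        if class_id < 0 or score < min_score:
            continue  # suppressed by NMS or not confident enough
        boxes.append({
            "class_id": int(class_id),
            "score": score,
            "left": int(xmin * image_width),
            "top": int(ymin * image_height),
            "width": int((xmax - xmin) * image_width),
            "height": int((ymax - ymin) * image_height),
        })
    return boxes
```

The returned dictionaries use the same left, top, width, and height fields as the Ground Truth annotations, which makes it easy to compare predictions with labels.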

inference result with bounding box overlaid on top of the frame

The local inference Jupyter notebook also includes code that lets you do inference in batch with all images in a folder and generate a PDF that visualizes the inference output for each frame.

6 example images with bounding box inference results

Lastly, remember to save a copy of the deployable model artifact in S3.

aws s3 cp deploy_model_algo_1-0000.params s3://my-bucket/deployable-model/
aws s3 cp deploy_model_algo_1-symbol.json s3://my-bucket/deployable-model/

Local inference Jupyter notebook code and walkthrough

Review the code and follow the preceding local inference steps by running this Jupyter notebook on the project GitHub repo in an Amazon SageMaker notebook instance.

Conclusion

In this post, we shared with you tips for post-processing SageMaker Ground Truth results, data augmentation, training with Amazon SageMaker built-in object detection algorithm, and converting the model artifact for deployment. In the next post, we’ll show you how to take the model and run it on AWS IoT Greengrass core for inference at the edge.

The Internet of Things on AWS – Official Blog