AWS Machine Learning Blog

Object Detection algorithm now available in Amazon SageMaker

Amazon SageMaker is a fully-managed and highly scalable machine learning (ML) platform that makes it easy build, train, and deploy machine learning models. This is a giant step towards the democratization of ML and in lowering the bar for entry in to the ML space for developers. Computer vision is the branch of machine learning that deals with images. The Amazon SageMaker image classification algorithm, which learns to categorize images into a set of pre-defined categories is, one of the more popular algorithms offered by SageMaker.

Today, we are enhancing our computer vision offerings with the launch of Amazon SageMaker object detection (OD) algorithm. Object detection is the process of identifying and localizing objects in an image. This algorithm takes image classification further by proving a bounding box on the image where the object is along with identifying what object the box encapsulates.  Note that you can also use the Amazon Rekognition service for object detection, if you do not need to train with your own dataset containing custom classes. Amazon Rekognition provides APIs that can identify objects from a pre-defined set of classes. With the Amazon SageMaker object detection algorithm, not only can you train with your own dataset/classes, you can also localize the objects in the image.

An example of the Amazon SageMaker object detection algorithm at work. Photo by Mansoor via PEXELS.

The Single-Shot Multi-Box Detector (SSD) is one of the faster and more accurate algorithms to accomplish this task. It detects multiple objects in the image, with just one pass during inference. We are excited to announce that SSD is now available as a built-in algorithm for Amazon SageMaker customers in all of the Regions where SageMaker is available. The algorithm can be trained using P2/P3 instances in the following configurations:

  • Single machine, single GPU
  • Single machine, multi-GPUs
  • Multi-machine, multi-GPUs

The algorithm can be hosted on all CPU and GPU instances supported by Amazon SageMaker. When trained and properly hosted, the algorithm will generate bounding boxes for object instances in a query image along with a score for each object category as shown in the image at the beginning of this blog post. The algorithm can handle a variety of objects of varying sizes and scales natively.

The SSD algorithm uses convolutional feature maps. Using feature maps at different scales, SSD generates multiple boxes of potential object candidates. A non-maximal suppression is then performed to remove overlapping boxes. The ones with the highest scores are declared as object predictions. Since this algorithm is compatible with various convolutional neural networks, many popular network architectures can be used. Our offering of this algorithm comes with a choice of two base networks: VGG16 and ResNet50, each with models pre-trained on the ImageNet classification task that can be transferred to customers’ object detection datasets as well.

Getting started

Amazon SageMaker object detection expects the customer’s training dataset to be on Amazon Simple Storage Service (Amazon S3). Once trained, it also produces the resulting model artifacts at Amazon S3. Amazon SageMaker takes care of starting and stopping Amazon Elastic Compute Cloud (Amazon EC2) instances for the customers during training. After the model is trained, it can be deployed to an endpoint. For a general, high-level overview of the Amazon SageMaker workflow, see the Amazon SageMaker documentation.

To begin with, the inputs of the algorithm should either be in MXNet RecordIO format or raw images with JSON  annotations so that you can setup the training configuration. We recommend the RecordIO format, especially when there are a lot of images. We make this recommendation for two reasons. It’s usually slower to download many small files from Amazon S3 rather than downloading one larger file. Also, the algorithm natively uses the RecordIO format, so when you running multiple iterations of the algorithm, having the data in RecordIO format saves valuable conversion time. Note that the object detection algorithm supports only GPU instances for training, and we recommend using GPU instances with more memory for training with large batch sizes.  While the algorithm trains, you can monitor the progress through either a SageMaker notebook or Amazon CloudWatch. After the training is done, the trained model artifacts will be uploaded to an Amazon S3 output location that you specified in the training configuration. To deploy the model as an endpoint, you can choose to use either a CPU or a GPU instance.

Performance numbers

The following table shows some performance numbers for the Amazon SageMaker object detection algorithm. We trained on PASCAL VOC07 and VOC12 datasets and show the mean average precision on the VOC07 dataset with an image size of 512 X 512 (image_shape = 512). For experiments in the following table, we used sgd as the optimizer with default parameters (momentum = 0.9, weight_decay = 0.0005). For all the four settings, we trained for 240 epochs, and reduced the learning rate by a factor 0.1 at epoch 80 and epoch 160. This learning strategy can be implemented by using the following hyperparameters epochs =240,  lr_scheduler_step = “80, 160”, and lr_scheduler_factor = 0.1. As you can see with limited parameter tuning, we match the state-of-the-art performance numbers as reported here.

Network Instance Pre-Trained Model Used Batch Size Learning Rate Mean Average Precision
VGG-16 ml.p3.8xlarge Yes 64 0.004 0.780
VGG-16 ml.p3.2xlarge Yes 16 0.0005 0.780
ResNet-50 ml.p3.8xlarge Yes 64 0.002 0.791
ResNet-50 ml.p3.2xlarge Yes 16 0.0005 0.796

Using base_network = “resnet-50”, we observe an approximately linear-speedup while going from single GPU to 16 GPUs in P2 instances and to 8 GPUs in P3 instances. This can be clearly noticed in the plot below. Analogously, we also observed  greater than 6x speed increase when moving from P2 to P3 single GPU instances.

We also observe speedups when scaling across multiple machines. For example, when using ml.p2.8xlarge instances, we see 1.8x and 3.6x speedups when moving from 1 machine to 2 machines and to 4 machines, respectively.


An example of object detection is available in notebook format. Refer to this for a complete tutorial and some recommendations for data preparation and hyperparameters.


In this blog post we showed you how to get started using the new Amazon SageMaker object detection algorithm. We’ve also shown you some example performance numbers. We look forward to hearing from you as you set up your own implementation of object detection.


About the Authors

Xiong Zhou is an Applied Scientist with AWS Deep Learning. He has a PhD in Electrical and Electronics Engineering from University of Houston. His current research focus involves developing domain adaptation and active learning algorithms. He is also working on building computer vision algorithms for Amazon SageMaker.



Ragav Venkatesan is a Research Scientist with AWS Deep Learning. He has an MS in Electrical Engineering and a PhD in Computer Science from Arizona State University. His current area of research includes Neural Network compression and Computer Vision algorithms.



Vineet Khare is a Sciences Manager for AWS Deep Learning. He focuses on building Artificial Intelligence and Machine Learning applications for AWS customers using techniques that are at the forefront of research. In his spare time, he enjoys reading, hiking and spending time with his family.




Saksham Saini is a Software Developer with Amazon Sagemaker. He did his BS in Computer Engineering from UIUC. He is currently working on building highly optimized and scalable algorithms for Amazon Sagemaker. Outside work, he enjoys reading, memes, playing music and traveling.