Object detection and model retraining with Amazon SageMaker and Amazon Augmented AI
Industries like healthcare, media, and social media platforms use image analysis workflows to identify objects and entities within pictures to understand the whole image. For example, an ecommerce website might use objects present in an image to surface relevant search results. Sometimes image analysis may be difficult when images are blurry or more nuanced. In these cases, you may need a human to complete the machine learning (ML) loop and advise on the image using their human judgment.
In this post, we use Amazon SageMaker to build, train, and deploy an ML model for object detection and use Amazon Augmented AI (Amazon A2I) to build and render a custom worker template that allows reviewers to identify or review objects found in an image. Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to build, train, and deploy ML models quickly. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. Amazon A2I is a fully managed service that helps you build human review workflows to review and validate the ML models’ predictions.
You can also use Amazon Rekognition for object detection to identify objects from a predefined set of classes, or use Amazon Rekognition Custom Labels to train a custom model that detects objects and scenes specific to your business needs, simply by bringing your own data.
Some other common use cases that may require human workflows are content moderation in images and video, extracting text and entities from documents, translation, and sentiment analysis. Although you can use ML models to identify inappropriate content or extract entities, humans may need to validate the model predictions based on the use case. Amazon A2I helps you quickly create these human workflows.
You can also use Amazon A2I to send a random sample of ML predictions to human reviewers. You can use these results to inform stakeholders about the model’s performance and to audit model predictions.
This post requires you to have the following prerequisites:
- An IAM role – To create a human review workflow, you need to provide an AWS Identity and Access Management (IAM) role that grants Amazon A2I permission to access Amazon Simple Storage Service (Amazon S3) for reading objects to render in a human task UI and for writing the results. This role also needs an attached trust policy to give Amazon SageMaker permission to assume the role. This allows Amazon A2I to perform actions in accordance with permissions that you attach to the role. For more information and example policies, see Add Permissions to the IAM Role Used to Create a Flow Definition.
- Accompanying object detection training notebook – This post provides an accompanying notebook for this walkthrough. For this post, we focus on using Amazon A2I and the importance of bringing human reviewers into the loop. Therefore, we take a trained object detection model from an S3 bucket and host it on an Amazon SageMaker hosted endpoint for real-time prediction. For more information about training an object detection model using the Amazon SageMaker built-in Single Shot MultiBox Detector (SSD) algorithm with the PASCAL VOC dataset and hosting it for real-time prediction, see the GitHub repo. If you’re interested in building your own model, follow the object detection training notebook. If you have a large dataset without labels, you can use Amazon SageMaker Ground Truth to efficiently label your images at scale.
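As a rough illustration of the IAM prerequisite, the following Python sketch builds the two policy documents described above: a trust policy that lets Amazon SageMaker assume the role, and a minimal Amazon S3 permissions policy. The function names are hypothetical, and the S3 actions shown are an assumption about the minimum needed; see the referenced documentation for the recommended policies.

```python
import json


def build_trust_policy():
    """Trust policy that allows Amazon SageMaker to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_s3_access_policy(bucket_name):
    """Permissions policy for reading task images and writing review results.

    Assumption: object-level read and write on a single bucket is enough
    for this walkthrough.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# Serialize the documents so they can be pasted into the IAM console or passed to an API call.
trust_policy_json = json.dumps(build_trust_policy(), indent=2)
access_policy_json = json.dumps(build_s3_access_policy("a2i-demos"), indent=2)
```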
To implement this solution, you complete the following steps:
- Host an object detection model on Amazon SageMaker.
- Create a worker task template.
- Create a private work team.
- Create a human review workflow.
- Call the Amazon SageMaker endpoint.
- Complete the human review.
- Process the JSON output for incremental training.
For this post, we ran the walkthrough in us-east-1, but Amazon A2I is available in many Regions. For more information, see Region Table.
Step 1: Host an object detection model on Amazon SageMaker
This step is available in the accompanying Jupyter notebook. To set up your endpoint, enter the following Python code (this may take a few minutes):
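The notebook's own hosting code is not reproduced here. As a stand-in, the following is a minimal sketch that uses the low-level boto3 SageMaker client rather than whatever the notebook uses. The helper names, the resource naming scheme, and the default instance type are assumptions; the container image URI and model artifact location must come from your own account and Region.

```python
def build_hosting_requests(model_name, image_uri, model_data_url, role_arn,
                           instance_type="ml.m4.xlarge"):
    """Build the payloads for create_model, create_endpoint_config, and create_endpoint."""
    endpoint_config_name = f"{model_name}-config"
    endpoint_name = f"{model_name}-endpoint"
    create_model = {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
    }
    create_endpoint_config = {
        "EndpointConfigName": endpoint_config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
            }
        ],
    }
    create_endpoint = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": endpoint_config_name,
    }
    return create_model, create_endpoint_config, create_endpoint


def deploy(sagemaker_client, requests):
    """Send the payloads with a boto3 SageMaker client and block until the endpoint is InService."""
    create_model, create_endpoint_config, create_endpoint = requests
    sagemaker_client.create_model(**create_model)
    sagemaker_client.create_endpoint_config(**create_endpoint_config)
    sagemaker_client.create_endpoint(**create_endpoint)
    waiter = sagemaker_client.get_waiter("endpoint_in_service")
    waiter.wait(EndpointName=create_endpoint["EndpointName"])
    return create_endpoint["EndpointName"]
```

To run it, pass `boto3.client("sagemaker")` and the output of `build_hosting_requests` to `deploy`.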
When the endpoint is up and running, you should see the InService status on the Amazon SageMaker console. (Note that the URL takes you to the console in us-east-1, which is where we ran the demo, but Amazon A2I is available in many more Regions. Be sure to switch to your Region.)
To see what object detection looks like, enter the following code. The predicted class and the prediction probability are visualized, along with the bounding box, using the helper function load_and_predict defined in the accompanying Jupyter notebook.
The following screenshot shows the output of an image with a label and bounding box.
We under-trained this SSD model for demonstration purposes in the object detection training notebook. Although the model identifies a bicycle in the image, a probability of 0.245 is too low to be considered a trustworthy prediction in modern computer vision. Furthermore, the localization of the object isn’t accurate; the bounding box doesn’t cover the front wheel and the saddle. However, this under-trained model serves as a good example of when to bring in human reviewers: the model doesn’t make a prediction with high confidence.
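To make the prediction concrete, the following sketch picks the highest-scoring detection out of an endpoint response. It assumes the JSON response format of the built-in object detection algorithm, where each entry of `prediction` is `[class_id, score, xmin, ymin, xmax, ymax]` with coordinates normalized to the range 0 to 1, and that class IDs index into the PASCAL VOC class list in the order used later in this post. The function name is hypothetical, and this is not the notebook's load_and_predict helper.

```python
import json

OBJECT_CATEGORIES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def top_detection(response_body, categories=OBJECT_CATEGORIES):
    """Return the most confident detection as a dict, or None if nothing was detected."""
    detections = json.loads(response_body)["prediction"]
    if not detections:
        return None
    class_id, score, xmin, ymin, xmax, ymax = max(detections, key=lambda d: d[1])
    return {
        "label": categories[int(class_id)],
        "score": score,
        "box": (xmin, ymin, xmax, ymax),  # normalized coordinates
    }
```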
Step 2: Create a worker task template
You can use Amazon A2I to incorporate a human review into any ML workflow. In this post, to integrate Amazon A2I with the Amazon SageMaker hosted endpoint, you need to create a custom task. When you use a custom task type, you create and start a human loop using the Amazon A2I Runtime API to send the data that requires review using a worker task template. For more information, see Use Amazon Augmented AI with Custom Task Types.
Crowd HTML elements are web components that provide several task widgets and design elements that you can tailor to the question you want to ask. You can use these crowd elements to create a custom worker template and integrate it with an Amazon A2I human review workflow to customize the worker console and instructions. We provide over 60 sample custom task templates in the GitHub repo that you can use as is or as a starting point to customize your own templates. For an object detection use case, the reviewer typically needs to select labels and draw bounding boxes. For this post, you use one of the sample task templates, bounding-box.liquid.html, from the repository and make some customizations. This template includes labeling instructions, labeling functionality (draw, zoom in and out, and label search) and reads an image from a given Amazon S3 path. You may also customize the template to display the bounding boxes with an initial-value so that the workers can start with a bounding box predicted by the ML model instead of drawing the bounding box from scratch.
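As a sketch of how model predictions could be turned into an initial-value, the following function converts normalized detections into pixel boxes with the `label`, `left`, `top`, `width`, and `height` properties that the crowd-bounding-box element reads. The function name and the `min_score` parameter are hypothetical, and the detection layout `[class_id, score, xmin, ymin, xmax, ymax]` is an assumption about the endpoint response.

```python
def to_initial_value(detections, categories, image_width, image_height, min_score=0.0):
    """Convert normalized detections into pixel boxes for the worker UI."""
    boxes = []
    for class_id, score, xmin, ymin, xmax, ymax in detections:
        if score < min_score:
            continue  # skip detections too weak to be a useful starting point
        left = int(round(xmin * image_width))
        top = int(round(ymin * image_height))
        boxes.append({
            "label": categories[int(class_id)],
            "left": left,
            "top": top,
            "width": int(round(xmax * image_width)) - left,
            "height": int(round(ymax * image_height)) - top,
        })
    return boxes
```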
This step is available in the accompanying Jupyter notebook. To create a custom worker template on the Amazon A2I console, complete the following steps:
- Navigate to Worker task templates.
- Choose Create template.
- For Template name, enter a name that is unique within the Region in your account; for example,
- For Template type, choose Custom.
- In the Template editor, enter the sample task HTML templates from bounding-box.liquid.html.
- Modify the labels variable in the editor according to the classes included in the PASCAL VOC dataset and object detection model:
['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat','chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person','pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
- Modify the
- Choose Create.
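The console steps above can also be approximated in code. The following sketch builds a request for the boto3 SageMaker `create_human_task_ui` call around a stripped-down bounding box template. The template is a simplified stand-in for bounding-box.liquid.html, not a copy of it, and the `taskObject` input key, header text, and instructions are assumptions.

```python
# Simplified worker template; LABELS_PLACEHOLDER is swapped for the class list below.
TEMPLATE = """<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <crowd-bounding-box
    name="annotatedResult"
    src="{{ task.input.taskObject | grant_read_access }}"
    header="Draw a bounding box around each object in the image"
    labels="LABELS_PLACEHOLDER"
  >
    <full-instructions header="Bounding box instructions">
      Draw a tight box around each object and choose the matching label.
    </full-instructions>
    <short-instructions>Draw a box around each object.</short-instructions>
  </crowd-bounding-box>
</crowd-form>
"""


def build_human_task_ui_request(ui_name, labels):
    """Build the payload for create_human_task_ui with the label list filled in."""
    # str.replace is used instead of str.format because the template contains Liquid braces.
    content = TEMPLATE.replace("LABELS_PLACEHOLDER", str(list(labels)))
    return {"HumanTaskUiName": ui_name, "UiTemplate": {"Content": content}}
```

Calling `boto3.client("sagemaker").create_human_task_ui(**request)` returns a response whose `HumanTaskUiArn` is needed when you create the human review workflow.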
Step 3: Create a private work team
You can easily route reviews to your private workforce with Amazon A2I. You can also access a workforce of over 500,000 independent contractors who are already performing ML-related tasks through Amazon Mechanical Turk. Alternatively, if your data requires confidentiality or special skills, you can use workforce vendors who are experienced with review projects and prescreened by AWS for quality and security procedures.
Whichever workforce type you choose, Amazon A2I takes care of sending tasks to workers. For this post, you create a work team using a private workforce and add yourself to the team to preview the Amazon A2I workflow.
You create and manage your private workforce in the Labeling workforces page on the Amazon SageMaker console. When following the instructions, you can create a private workforce by entering worker emails or importing a pre-existing workforce from an Amazon Cognito user pool.
If you already have a work team created for Amazon SageMaker Ground Truth, you can use the same work team with Amazon A2I and skip to the following section.
This step is not available in the accompanying Jupyter notebook.
To create your private work team, complete the following steps:
- On the Amazon SageMaker console, navigate to the Labeling workforces page.
- On the Private tab, choose Create private team.
- Choose Invite new workers by email.
- For this post, enter your email address so that you can work on the object detection review tasks yourself.
You can enter a list of up to 50 email addresses, separated by commas, into the Email addresses box.
- Enter an organization name and contact email.
- Choose Create private team.
After you create the private team, you get an email invitation. The following screenshot shows an example email.
After you click the link and change your password, you are registered as a verified worker for this team. The following screenshot shows the updated information on the Private tab.
Your one-person team is now ready, and you can create a human review workflow.
Replace 'YOUR_WORKTEAM_ARN' in the accompanying Jupyter notebook with the ARN of the work team you created:
Step 4: Create a human review workflow
A human review workflow is also referred to as a flow definition. You use the flow definition to configure your human work team and provide information about how to accomplish the review task. You can use a flow definition to create multiple human loops.
This step is available in the accompanying Jupyter notebook. To do so on the Amazon A2I console, complete the following steps:
- Navigate to the Human review workflows page.
- Choose Create human review workflow.
- In the Workflow settings section, for Name, enter a unique workflow name; for example,
- For S3 bucket, enter the S3 bucket where you want to store the human review results.
The bucket must be located in the same Region as the workflow. For example, if you create a bucket called
a2i-demos, enter the path
- For IAM role, choose Create a new role from the drop-down menu.
Amazon A2I can create a role automatically for you.
- For S3 buckets you specify, select Specific S3 buckets.
- Enter the S3 bucket you specified earlier; for example, a2i-demos.
- Choose Create.
You see a confirmation when role creation is complete, and your role is now pre-populated in the IAM role drop-down menu.
- For Task type, select Custom.
In the next steps, you select the UI template you created earlier.
- In the Worker task template section, select Use your own template.
- For Template, choose the template you created.
- For Task description, enter a short description of the task.
- In the Workers section, select Private.
- For Private teams, choose the work team you created earlier.
- Choose Create.
You are redirected to the Human review workflows page and see a confirmation message similar to the following screenshot.
Record your new human review workflow ARN, which you use to configure your human loop in the next section.
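If you prefer to create the workflow programmatically, the console settings map onto the boto3 SageMaker `create_flow_definition` call. The following is a minimal sketch of that mapping. The function name, the task description text, and the single-reviewer task count are assumptions, and the ARNs are values you supply from the earlier steps.

```python
def build_flow_definition_request(name, role_arn, workteam_arn, human_task_ui_arn,
                                  output_bucket, task_count=1):
    """Build the payload for create_flow_definition."""
    return {
        "FlowDefinitionName": name,
        "RoleArn": role_arn,
        "HumanLoopConfig": {
            "WorkteamArn": workteam_arn,
            "HumanTaskUiArn": human_task_ui_arn,
            "TaskCount": task_count,  # number of workers who review each image
            "TaskDescription": "Review the objects detected in the image",
            "TaskTitle": "Object detection a2i demo",
        },
        # Review results are written under this S3 path.
        "OutputConfig": {"S3OutputPath": f"s3://{output_bucket}/"},
    }
```

The response of `boto3.client("sagemaker").create_flow_definition(**request)` contains the `FlowDefinitionArn` used to start human loops.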
Step 5: Call the Amazon SageMaker endpoint
Now that you have set up your Amazon A2I human review workflow, you’re ready to call your object detection endpoint on Amazon SageMaker and start your human loops. For this use case, you only want to start a human loop if the highest prediction probability score returned by your model for objects detected is less than 50% (SCORE_THRESHOLD). With a bit of logic (see the following code), you can check the response for each call to the Amazon SageMaker endpoint using the load_and_predict helper function, and if the highest prediction probability score is less than 50%, you create a human loop.
You use a human loop to start your human review workflow. When a human loop is triggered, human review tasks are sent to the workers as specified in the flow definition.
This step is available in the accompanying Jupyter notebook.
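The notebook's code is not reproduced here; the following is a minimal sketch of the same idea. It checks the highest score against the threshold and, when review is needed, builds a request for the Amazon A2I Runtime `start_human_loop` call. The helper names, the loop naming scheme, and the `initialValue` and `taskObject` input keys are assumptions that must match whatever your worker template reads.

```python
import json
import uuid

SCORE_THRESHOLD = 0.5


def needs_human_review(detections, threshold=SCORE_THRESHOLD):
    """True when no detection reaches the threshold, including when nothing was detected."""
    best_score = max((d[1] for d in detections), default=0.0)
    return best_score < threshold


def build_human_loop_request(flow_definition_arn, image_s3_uri, initial_value=None):
    """Build the payload for start_human_loop."""
    input_content = {
        "initialValue": initial_value or [],  # optional boxes predicted by the model
        "taskObject": image_s3_uri,           # image the worker reviews
    }
    return {
        "HumanLoopName": f"a2i-loop-{uuid.uuid4()}",
        "FlowDefinitionArn": flow_definition_arn,
        "HumanLoopInput": {"InputContent": json.dumps(input_content)},
    }
```

To start the loop, call `boto3.client("sagemaker-a2i-runtime").start_human_loop(**request)` for each image where `needs_human_review` returns True.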
The preceding code uses a simple if-else statement, but for dynamic conditions, you can also use AWS Lambda to evaluate if an object needs a human review. When you decide that a human review is needed, you can create a human loop using
Step 6: Complete the human review
After you send the images with low prediction probability to Amazon A2I via the start_human_loop call, you or the person assigned as the reviewer can log in to the labeling portal to review the images. You can find the URL on the Amazon SageMaker console, on the Private tab of the Labeling workforces page. You can also programmatically retrieve the URL with the following code:
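A minimal sketch of that lookup, assuming the response shape of the boto3 SageMaker `describe_workteam` call, where the portal host name is returned in the `SubDomain` field. The function name is hypothetical.

```python
def labeling_portal_url(describe_workteam_response):
    """Build the worker portal URL from a describe_workteam response."""
    return "https://" + describe_workteam_response["Workteam"]["SubDomain"]
```

The response comes from `boto3.client("sagemaker").describe_workteam(WorkteamName=...)`, with the name of the work team you created in Step 3.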
For this post, workteamName is
To complete a human review, complete the following steps:
- When you navigate to the portal, you are prompted to log in with your username and password (if this is your first visit to the portal, you need to create a new password).
You can see a new job for you in the Jobs section.
- Choose Object detection a2i demo.
- Choose Start working.
The page contains a customizable instruction panel, the image, and available labels.
- From the toolbar, choose Box.
- Under Labels, choose bicycle.
- Draw your bounding box around the object.
- Choose Submit.
After you complete all the image reviews, you can analyze the output of the human loop. Amazon A2I stores the results in your S3 bucket and sends you an Amazon CloudWatch event.
Your results should be available in the Amazon S3 output path specified in the human review workflow definition when all work is completed. The human answer, label, and bounding box are returned and saved in the JSON file. The following code shows a sample Amazon A2I output JSON file:
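The actual output from the walkthrough is not reproduced here. The following sketch uses a hypothetical output with made-up values to illustrate the general shape (the worker's answer under `humanAnswers`, the original request under `inputContent`) and shows one way to pull out the reviewed boxes. The `annotatedResult` key assumes the crowd-bounding-box element in the template is named that way.

```python
import json

# Hypothetical example; all ARNs, names, and coordinates are placeholders.
SAMPLE_OUTPUT = json.dumps({
    "flowDefinitionArn": "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/hypothetical-workflow",
    "humanAnswers": [
        {
            "answerContent": {
                "annotatedResult": {
                    "boundingBoxes": [
                        {"height": 200, "label": "bicycle", "left": 50, "top": 80, "width": 250}
                    ],
                    "inputImageProperties": {"height": 400, "width": 500},
                }
            },
            "submissionTime": "2020-01-01T00:00:00.000Z",
            "workerId": "hypothetical-worker-id",
        }
    ],
    "humanLoopName": "a2i-loop-hypothetical",
    "inputContent": {"initialValue": [], "taskObject": "s3://a2i-demos/hypothetical/image.jpg"},
})


def extract_review(output_json):
    """Return (image URI, image size, bounding boxes) from the first worker's answer."""
    record = json.loads(output_json)
    result = record["humanAnswers"][0]["answerContent"]["annotatedResult"]
    return (
        record["inputContent"]["taskObject"],
        result["inputImageProperties"],
        result["boundingBoxes"],
    )
```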
You can retrieve this information and parse it for further analyses. In the next step, we show you how to use this human-reviewed data for the next retraining iteration of your model.
Step 7: Process the JSON output for incremental training
In the object detection training notebook, you used the Amazon SageMaker built-in object detection algorithm to train the first version of the model. You used the model to generate predictions on some random out-of-sample images and got an unsatisfactory prediction (low probability). You also used Amazon A2I to review and label the image based on your custom criteria (<50% confidence score threshold). The next step in a typical ML lifecycle is to include the cases with which the model had trouble in the next batch of training data for retraining purposes. This way, the model can learn from a set of new training data for continuous improvement. In ML, this is called incremental training.
This step is available in the accompanying Jupyter notebook.
You can supply the image data and annotation to the object detection algorithm in three different ways. For more information, see Input/Output Interface for the Object Detection Algorithm. For this post, we trained our original model with the RecordIO format because we converted the PASCAL VOC images and annotations into RecordIO format. For instructions on creating a custom RecordIO data, see Prepare custom datasets for object detection.
Alternatively, the object detection algorithm also takes a JSON file as input. You could create one JSON file per image or take advantage of Pipe mode by using an augmented manifest file as the input format. Pipe mode accelerates overall model training time by up to 35% by streaming the data into the training algorithm while it’s running instead of copying data to the Amazon Elastic Block Store (Amazon EBS) volume attached to the training instance. You can construct an augmented manifest file from the Amazon A2I output with the following code:
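The following is a minimal sketch of such a conversion, not the notebook's code. It maps one parsed Amazon A2I output record to one augmented manifest record in the Ground Truth object detection layout. The function name and the `a2i-retrain` label attribute name are hypothetical, only the first worker's answer is used, and `depth` is set to 3 on the assumption that images are RGB because Amazon A2I reports only width and height.

```python
def a2i_output_to_manifest_record(a2i_output, class_names, label_attribute="a2i-retrain"):
    """Convert one parsed A2I output (a dict) into one augmented manifest record."""
    human_answer = a2i_output["humanAnswers"][0]
    result = human_answer["answerContent"]["annotatedResult"]
    image = result["inputImageProperties"]
    class_ids = {name: index for index, name in enumerate(class_names)}

    annotations = []
    class_map = {}
    for box in result["boundingBoxes"]:
        class_id = class_ids[box["label"]]
        class_map[str(class_id)] = box["label"]
        annotations.append({
            "class_id": class_id,
            "left": box["left"],
            "top": box["top"],
            "width": box["width"],
            "height": box["height"],
        })

    return {
        "source-ref": a2i_output["inputContent"]["taskObject"],
        label_attribute: {
            "annotations": annotations,
            # Assumption: RGB images, so depth is 3.
            "image_size": [{"width": image["width"], "height": image["height"], "depth": 3}],
        },
        f"{label_attribute}-metadata": {
            "class-map": class_map,
            "human-annotated": "yes",
            "creation-date": human_answer["submissionTime"],
            "type": "groundtruth/object-detection",
        },
    }
```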
This results in a JSON object like the following code, which is compatible with how Ground Truth outputs the result and how the SageMaker built-in object detection algorithm expects the input:
The preceding code covers only one image. To create a cohort of training images from all the images re-labeled by human reviewers on the Amazon A2I console, you can loop through all the Amazon A2I output, convert each JSON file, and concatenate the results into a JSON Lines file, where each line represents the results of one image. See the following code:
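A minimal sketch of the concatenation step, assuming you already have one manifest record (a Python dict) per reviewed image. The function names are hypothetical.

```python
import json


def to_json_lines(records):
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def write_manifest(records, path):
    """Write the augmented manifest to a local file, ready to upload to Amazon S3."""
    with open(path, "w") as manifest_file:
        manifest_file.write(to_json_lines(records))
    return path
```

Each line must be a complete JSON object with no line breaks inside it, which `json.dumps` guarantees with its default settings.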
After you collect enough data points, you can construct a new Estimator for incremental training. For more information, see Easily train models using datasets labeled by Amazon SageMaker Ground Truth. In this post, we use exactly the same hyperparameters as the first model trained in the object detection training notebook, except that we start from the weights of the trained model instead of the pretrained weights that come with the algorithm (
The following code example demonstrates incremental training with one or two new samples. Because we only reviewed two images in this post, this doesn’t yield a model with meaningful improvement.
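The notebook's Estimator code is not reproduced here. As a lower-level illustration, the following sketch builds the `InputDataConfig` channels that a boto3 `create_training_job` request could use for incremental training: augmented manifest channels streamed in Pipe mode, plus a `model` channel that points at the previously trained artifact. The function names are hypothetical, and including a validation channel is an assumption about what the algorithm requires.

```python
def manifest_channel(channel_name, manifest_s3_uri, label_attribute):
    """Channel that streams an augmented manifest to the algorithm in Pipe mode."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "AugmentedManifestFile",
                "S3Uri": manifest_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
                # Image reference first, then the attribute holding the annotations.
                "AttributeNames": ["source-ref", label_attribute],
            }
        },
        "ContentType": "application/x-recordio",
        "RecordWrapperType": "RecordIO",
        "InputMode": "Pipe",
    }


def model_channel(previous_model_s3_uri):
    """Channel that supplies the earlier model artifact as the starting weights."""
    return {
        "ChannelName": "model",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": previous_model_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "application/x-sagemaker-model",
        "InputMode": "File",  # the model artifact is downloaded, not streamed
    }


def build_input_data_config(train_manifest, validation_manifest, previous_model, label_attribute):
    """Assemble the channel list for incremental training."""
    return [
        manifest_channel("train", train_manifest, label_attribute),
        manifest_channel("validation", validation_manifest, label_attribute),
        model_channel(previous_model),
    ]
```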
After training, you get a new model in the s3_output_location. You can then deploy this model to a new inference endpoint or update an existing endpoint. There is no availability loss when you update an existing endpoint. To update an endpoint, you need to provide a new endpoint configuration. For more information, see UpdateEndpoint.
To avoid incurring future charges, delete resources such as the Amazon SageMaker endpoint, notebook instance, and the model artifacts in Amazon S3 when not in use.
This post has merely scratched the surface of what Amazon A2I can do in a typical ML lifecycle. We demonstrated how to set up everything you need for a working human-in-the-loop framework: an Amazon A2I worker task template interface, a human review workflow, and a work team. We also showed how to trigger an Amazon A2I human loop programmatically after an Amazon SageMaker hosted object detection endpoint returns a low-confidence inference. Lastly, we walked you through how to work with the Amazon A2I output JSON file to create a new batch of training data in an augmented manifest format for incremental training using the Amazon SageMaker built-in object detection algorithm.
For video presentations, sample Jupyter notebooks, or more information about use cases like document processing, content moderation, sentiment analysis, text translation, and others, see Amazon Augmented AI Resources.
- Everingham, Mark, et al. “The pascal visual object classes challenge: A retrospective.” International journal of computer vision 111.1 (2015): 98-136.
- Liu, Wei, et al. “SSD: Single shot multibox detector.” European conference on computer vision. Springer, Cham, 2016.
About the authors
Michael Hsieh is a Senior AI/ML Specialist Solutions Architect. He works with customers to advance their ML journey with a combination of AWS ML offerings and his ML domain knowledge. As a Seattle transplant, he loves exploring the nature the city has to offer, such as the hiking trails, scenic kayaking in SLU, and the sunset at Shilshole Bay.
Anuj Gupta is Senior Product Manager for Amazon Augmented AI. He focuses on delivering products that make it easier for customers to adopt machine learning. In his spare time, he enjoys road trips and watching Formula 1.