In this tutorial, you learn how to set up a labeling job in Amazon SageMaker Ground Truth to annotate training data for your machine learning (ML) model.
A labeled dataset is critical to supervised training of an ML model. Many organizations have large datasets but lack labels for that data. With Amazon SageMaker Ground Truth, you can label data using human annotators from Amazon Mechanical Turk, third-party vendors, or your own private workforce.
For this tutorial, you use SageMaker Ground Truth to label a set of images of vehicles, including airplanes, cars, ferries, helicopters, and motorbikes. Because this tutorial uses a non-sensitive dataset, you use the Amazon Mechanical Turk option.
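As a rough illustration of what a labeling job for these five vehicle classes takes as input, the sketch below builds an input manifest (JSON Lines, one `source-ref` entry per image) and a label category configuration. The bucket name, key prefix, and exact label spellings are assumptions made for this example, not values taken from the tutorial.

```python
import json

# Hypothetical S3 bucket; replace with your own.
BUCKET = "my-ground-truth-bucket"

# Assumed spellings for the five vehicle classes named in this tutorial.
LABELS = ["airplane", "car", "ferry", "helicopter", "motorbike"]


def build_manifest(image_keys, bucket=BUCKET):
    """Return JSON Lines text with one {"source-ref": ...} object per image."""
    lines = [
        json.dumps({"source-ref": f"s3://{bucket}/{key}"})
        for key in image_keys
    ]
    return "\n".join(lines) + "\n"


def build_label_config(labels=LABELS):
    """Return a label category configuration listing each class name."""
    return {
        "document-version": "2018-11-28",
        "labels": [{"label": name} for name in labels],
    }
```

Calling `build_manifest(["vehicles/images/img-001.jpg"])` yields one manifest line that points at that object in the hypothetical bucket. Both outputs would be uploaded to S3 before the labeling job is created.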
What you will accomplish
In this guide, you will:
- Create and configure a data labeling job
- Review the results of the labeling job
Before starting this guide, you will need:
- An AWS account: If you don't already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.
In the AWS console search bar, enter SageMaker, and then choose Amazon SageMaker to open the SageMaker console.
In the left navigation pane, choose Notebook, choose Notebook instances, and then choose Create notebook instance.
On the Create notebook instance page, under Notebook instance settings, for Notebook instance name, enter SageMaker-Ground-Truth-Tutorial. For Notebook instance type, select ml.t2.medium.
In the Permissions and encryption section, for IAM role, choose Create a new role. In the Create an IAM role dialog box, select Any S3 bucket and choose Create role. In production environments, as a best practice, restrict S3 bucket access to a specific IAM role with the minimum required permissions. Note this role name for cleanup at the end.
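To make the least-privilege recommendation concrete, here is a minimal sketch of an IAM policy document that grants access to a single bucket instead of any S3 bucket. The bucket name passed in and the chosen set of actions are assumptions for illustration. A real labeling workflow may need additional permissions.

```python
import json


def build_s3_policy(bucket_name):
    """Return an IAM policy document scoped to one bucket (assumed action set)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListTutorialBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading and writing apply to objects inside the bucket.
                "Sid": "ReadWriteTutorialObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


def policy_as_json(bucket_name):
    """Serialize the policy for pasting into the IAM console or an API call."""
    return json.dumps(build_s3_policy(bucket_name), indent=2)
```

The resulting JSON could be attached to the execution role as an inline policy in place of the broad Any S3 bucket setting.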
SageMaker creates the AmazonSageMaker-ExecutionRole-<role-id> role. Keep the defaults for the remaining settings and choose Create notebook instance.
In the Notebook instances section, the newly created SageMaker-Ground-Truth-Tutorial notebook instance is displayed with a status of Pending. The notebook is ready when the status changes to InService.
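The console steps above can also be approximated in code. The sketch below assumes a boto3 SageMaker client is passed in and that you already have the ARN of the execution role. It creates the notebook instance with the same name and instance type, then polls until the status changes from Pending to InService. The polling interval and retry limit are arbitrary choices for this example.

```python
import time

NOTEBOOK_NAME = "SageMaker-Ground-Truth-Tutorial"
INSTANCE_TYPE = "ml.t2.medium"


def create_and_wait(client, role_arn, poll_seconds=15, max_polls=40,
                    sleep=time.sleep):
    """Create the notebook instance and block until it is InService.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    client.create_notebook_instance(
        NotebookInstanceName=NOTEBOOK_NAME,
        InstanceType=INSTANCE_TYPE,
        RoleArn=role_arn,
    )
    for _ in range(max_polls):
        response = client.describe_notebook_instance(
            NotebookInstanceName=NOTEBOOK_NAME
        )
        status = response["NotebookInstanceStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Notebook instance creation failed")
        sleep(poll_seconds)  # still Pending; wait before checking again
    raise TimeoutError("Notebook instance did not reach InService in time")


# Example usage (requires boto3 and AWS credentials; role ARN is a placeholder):
#   import boto3
#   create_and_wait(boto3.client("sagemaker"),
#                   "arn:aws:iam::<account-id>:role/<role-name>")
```

Passing the client as a parameter keeps the function easy to exercise with a stub before running it against a real account.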
Congratulations! You have finished the Label Training Data for Machine Learning tutorial.
In this tutorial, you used Amazon SageMaker Ground Truth and Amazon Mechanical Turk to build a training dataset for machine learning.
You can continue your machine learning journey with Amazon SageMaker by following the next steps section below.