Q: What is Amazon SageMaker Ground Truth?

A: Amazon SageMaker Ground Truth makes it easy for you to efficiently and accurately label the datasets required for training machine learning systems. SageMaker Ground Truth can automatically label a portion of the dataset based on labels created manually by human labelers. You can choose a crowdsourced Amazon Mechanical Turk workforce of over 500,000 labelers, your own employees, or one of the Amazon-pre-screened third-party vendors listed on AWS Marketplace. SageMaker Ground Truth uses innovative algorithms and user experience (UX) techniques to improve the accuracy of human labeling. Over time, the labeling model improves by continuously learning from the labels created by humans, so a larger share of the data can be labeled automatically.

Q: What is Automated Data Labeling?

A: Automated data labeling is the labeling of data using machine learning. Amazon SageMaker Ground Truth first selects a random sample of data and sends it to humans to be labeled. The results are then used to train a labeling model that attempts to label a new sample of raw data automatically. A label is committed when the model can label the data with a confidence score that meets or exceeds a high threshold. Where the confidence score falls below this threshold, the data is sent to human labelers. Some of the data labeled by humans is used to generate a new training dataset for the labeling model, and the model is automatically retrained to improve its accuracy. This process repeats with each sample of raw data to be labeled. With each iteration, the labeling model becomes more capable of labeling raw data automatically, and less data is routed to humans.
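
The routing step in this loop can be sketched in Python. This is only an illustration of the idea, not the service's implementation: the 0.95 threshold, the `route_predictions` function, and the item IDs are all hypothetical, and the predictions are hard-coded rather than produced by a trained model.

```python
from typing import Dict, List, Tuple

# Hypothetical value; the FAQ only says "a high threshold".
CONFIDENCE_THRESHOLD = 0.95

# (item_id, predicted_label, confidence)
Prediction = Tuple[str, str, float]


def route_predictions(
    predictions: List[Prediction],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[Dict[str, str], List[str]]:
    """Split model predictions into committed labels and items for human review."""
    auto_labeled: Dict[str, str] = {}
    needs_human: List[str] = []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            # Confidence meets or exceeds the threshold: commit the label.
            auto_labeled[item_id] = label
        else:
            # Below the threshold: send the item to human labelers.
            needs_human.append(item_id)
    return auto_labeled, needs_human


batch = [
    ("img-001", "cat", 0.99),
    ("img-002", "dog", 0.62),
    ("img-003", "cat", 0.95),
]
auto_labeled, needs_human = route_predictions(batch)
print(auto_labeled)
print(needs_human)
```

In a full loop, the human labels collected for the `needs_human` items would feed the next retraining round.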

Using Amazon SageMaker Ground Truth

Q: Why should I use Amazon SageMaker Ground Truth?

A: Prior to building, training, and deploying machine learning models, you need data. Successful models are built on high-quality training data, and collecting and labeling training datasets takes a lot of time and effort. To build a training dataset, human labelers need to evaluate a large number of images or other data objects, and then identify and label particular features in each one. These labeling tasks are distributed across many human labelers, which adds significant overhead and cost. If labels are incorrect, the system learns from the bad information and makes inaccurate predictions.

Amazon SageMaker Ground Truth solves this problem by combining automated data labeling with human-performed labeling, so you can efficiently and accurately label data stored in Amazon S3.

Q: How do I get started with Amazon SageMaker Ground Truth?

A: Amazon SageMaker Ground Truth provides a managed experience where you can set up an entire data labeling job in just a few steps. To get started, sign in to the AWS Management Console and navigate to the SageMaker console. From there, select Labeling jobs under Ground Truth, where you can create a labeling job. First, as part of the labeling job creation flow, you provide a pointer to the S3 bucket that contains the dataset to be labeled. Ground Truth offers templates for common labeling tasks, where you only need to make a few choices and provide minimal instructions on how your data should be labeled. Alternatively, you can create your own custom template. As the last step of creating a labeling job, you select one of three human workforce options: (1) a public crowdsourced workforce, (2) a curated set of vendors who provide data labeling services, or (3) your own workers. You also have the option to enable automated data labeling.
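
If you prepare the dataset pointer yourself rather than through the console, it typically takes the form of an input manifest file. The following is a minimal sketch, assuming the JSON Lines manifest format in which each line carries a `source-ref` key with the S3 URI of one object. The bucket name, object keys, and the `build_manifest_lines` helper are hypothetical.

```python
import json
from typing import Iterable, List


def build_manifest_lines(bucket: str, keys: Iterable[str]) -> List[str]:
    """Return one JSON line per S3 object, each with a "source-ref" key."""
    return [json.dumps({"source-ref": f"s3://{bucket}/{key}"}) for key in keys]


# Hypothetical bucket and object keys, for illustration only.
lines = build_manifest_lines(
    "example-bucket",
    ["images/cat1.jpg", "images/dog1.jpg"],
)
manifest_text = "\n".join(lines) + "\n"
print(manifest_text, end="")
```

The resulting text would be uploaded to S3 and referenced when the labeling job is created.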

Q: How are my training datasets managed using Amazon SageMaker Ground Truth?

A: Amazon SageMaker Ground Truth manages the metadata, associated labels, and a taxonomy of your labels and datasets. You can easily use the AWS SDK through a SageMaker Notebook or the Ground Truth console within the SageMaker console to query and manage your datasets and labels. Visit the Amazon SageMaker Ground Truth documentation for more information.
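
As one example of querying through the SDK, the sketch below summarizes labeling jobs using the `list_labeling_jobs` operation of the SageMaker API. It assumes the response contains a `LabelingJobSummaryList` and an optional `NextToken`. The `summarize_labeling_jobs` helper is hypothetical, and the client is passed in so the function itself does not depend on any installed SDK.

```python
from typing import Any, Dict, List


def summarize_labeling_jobs(sagemaker_client: Any) -> List[Dict[str, Any]]:
    """Collect name, status, and labeled-object count for every labeling job."""
    summaries: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = sagemaker_client.list_labeling_jobs(**kwargs)
        for job in page.get("LabelingJobSummaryList", []):
            summaries.append({
                "name": job["LabelingJobName"],
                "status": job["LabelingJobStatus"],
                "labeled": job.get("LabelCounters", {}).get("TotalLabeled", 0),
            })
        # Follow pagination until the service stops returning a token.
        token = page.get("NextToken")
        if not token:
            return summaries
        kwargs = {"NextToken": token}
```

With the AWS SDK for Python installed and credentials configured, this could be called as `summarize_labeling_jobs(boto3.client("sagemaker"))`.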

Q: How does Amazon SageMaker Ground Truth help with increasing the accuracy of my training datasets?

A: Amazon SageMaker Ground Truth offers the following features to help customers increase the accuracy of data labeling performed by humans:

(a) Annotation consolidation: This counteracts the error and bias of individual workers by sending each data object to multiple workers and then consolidating their responses (called "annotations") into a single label. An annotation consolidation algorithm compares the annotations, first detecting outlier annotations and disregarding them. It then performs a weighted consolidation of the remaining annotations, assigning higher weights to more reliable annotations. The output is a single label for each object.

(b) Annotation interface best practices: These are features of the annotation interfaces that enable workers to perform their tasks more accurately. Human workers are prone to error and bias, and well-designed interfaces improve worker accuracy. One best practice is to display brief instructions along with good and bad label examples in a fixed side panel. Another best practice is to darken the area outside the bounding box boundary when workers are drawing a bounding box on an image.
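
The weighted consolidation described in (a) can be illustrated with a simple weighted majority vote. This is a stand-in technique for illustration, not the algorithm Ground Truth actually uses, and it omits the outlier-detection step. The `consolidate` function, the worker IDs, and the reliability weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def consolidate(
    annotations: List[Tuple[str, str]],
    worker_weights: Dict[str, float],
) -> str:
    """Return the label with the largest total worker weight.

    Each annotation is a (worker_id, label) pair. Workers without a known
    weight count as 1.0. Raises ValueError if annotations is empty.
    """
    totals: Dict[str, float] = defaultdict(float)
    for worker_id, label in annotations:
        totals[label] += worker_weights.get(worker_id, 1.0)
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


annotations = [("w1", "cat"), ("w2", "dog"), ("w3", "dog")]
weights = {"w1": 0.9, "w2": 0.3, "w3": 0.4}
print(consolidate(annotations, weights))
```

Here the single reliable worker outweighs the two less reliable ones; with equal weights, the plain majority label would win instead.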

Q: How does Amazon SageMaker Ground Truth ensure that my data is protected and secure?

A: By default, Amazon SageMaker Ground Truth encrypts your data at rest and in transit. In addition, you can control access to your data using AWS Identity and Access Management (IAM). Ground Truth does not store or make copies of your data outside of your AWS environment, and your data remains in your control. Further, Ground Truth supports compliance standards such as the General Data Protection Regulation (GDPR), and provides comprehensive logging and auditing capabilities using Amazon CloudWatch and AWS CloudTrail. Visit the Amazon SageMaker Ground Truth documentation for more information.

Q: How do I access a human workforce using Amazon SageMaker Ground Truth?

A: From SageMaker Ground Truth, you can choose any of three workforce options: (1) a public crowdsourced workforce through Amazon Mechanical Turk, (2) third-party vendors available through AWS Marketplace, or (3) your own employees. Visit the Amazon SageMaker Ground Truth documentation for more information.

Pricing and Availability

Q: How much does Amazon SageMaker Ground Truth cost?

A: Please see the SageMaker Ground Truth pricing page for current pricing information.

Q: In which AWS regions is Amazon SageMaker Ground Truth available?

A: Amazon SageMaker Ground Truth is currently available in the N. Virginia, Ohio, Oregon, Ireland, and Tokyo AWS regions.
