Amazon SageMaker Data Labeling
Create high-quality datasets for training machine learning models
Amazon SageMaker enables you to identify raw data, such as images, text files, and videos; add informative labels; and generate labeled synthetic data to create high-quality training datasets for your machine learning (ML) models. SageMaker offers two options, Amazon SageMaker Ground Truth Plus and Amazon SageMaker Ground Truth, which provide you with the flexibility to use an expert workforce to create and manage data labeling workflows on your behalf or manage your own data labeling workflows.
Amazon SageMaker Ground Truth Plus
With SageMaker Ground Truth Plus, you can create high-quality training datasets without having to build labeling applications or manage labeling workforces on your own. SageMaker Ground Truth Plus helps reduce data labeling costs by up to 40%. SageMaker Ground Truth Plus provides an expert workforce that is trained on ML tasks and can help meet your data security, privacy, and compliance requirements. You upload your data, and then SageMaker Ground Truth Plus creates and manages data labeling workflows and the workforce on your behalf.
Amazon SageMaker Ground Truth
If you want the flexibility to build and manage your own data labeling workflows and workforce, you can use SageMaker Ground Truth. SageMaker Ground Truth is a data labeling service that makes it easy to label data and gives you the option to use human annotators through Amazon Mechanical Turk, third-party vendors, or your own private workforce.
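As a rough illustration of what managing your own workflow involves: a SageMaker Ground Truth labeling job reads its input from a manifest file stored in Amazon S3, written in JSON Lines format, where each line points to one data object through a `source-ref` key. The sketch below builds such a manifest as text. The bucket name, object paths, and the `build_manifest` helper are hypothetical and used only for illustration.

```python
import json


def build_manifest(s3_uris):
    """Return JSON Lines text with one {"source-ref": uri} object per line.

    Hypothetical helper for illustration; it only formats text and does not
    contact AWS.
    """
    return "".join(json.dumps({"source-ref": uri}) + "\n" for uri in s3_uris)


# Placeholder S3 URIs; replace with the locations of your own data objects.
uris = [
    "s3://example-bucket/images/0001.jpg",
    "s3://example-bucket/images/0002.jpg",
]

manifest_text = build_manifest(uris)
print(manifest_text, end="")
```

The resulting text would typically be saved as a `.manifest` file and uploaded to S3 alongside the data before a labeling job is created.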
You can also generate labeled synthetic data without manually collecting or labeling real-world data. SageMaker Ground Truth can generate hundreds of thousands of automatically labeled synthetic images on your behalf.
How it works
Label data with SageMaker Ground Truth Plus
Amazon SageMaker Ground Truth Plus helps you to create high-quality training datasets without having to build labeling applications or manage a labeling workforce.
Label data with SageMaker Ground Truth
Amazon SageMaker Ground Truth helps you build and manage your own data labeling workflows and data labeling workforce.
Generate labeled synthetic data
Amazon SageMaker Ground Truth helps you generate labeled synthetic data.
Feature comparison
Category | Amazon SageMaker Ground Truth | Amazon SageMaker Ground Truth Plus
Data labeling workflows | Custom or 30-plus built-in workflows for text, images, video, and 3D point clouds. You manage your data labeling workflows and data labeling quality | Custom or 30-plus built-in workflows for text, images, video, and 3D point clouds. AWS manages your data labeling workflows and data labeling quality
User | Data Scientists and ML Engineers | Data Scientists, ML Engineers, Data Operations Managers, and Program Managers
Workforce | Your choice: third-party vendors, Amazon Mechanical Turk, or your own private workforce | Expert workforce that can help meet your data security, privacy, and compliance requirements
ML labeling techniques | Active learning | Active learning, pre-labeling, and machine validation
Synthetic data generation | Supported | Supported
Benefits
Improve quality of training datasets
Amazon SageMaker data labeling offerings provide ML labeling techniques that are less prone to manual errors, including synthetic data generation, and help improve the quality of training datasets. Amazon SageMaker Ground Truth Plus has a multi-step labeling workflow that includes ML models for pre-labeling, machine validation of human labeling to detect errors and low-quality labels, and assistive labeling features (e.g., 3D cuboid snapping, predict-next in video labeling, and auto-segment tools). If you are managing your own data labeling workflows, SageMaker Ground Truth provides automated labeling features such as auto-segment, automatic 3D cuboid snapping, and sensor fusion with 2D video frames. In addition, SageMaker Ground Truth provides automatic data labeling, which uses active learning and routes data to human labelers only when the model cannot label it confidently.
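The routing idea behind automatic data labeling (accept the model's label when it is confident, otherwise send the item to a person) can be sketched in a few lines. This is a minimal illustration, not SageMaker Ground Truth's actual implementation: the `Prediction` type, the `route_predictions` function, the sample data, and the 0.9 confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's score for its top label, in [0, 1]


def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-labeled items and items for human review.

    The threshold is an assumed value chosen for illustration only.
    """
    auto_labeled, needs_human = [], []
    for p in predictions:
        if p.confidence >= threshold:
            auto_labeled.append(p)   # confident: keep the model's label
        else:
            needs_human.append(p)    # not confident: send to a human labeler
    return auto_labeled, needs_human


# Made-up sample predictions.
predictions = [
    Prediction("img-001", "cat", 0.97),
    Prediction("img-002", "dog", 0.55),
    Prediction("img-003", "cat", 0.91),
]

auto_labeled, needs_human = route_predictions(predictions, threshold=0.9)
print([p.item_id for p in auto_labeled])
print([p.item_id for p in needs_human])
```

In an active learning loop, the human-provided labels would then be fed back to retrain the model, so that more items clear the threshold in later rounds.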
Choose your data labeling workforce
With Amazon SageMaker data labeling offerings, you have two options to label data. First, with SageMaker Ground Truth Plus, an expert workforce that is trained on ML tasks labels your data in accordance with your quality and turnaround time requirements. Second, with SageMaker Ground Truth, you can build and manage your data labeling workflows. You have options to work with labelers inside and outside your organization. For example, you can send labeling jobs to your own labelers, or you can access a workforce of over 500,000 independent contractors who are already performing ML-related tasks through Amazon Mechanical Turk. If your data requires confidentiality or special skills, you can also use vendors that are pre-screened by AWS for quality and security procedures. If you need access to synthetic data so your training datasets are more complete for training ML models, AWS digital artists use customer-provided assets and images to generate synthetic data that is automatically labeled on your behalf.
Increase visibility of data labeling operations
Amazon SageMaker data labeling offerings enable you to gain transparency into data labeling operations and quality management so you can verify that your quality requirements are being met. SageMaker Ground Truth Plus provides interactive dashboards and user interfaces, so you can monitor progress of training datasets across multiple projects, track project metrics such as daily throughput, inspect labels for quality, and provide feedback on the labeled data.
Receive high-quality labeled data quickly
With Amazon SageMaker data labeling offerings, you can receive high-quality labeled data quickly. With SageMaker Ground Truth Plus, you upload your data to Amazon S3 and specify your security, privacy, and compliance requirements. AWS experts then set up the data labeling workflow, and an expert workforce completes your labeling tasks. If you need access to synthetic data, you specify your image requirements or provide 3D assets and baseline images, and SageMaker Ground Truth can generate highly accurate labeled synthetic data for ML model training.

Get started building with Amazon SageMaker Data Labeling in the AWS Management Console.