Amazon SageMaker Data Labeling
Create high-quality datasets for training machine learning models
Receive high-quality labeled data quickly
Choose your data labeling workforce
Increase visibility of data labeling operations
Generate high-quality datasets to customize generative AI models
Amazon SageMaker enables you to label raw data, such as images, text files, and videos, and to generate labeled synthetic data, so you can create high-quality datasets for training machine learning (ML) models. SageMaker offers two options: with Amazon SageMaker Ground Truth Plus, an expert workforce creates and manages data labeling workflows on your behalf; with Amazon SageMaker Ground Truth, you manage your own data labeling workflows.
Amazon SageMaker Ground Truth Plus
SageMaker Ground Truth Plus is a fully managed service that lets you create high-quality training datasets without building labeling applications or managing labeling workforces yourself. It provides an expert workforce that is trained on ML tasks and can help meet your data security, privacy, and compliance requirements, while helping you reduce data labeling costs by up to 40%. You upload your data, and SageMaker Ground Truth Plus creates and manages the data labeling workflows and the workforce on your behalf.
SageMaker Ground Truth Plus can also create high-quality datasets to fine-tune foundation models for generative AI tasks, from answering questions to generating images and videos. Skilled human workforces can review model outputs to ensure that they align with human preferences. Additionally, SageMaker Ground Truth Plus enables application builders to customize models with their industry or company data so that their applications reflect their preferred voice and style.
Amazon SageMaker Ground Truth
If you want the flexibility to build and manage your own data labeling workflows and workforce, you can use SageMaker Ground Truth. SageMaker Ground Truth is a self-service offering that makes it easy to label data and gives you the option to use human annotators through Amazon Mechanical Turk, third-party vendors, or your own private workforce.
You can also generate labeled synthetic data without manually collecting or labeling real-world data. SageMaker Ground Truth can generate hundreds of thousands of automatically labeled synthetic images on your behalf.
How it works
Label data with SageMaker Ground Truth Plus
Amazon SageMaker Ground Truth Plus helps you create high-quality training datasets without having to build labeling applications or manage a labeling workforce.
Label data with SageMaker Ground Truth
Amazon SageMaker Ground Truth helps you build and manage your own data labeling workflows and data labeling workforce.
Generate labeled synthetic data
Amazon SageMaker Ground Truth helps you generate labeled synthetic data.
Use cases
Support for Generative AI Applications
Create high-quality datasets to fine-tune and customize foundation models.
Natural Language Processing
Classify text, or label named entities with your own entity types for named entity recognition (NER), to generate your training dataset.
Computer Vision
Classify images and videos, perform semantic segmentation for highly detailed object recognition, and detect and track objects with a full suite of image and video annotation tools.
3D LIDAR Navigation
Detect and track objects, and perform semantic segmentation for highly detailed object recognition within LIDAR 3D point cloud data.
How to get started
Get started with data labeling
Set up your own labeling workflow with SageMaker Ground Truth.
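A self-managed workflow typically starts with an input manifest: a JSON Lines file, stored in Amazon S3, that lists the data objects to be labeled. The Python sketch below builds one. It assumes that image objects are referenced by S3 URI under a "source-ref" key and that short text objects are embedded inline under a "source" key; the helper names and the example bucket are hypothetical, so check the Ground Truth documentation for the exact format your task type expects.

```python
import json
from typing import Iterable


def build_image_manifest(s3_uris: Iterable[str]) -> str:
    """Build a JSON Lines manifest where each line references one image in S3.

    Hypothetical helper: assumes the "source-ref" key used for S3-hosted objects.
    """
    lines = []
    for uri in s3_uris:
        # Reject anything that is not an S3 URI, since the key expects one.
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an S3 URI, got {uri!r}")
        lines.append(json.dumps({"source-ref": uri}))
    # One JSON object per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


def build_text_manifest(texts: Iterable[str]) -> str:
    """Build a JSON Lines manifest where each line embeds one text object inline.

    Hypothetical helper: assumes the "source" key used for inline text.
    """
    return "".join(json.dumps({"source": text}) + "\n" for text in texts)


if __name__ == "__main__":
    # "example-bucket" is a placeholder bucket name.
    print(build_image_manifest([
        "s3://example-bucket/images/cat.jpg",
        "s3://example-bucket/images/dog.jpg",
    ]), end="")
    print(build_text_manifest(["The delivery arrived two days late."]), end="")
```

After uploading the manifest to S3, you point a labeling job at its location; in the AWS SDK for Python (boto3), this is the `ManifestS3Uri` field of the `InputConfig` passed to `create_labeling_job`.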
Learn more about SageMaker Ground Truth
Access additional resources, documentation, and learning materials.