Offerings

Category | Amazon SageMaker Ground Truth | Amazon SageMaker Ground Truth Plus
Data labeling workflows | Custom or 30-plus built-in workflows for text, images, video, and 3D point clouds. You manage your data labeling workflows and data labeling quality. | Custom or 30-plus built-in workflows for text, images, video, and 3D point clouds. AWS manages your data labeling workflows and data labeling quality.
User | Data scientists and ML engineers | Data scientists, ML engineers, data operations managers, and program managers
Workforce | Your choice: third-party vendors, Amazon Mechanical Turk, or your own private workforce | Expert workforce that can help meet your data security, privacy, and compliance requirements
ML labeling techniques | Active learning | Active learning, pre-labeling, and machine validation
Synthetic data generation | Supported | Supported

Generative AI

Generate high quality datasets to fine-tune your own foundation models on specific tasks

Amazon SageMaker Ground Truth Plus provides labeling interfaces, labeling workflows, and skilled data annotators who create high quality datasets needed to customize foundation models. Data annotators can complete a variety of tasks, such as writing question and answer pairs, generating text, summarizing text, reworking text, and providing captions for images and videos so that the model can learn from these examples.

  • Question and answer: With question and answer pairs, you can prepare demonstration datasets that teach your large language model how to answer questions.
  • Image captioning: With image captioning, you can prepare datasets that describe the scene and objects in an image in rich detail in order to train text-to-image models so they create accurate and creative images aligned with your intent. It can also be used to train image-to-text models to give an accurate description of the image scene.
  • Video captioning: With video captioning, you can prepare datasets that describe the actions and scene of a video in rich detail in order to train text-to-video models to create accurate and creative videos aligned with your intent. It can also be used to train video-to-text models to give an accurate description of the video.
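
As a rough illustration of what a question and answer demonstration dataset might look like once annotators have written it, the sketch below serializes pairs as JSON Lines. The field names `prompt` and `completion` and the sample records are assumptions chosen for illustration; the format your fine-tuning pipeline expects may differ.

```python
import json

# Hypothetical question and answer pairs written by annotators.
qa_pairs = [
    {"question": "What is a 3D point cloud?",
     "answer": "A 3D representation of a physical space, commonly captured with LIDAR devices."},
    {"question": "What is a bounding box?",
     "answer": "A two-dimensional box drawn around an object of interest in an image."},
]

def to_jsonl(pairs):
    """Serialize question/answer pairs as JSON Lines, one record per line."""
    lines = []
    for pair in pairs:
        # "prompt" and "completion" are assumed field names, not a fixed schema.
        record = {"prompt": pair["question"], "completion": pair["answer"]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

demo_jsonl = to_jsonl(qa_pairs)
print(demo_jsonl)
```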

Align large language models (LLMs) with human preferences using high quality human feedback

Human feedback is essential to ensure that LLMs generate content that is aligned with human preferences – that is, the content is helpful, accurate, and harmless for users to accomplish their tasks. Amazon SageMaker Ground Truth Plus enables data annotators to review, rank, and classify model outputs and use that data to train models to reduce toxic, error-filled, or irrelevant content. For example, annotators may rank several responses that were generated by a chatbot and label them based on factual accuracy, relevance, and writing clarity.
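
One common way to use ranked responses is to expand each ranking into pairwise preferences for training. The following is a minimal sketch under that assumption; the function name, the `chosen`/`rejected` field names, and the sample data are hypothetical and do not describe how Ground Truth Plus stores feedback.

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    """Turn a best-to-worst ranking into (chosen, rejected) preference pairs."""
    pairs = []
    # combinations() preserves input order, so the first element of each
    # pair always ranks higher than the second.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# An annotator ranked three chatbot responses from best to worst.
ranked = ["Response A", "Response B", "Response C"]
preference_pairs = ranking_to_pairs("How do I reset my password?", ranked)
for pair in preference_pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

A ranking of n responses yields n(n-1)/2 pairs, so three ranked responses produce three preference pairs.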

Customize existing foundation models with company or industry specific data

Amazon SageMaker Ground Truth Plus allows customers to use company-specific data (such as documentation and messaging) or industry-specific data to customize an existing foundation model for their use case and quality requirements.

Quickly set up your data annotation tasks for generative AI

Customers can simply provide a short description of timelines and data labeling requirements for customizing foundation models, and Amazon SageMaker Ground Truth Plus automatically sets up the workflows, labeling interfaces, and a highly skilled workforce on the customer's behalf. Get started today.

Amazon SageMaker Ground Truth Plus

Expert workforce

With Amazon SageMaker Ground Truth Plus, labeling is done by an expert workforce trained on machine learning (ML) tasks that can help meet your data security, privacy, and compliance requirements. For example, if you need people proficient in labeling audio files, you can specify this requirement in the guidelines you provide to SageMaker Ground Truth Plus, and the service will automatically select labelers with those skills.

End-to-end data labeling management

With Amazon SageMaker Ground Truth Plus, you can easily create high-quality training datasets without building labeling applications or managing labeling workforces on your own. You upload your data, along with your labeling requirements, to Amazon S3. Once you have uploaded the data, SageMaker Ground Truth Plus takes care of setting up the data labeling workflows and operating them on your behalf.

ML labeling techniques

Amazon SageMaker Ground Truth Plus uses ML techniques, including active learning, pre-labeling, and machine validation, which increase the quality of the output dataset and decrease data labeling costs. In a multi-step labeling workflow, active learning models reduce costs by selecting the objects (an image, an audio recording, a section of text, and so on) that need to be labeled, and other ML models pre-label the selected data to reduce human effort. Ground Truth Plus then uses machine validation to identify potential errors, which are sent for an additional round of human review. This significantly improves label quality by catching human errors. Ground Truth Plus also offers assistive labeling features such as ‘automatic 3D cuboid snapping’, ‘predict-next in video labeling’, and ‘auto-segment’ through an intuitive user interface, reducing the time needed for data labeling tasks while improving quality.
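
The general idea behind these techniques can be sketched with confidence-based routing, one simple form of active learning, followed by a disagreement check that stands in for machine validation. This is an illustration only, not the service's implementation; the function names, the 0.95 threshold, and the sample data are all assumptions.

```python
def route_items(predictions, auto_label_threshold=0.95):
    """Split model predictions into pre-labeled items and items for humans.

    predictions: list of (item_id, predicted_label, confidence) tuples.
    """
    auto_labeled, needs_human = [], []
    for item_id, label, confidence in predictions:
        if confidence >= auto_label_threshold:
            auto_labeled.append((item_id, label))   # confident: keep the model label
        else:
            needs_human.append(item_id)             # uncertain: send to a labeler
    return auto_labeled, needs_human

def flag_for_review(human_labels, model_labels):
    """Flag items where a model prediction exists and disagrees with the human label."""
    return [item_id for item_id, label in human_labels.items()
            if model_labels.get(item_id) not in (None, label)]

# Hypothetical model output for three images.
predictions = [("img-1", "car", 0.99), ("img-2", "truck", 0.61), ("img-3", "car", 0.97)]
auto_labeled, needs_human = route_items(predictions)

# The human labeled img-2 as "car" but the model predicted "truck",
# so the item is flagged for an additional round of review.
flagged = flag_for_review({"img-2": "car"}, {"img-2": "truck"})
print(auto_labeled, needs_human, flagged)
```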

Interactive dashboards

SageMaker Ground Truth Plus provides interactive dashboards and user interfaces, so you can monitor progress of training datasets across multiple projects, track project metrics such as daily throughput, inspect labels for quality, and provide feedback on the labeled data.

Amazon SageMaker Ground Truth

3D point clouds

Three-dimensional (3D) point clouds are most commonly captured using Light Detection and Ranging (LIDAR) devices to generate a 3D understanding of a physical space at a single point in time. SageMaker Ground Truth supports built-in labeling workflows for your 3D point cloud data, including object detection, object tracking, and semantic segmentation.

Object detection

With the object detection workflow, you can identify and label objects of interest within a 3D point cloud. For example, in an autonomous vehicle use case, you can accurately label vehicles, lanes, and pedestrians.

Object tracking

With the object tracking workflow, you can track the trajectory of objects of interest. For example, an autonomous vehicle needs to track the movement of other vehicles, lanes, and pedestrians. Ground Truth allows you to track the trajectory of these objects across a sequence of 3D point cloud data.
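
To make the idea of a trajectory concrete, the sketch below groups per-frame labels by a stable object ID. The data layout (`frame`, `objects`, `id`, `center`) is a hypothetical structure chosen for illustration and is not the Ground Truth output format.

```python
from collections import defaultdict

# Hypothetical per-frame annotations: each labeled object carries a stable ID.
frames = [
    {"frame": 0, "objects": [{"id": "car-1", "center": (0.0, 0.0, 0.0)},
                             {"id": "ped-1", "center": (5.0, 2.0, 0.0)}]},
    {"frame": 1, "objects": [{"id": "car-1", "center": (1.5, 0.1, 0.0)},
                             {"id": "ped-1", "center": (5.0, 2.4, 0.0)}]},
    {"frame": 2, "objects": [{"id": "car-1", "center": (3.0, 0.2, 0.0)}]},
]

def build_trajectories(frames):
    """Group object centers by object ID, ordered by frame number."""
    trajectories = defaultdict(list)
    for frame in sorted(frames, key=lambda f: f["frame"]):
        for obj in frame["objects"]:
            trajectories[obj["id"]].append((frame["frame"], obj["center"]))
    return dict(trajectories)

trajectories = build_trajectories(frames)
for object_id, path in trajectories.items():
    print(object_id, path)
```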

Semantic segmentation

With the semantic segmentation workflow, you can segment the points of a 3D point cloud into pre-specified categories. For example, for autonomous vehicles, Ground Truth could categorize the presence of streets, foliage, and structures.

Video

SageMaker Ground Truth supports common video labeling use cases with built-in workflows, including video object detection, video object tracking, and video clip classification.

Video object detection

With the video object detection workflow, you can identify objects of interest within a sequence of video frames. For example, in building a perception system for an autonomous vehicle, you can detect other vehicles in the scene around the vehicle.

Video object tracking

With the video object tracking workflow, you can track objects of interest across a sequence of video frames. For example, in a sports game use case, you can accurately label players across the duration of a play.

Video clip classification

With the video clip classification workflow, you can classify a video file into a pre-specified category. For example, you can select pre-specified categories that best describe the video such as a sports play or traffic congestion at a busy intersection.

Images

SageMaker Ground Truth provides built-in labeling workflows for your image data, including image classification, object detection, and semantic segmentation.

Image classification

Image classification is the process of identifying what an image represents by categorizing it against a pre-defined set of labels. It is useful for scene detection models that need to consider the full context of the image. For example, you can build an image classification model for autonomous vehicles that recognizes whether a scene contains real-world objects such as other vehicles, pedestrians, traffic lights, and signage.

Object detection

You can use the object detection workflow to identify and label objects of interest (e.g., vehicles, pedestrians, dogs, cats) in images. The labeling task involves drawing a bounding box, a two-dimensional (2D) box, around the objects of interest within an image. Computer vision models trained from images with labeled bounding boxes learn that the pixels within the box correspond to the specified object.
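
When several labelers draw a box around the same object, intersection over union (IoU) is a widely used way to measure how closely their boxes agree. The sketch below assumes boxes are stored as `left`, `top`, `width`, and `height` in pixels; that layout and the sample coordinates are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as left/top/width/height dicts."""
    # Corners of the overlapping rectangle, if any.
    x1 = max(box_a["left"], box_b["left"])
    y1 = max(box_a["top"], box_b["top"])
    x2 = min(box_a["left"] + box_a["width"], box_b["left"] + box_b["width"])
    y2 = min(box_a["top"] + box_a["height"], box_b["top"] + box_b["height"])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    union = (box_a["width"] * box_a["height"]
             + box_b["width"] * box_b["height"]
             - intersection)
    return intersection / union if union else 0.0

# Two hypothetical labelers boxed the same vehicle, offset by 10 pixels.
annotator_1 = {"left": 10, "top": 10, "width": 100, "height": 50}
annotator_2 = {"left": 20, "top": 10, "width": 100, "height": 50}
agreement = iou(annotator_1, annotator_2)
print(round(agreement, 3))
```

An IoU of 1.0 means the boxes are identical and 0.0 means they do not overlap; what counts as acceptable agreement depends on the use case.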

Semantic segmentation

You can use the semantic segmentation workflow to label the exact parts of an image that correspond to the labels your model needs to learn. It provides high precision training data because the individual pixels are labeled. For example, the irregular shape of a car in an image could be captured exactly with semantic segmentation.
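
To see why pixel-level labels are more precise than a box for irregular shapes, the following sketch compares the number of pixels in a small mask with the area of the tightest box around it. The mask values and function names are hypothetical and only illustrate the idea.

```python
# Hypothetical 5x6 mask: 1 marks pixels labeled "car", 0 marks background.
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def mask_area(mask, class_id=1):
    """Count the pixels labeled with class_id."""
    return sum(row.count(class_id) for row in mask)

def bounding_box_area(mask, class_id=1):
    """Area of the tightest axis-aligned box around all class_id pixels."""
    rows = [r for r, row in enumerate(mask) if class_id in row]
    cols = [c for row in mask for c, value in enumerate(row) if value == class_id]
    if not rows:
        return 0
    return (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)

car_pixels = mask_area(mask)
box_pixels = bounding_box_area(mask)
print(car_pixels, box_pixels)
```

Here the box covers 24 pixels while only 14 of them belong to the object, so a box label would attribute 10 background pixels to the car.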

Text

SageMaker Ground Truth provides built-in labeling workflows for your text data, including text classification and named entity recognition.

Text classification

Text classification involves categorizing text strings against a pre-defined set of labels. Categorizing text into different labels is often used for natural language processing (NLP) models that identify things like topics (e.g., product descriptions, movie reviews) or sentiment.
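
Text to be labeled is typically supplied to a labeling job as a manifest in JSON Lines form. The sketch below builds one on the assumption that each line is a JSON object with a `source` key holding inline text (with `source-ref` used instead when a line points to an object in Amazon S3); check the current Ground Truth documentation before relying on these keys. The sample reviews are made up.

```python
import json

# Hypothetical movie reviews to be classified by sentiment.
reviews = ["Great movie, would watch again.", "The plot made no sense."]

def build_manifest(texts):
    """Return manifest content: one JSON object per line with a "source" key."""
    return "\n".join(json.dumps({"source": text}) for text in texts)

manifest = build_manifest(reviews)
print(manifest)
```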

Named entity recognition

Named entity recognition (NER) involves sifting through text data to locate phrases called named entities, and categorizing each with a label, such as “person,” “organization,” or “brand.” So, in the statement “I recently subscribed to Amazon Prime,” “Amazon Prime” would be the named entity and could be categorized as a “brand.”
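
A named entity label is usually recorded as a character span plus a category. The sketch below builds such a record for the example sentence; the helper name and the `label`, `startOffset`, and `endOffset` field names (with an exclusive end offset) are assumptions for illustration.

```python
text = "I recently subscribed to Amazon Prime"

def make_entity(text, phrase, label):
    """Build an entity record with character offsets (end offset is exclusive)."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return {"label": label, "startOffset": start, "endOffset": start + len(phrase)}

entity = make_entity(text, "Amazon Prime", "brand")
print(entity)

# Slicing the text with the offsets recovers the labeled phrase.
print(text[entity["startOffset"]:entity["endOffset"]])
```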

Custom workflows

You can create your own labeling workflow in Ground Truth. A custom workflow consists of three components: (1) a UI template that provides human labelers with all of the instructions and tools needed to complete the labeling task, (2) any pre-processing logic encapsulated in an AWS Lambda function, and (3) any post-processing logic encapsulated in an AWS Lambda function. A large selection of UI templates is available, or you can upload your own JavaScript/HTML template. The pre-processing Lambda function can serve the data to be labeled and add any additional context for the labeler. The post-processing Lambda function can be used to insert an accuracy improvement algorithm, which can assess the quality of the annotations made by the humans or find consensus on what is “right” when the same data is provided to multiple human labelers. You can upload all three components using the SageMaker Ground Truth console.
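
The two Lambda roles can be sketched as plain Python functions. The pre-processing handler below assumes the event carries a `dataObject` with either a `source-ref` or `source` key and that the response uses `taskInput` and `isHumanAnnotationRequired`; verify these shapes against the current documentation. The consolidation function shows majority voting as one possible consensus algorithm. A real post-processing Lambda also reads worker annotations from Amazon S3 and returns a service-defined structure, which is omitted here. The function names, the `taskObject` key, and the sample bucket are hypothetical.

```python
from collections import Counter

def pre_annotation_handler(event, context):
    """Pre-processing Lambda: pass the data object on to the UI template."""
    data_object = event["dataObject"]
    # The data object holds either an S3 URI ("source-ref") or inline text ("source").
    task_object = data_object.get("source-ref") or data_object.get("source")
    return {
        "taskInput": {"taskObject": task_object},
        "isHumanAnnotationRequired": "true",
    }

def consolidate_by_majority(worker_labels):
    """Post-processing logic: pick the most common label and report agreement."""
    counts = Counter(worker_labels)
    label, votes = counts.most_common(1)[0]
    return {"label": label, "agreement": votes / len(worker_labels)}

# Local usage with a hypothetical event and three worker answers.
sample_event = {"dataObject": {"source-ref": "s3://example-bucket/image1.jpg"}}
task = pre_annotation_handler(sample_event, None)
consensus = consolidate_by_majority(["cat", "cat", "dog"])
print(task)
print(consensus)
```

Majority voting is the simplest consensus rule; weighting labelers by their past accuracy is a common refinement.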

Workforces

SageMaker Ground Truth supports multiple choices for a human workforce to label data: (1) your own employees, (2) third-party data labeling service providers available through AWS Marketplace, and (3) a crowdsourced workforce through Amazon Mechanical Turk.

Mechanical Turk
Private
Vendors

Synthetic data generation

Amazon SageMaker Ground Truth supports synthetic data generation. SageMaker Ground Truth can generate hundreds of thousands of labeled synthetic images, helping to improve labeling accuracy and eliminate the need to manually label data. First, you specify your image requirements or provide 3D assets and baseline images, such as computer-aided design (CAD) images. Next, AWS digital artists create images that imitate pose and placement of objects, and include object or scene variations. In addition, AWS digital artists optionally add specific inclusions to create images that are not often included in training datasets.