Amazon SageMaker Ground Truth Plus
With Amazon SageMaker Ground Truth Plus, labeling is performed by an expert workforce that is trained on machine learning (ML) tasks and can help meet your data security, privacy, and compliance requirements. For example, if you need people proficient in labeling audio files, you can specify this requirement in the guidelines you provide to SageMaker Ground Truth Plus, and the service will automatically select labelers with those skills.
End-to-end data labeling management
With Amazon SageMaker Ground Truth Plus, you can easily create high-quality training datasets without building labeling applications or managing labeling workforces on your own. You upload your data, along with your labeling requirements, to Amazon S3. Once the data is uploaded, SageMaker Ground Truth Plus sets up the data labeling workflows and operates them on your behalf.
ML labeling techniques
Amazon SageMaker Ground Truth Plus uses ML techniques, including active learning, pre-labeling, and machine validation, which increase the quality of the output dataset and decrease data labeling costs. Its multi-step labeling workflow uses active learning models to reduce costs by selecting the objects (an image, an audio recording, a section of text, and so on) that need to be labeled, and pre-labeling models to label the selected data in advance, which reduces human effort. Ground Truth Plus then uses machine validation to identify potential errors, which are sent for an additional round of human review; catching human errors this way significantly improves label quality. Ground Truth Plus also provides assistive labeling features such as ‘automatic 3D cuboid snapping’, ‘predict-next in video labeling’, and ‘auto-segment’ through an intuitive user interface, reducing the time needed for data labeling tasks while also improving quality.
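As a generic illustration of the active-learning step, and not the Ground Truth Plus implementation, the following minimal Python sketch shows least-confidence uncertainty sampling, one common way to choose which objects a human should label. It assumes a pre-labeling model that returns class probabilities for each object; the object IDs, the probabilities, and the `budget` parameter are all hypothetical.

```python
from typing import Dict, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the model's highest class probability."""
    return 1.0 - max(probs)


def select_for_labeling(predictions: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return up to `budget` object IDs, most uncertain first.

    Objects that are not selected would keep their machine pre-labels
    (an assumption of this sketch, not documented service behavior).
    """
    ranked = sorted(
        predictions,
        key=lambda object_id: least_confidence(predictions[object_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical class-probability outputs from a pre-labeling model.
predictions = {
    "img_001.jpg": [0.98, 0.01, 0.01],  # confident: pre-label is likely usable
    "img_002.jpg": [0.40, 0.35, 0.25],  # uncertain: good candidate for a human
    "img_003.jpg": [0.55, 0.40, 0.05],
}

print(select_for_labeling(predictions, budget=2))  # ['img_002.jpg', 'img_003.jpg']
```

Spending the human budget on the least certain objects is what lets this kind of selection cut labeling cost; a production workflow might use a confidence threshold instead of a fixed budget.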
SageMaker Ground Truth Plus provides interactive dashboards and user interfaces so that you can monitor the progress of training datasets across multiple projects, track project metrics such as daily throughput, inspect labels for quality, and provide feedback on the labeled data.
Amazon SageMaker Ground Truth
3D point clouds
Three-dimensional (3D) point clouds are most commonly captured with Light Detection and Ranging (LIDAR) devices to build a 3D representation of a physical space at a single point in time. SageMaker Ground Truth supports built-in labeling workflows for your 3D point cloud data, including object detection, object tracking, and semantic segmentation.
With the object detection workflow, you can identify and label objects of interest within a 3D point cloud. For example, in an autonomous vehicle use case, you can accurately label vehicles, lanes, and pedestrians.
With the object tracking workflow, you can track the trajectory of objects of interest. For example, an autonomous vehicle needs to track the movement of other vehicles, lanes, and pedestrians. Ground Truth allows you to track the trajectory of these objects across a sequence of 3D point cloud data.
With the semantic segmentation workflow, you can segment the points of a 3D point cloud into pre-specified categories. For example, in an autonomous vehicle use case, you could categorize points as streets, foliage, or structures.
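Semantic segmentation of a point cloud means every point receives exactly one label from a pre-specified category set. The minimal Python sketch below illustrates that constraint with a tiny hypothetical cloud; the `(x, y, z)` tuple layout, the `CATEGORIES` set, and the helper names are assumptions for illustration, not the Ground Truth output format.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) coordinates, e.g. meters in the sensor frame

# Hypothetical pre-specified label set for this example.
CATEGORIES = ("street", "foliage", "structure")


def validate_segmentation(points: Sequence[Point], labels: Sequence[str],
                          categories: Sequence[str]) -> None:
    """Check that each point has exactly one label drawn from the allowed categories."""
    if len(points) != len(labels):
        raise ValueError(f"{len(points)} points but {len(labels)} labels")
    unknown = set(labels) - set(categories)
    if unknown:
        raise ValueError(f"labels outside the category set: {sorted(unknown)}")


def category_counts(labels: Sequence[str]) -> Dict[str, int]:
    """Count how many points were assigned to each category."""
    return dict(Counter(labels))


# A hypothetical four-point cloud and its per-point labels.
points = [(1.0, 0.0, 0.0), (1.2, 0.1, 0.0), (3.0, 2.0, 1.5), (5.0, -1.0, 4.0)]
labels = ["street", "street", "foliage", "structure"]

validate_segmentation(points, labels, CATEGORIES)
print(category_counts(labels))  # {'street': 2, 'foliage': 1, 'structure': 1}
```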
Video
SageMaker Ground Truth supports common video labeling use cases with built-in workflows, including video object detection, video object tracking, and video clip classification.
Video object detection
With the video object detection workflow, you can identify objects of interest within a sequence of video frames. For example, in building a perception system for an autonomous vehicle, you can detect other vehicles in the scene around the vehicle.
Video object tracking
With the video object tracking workflow, you can track objects of interest across a sequence of video frames. For example, in a sports game use case, you can accurately label players across the duration of a play.
Video clip classification
With the video clip classification workflow, you can classify a video file into a pre-specified category. For example, you can select the pre-specified categories that best describe the video, such as a sports play or traffic congestion at a busy intersection.
Images
SageMaker Ground Truth provides built-in labeling workflows for your image data, including Image Classification, Object Detection, and Semantic Segmentation.
Image classification
Image classification is the process of categorizing an image, based on the real-world content it depicts, against a pre-defined set of labels. Image classification is useful for scene detection models that need to consider the full context of the image. For example, you can build an image classification model for autonomous vehicles that recognizes various real-world objects such as other vehicles, pedestrians, traffic lights, and signage.
Object detection
You can use the object detection workflow to identify and label objects of interest (e.g., vehicles, pedestrians, dogs, cats) in images. The labeling task involves drawing a bounding box, a two-dimensional (2D) box, around each object of interest within an image. Computer vision models trained on images with labeled bounding boxes learn that the pixels within a box correspond to the specified object.
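A bounding box label reduces to a class name plus an axis-aligned rectangle in pixel coordinates. The minimal Python sketch below shows that idea and which pixels a box covers; the `left`/`top`/`width`/`height` field names follow a common convention and are an assumption here, not a statement of the exact Ground Truth output schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned 2D box in pixel coordinates; (left, top) is the upper-left corner."""
    label: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        """True if pixel (x, y) lies inside the box (right and bottom edges exclusive)."""
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

    def area(self) -> int:
        """Number of pixels covered by the box."""
        return self.width * self.height


# A hypothetical "vehicle" box: 50 px wide and 30 px tall, starting at (10, 20).
box = BoundingBox(label="vehicle", left=10, top=20, width=50, height=30)
print(box.contains(15, 25), box.contains(60, 25), box.area())  # True False 1500
```

Every pixel for which `contains` is true is implicitly treated as part of the labeled object, which is the signal a detector learns from.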
Semantic segmentation
You can use the semantic segmentation workflow to label the exact parts of an image that correspond to the labels your model needs to learn. Because individual pixels are labeled, it provides high-precision training data. For example, the irregular shape of a car in an image can be captured exactly with semantic segmentation.
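To make the precision argument concrete, the minimal Python sketch below compares a per-pixel mask of a small irregular (triangular) object with the tightest bounding box around it. The mask is hypothetical toy data and the helper names are illustrative only; the point is that the box also covers background pixels, while the mask labels only the object.

```python
from typing import List, Tuple

Mask = List[List[int]]  # mask[row][col] == 1 where the pixel belongs to the object


def tight_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of the smallest box enclosing all object pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no object pixels")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def box_fill_ratio(mask: Mask) -> float:
    """Fraction of the tight bounding box that the object actually covers."""
    _, _, width, height = tight_box(mask)
    object_pixels = sum(sum(row) for row in mask)
    return object_pixels / (width * height)


# A small irregular object: 1 = object pixel, 0 = background.
mask = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

print(tight_box(mask))       # (1, 0, 3, 3)
print(box_fill_ratio(mask))  # 0.6666666666666666
```

Here a third of the box's pixels are background that a box label would attribute to the object, whereas the mask records the object's shape exactly.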
Text
SageMaker Ground Truth provides built-in labeling workflows for your text data, including Text Classification and Named Entity Recognition.
Text classification
Text classification involves categorizing text strings against a pre-defined set of labels. It is often used to train natural language processing (NLP) models that identify things like topics or sentiment in text such as product descriptions or movie reviews.
Named Entity Recognition
Named Entity Recognition (NER) involves sifting through text data to locate phrases called named entities and categorizing each with a label, such as “person,” “organization,” or “brand.” For example, in the statement “I recently subscribed to Amazon Prime,” “Amazon Prime” is the named entity and could be categorized as a “brand.”
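An NER label is typically stored as a character span plus a category. The minimal Python sketch below labels the “Amazon Prime” example that way; the `EntitySpan` class, its start-inclusive/end-exclusive offsets, and the exact-string matching are illustrative assumptions, not the Ground Truth output format or a real entity recognizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntitySpan:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def label_phrase(text: str, phrase: str, label: str) -> List[EntitySpan]:
    """Label every non-overlapping occurrence of `phrase` in `text`."""
    if not phrase:
        raise ValueError("phrase must be non-empty")
    spans = []
    start = text.find(phrase)
    while start != -1:
        spans.append(EntitySpan(start, start + len(phrase), label))
        start = text.find(phrase, start + len(phrase))
    return spans


text = "I recently subscribed to Amazon Prime."
spans = label_phrase(text, "Amazon Prime", "brand")
for span in spans:
    print(span.label, repr(text[span.start:span.end]))  # brand 'Amazon Prime'
```

Storing offsets rather than the phrase alone keeps the label unambiguous when the same phrase appears more than once in a document.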
SageMaker Ground Truth supports multiple options for the human workforce that labels your data: (1) your own employees, (2) third-party data labeling service providers available through AWS Marketplace, and (3) a crowdsourced workforce through Amazon Mechanical Turk.
Synthetic data generation