Posted On: Dec 1, 2021

Today, we are excited to announce the general availability of Amazon SageMaker Ground Truth Plus, a new turnkey data labeling service that enables you to create high-quality training datasets quickly and reduces costs by up to 40%.

To train a machine learning (ML) model, data scientists need large, high-quality, labeled datasets. As ML adoption grows, labeling needs increase. This forces data scientists to spend weeks building data labeling workflows and managing a data labeling workforce, which slows down innovation and increases cost. So that they can spend their time building, training, and deploying ML models, data scientists typically task other in-house teams, consisting of data operations managers and program managers, with producing high-quality training datasets. However, these teams typically don't have access to the skills required to deliver high-quality training datasets, which affects ML results. What if you could rely on a turnkey service that enables you to create high-quality training datasets at scale without consuming your in-house resources? Enter Amazon SageMaker Ground Truth Plus.

Amazon SageMaker Ground Truth Plus makes it easy for data scientists as well as business managers, such as data operations managers and program managers, to create high-quality training datasets by removing the undifferentiated heavy lifting associated with building data labeling applications and managing the labeling workforce. All you do is share your data along with labeling requirements, and Ground Truth Plus sets up and manages your data labeling workflow based on these requirements. From there, an expert workforce that is trained on a variety of ML tasks performs data labeling. You don't need deep ML expertise or knowledge of workflow design and quality management to use Ground Truth Plus.

Ground Truth Plus uses ML techniques, including active learning, pre-labeling, and machine validation, which increase the quality of the output dataset and decrease data labeling costs. Ground Truth Plus also provides transparency into your data labeling operations and quality management. With it, you can review the progress of training datasets across multiple projects, track project metrics such as daily throughput, inspect labels for quality, and provide feedback on the labeled data. Ground Truth Plus can be used for a variety of use cases, including computer vision, natural language processing, and speech recognition.

Amazon SageMaker Ground Truth Plus is generally available today in the US East (N. Virginia) AWS Region. To learn more about Amazon SageMaker Ground Truth Plus, read the blog post, refer to the Ground Truth Plus documentation, and visit the SageMaker data labeling webpage. To get started, visit the Ground Truth Plus console.