Amazon SageMaker Ground Truth FAQs
Q: What is human-in-the-loop and why is it important for building AI-powered applications?
Human-in-the-loop is the process of harnessing human input across the ML lifecycle to improve the accuracy and relevancy of models. Humans can perform a variety of tasks, from data generation and annotation to model review, customization, and evaluation. Human intervention is especially important for generative AI applications, where humans are typically both the requester and the consumer of the content. It is therefore critical that humans teach foundation models (FMs) to respond accurately, safely, and relevantly to users’ prompts. Human feedback can help you with several tasks. First, it can be used to create high-quality labeled training datasets for generative AI applications, both for supervised learning (where a human demonstrates the style, length, and accuracy with which a model should respond to users’ prompts) and for reinforcement learning from human feedback (RLHF, where a human ranks and classifies model responses). Second, human-generated data can be used to customize FMs for specific tasks or with your company- and domain-specific data, making model output relevant to you. Finally, human evaluation and comparison can help you select the FM that best suits your use case and project requirements.
Q: What is the difference between Amazon SageMaker Ground Truth’s self-service and AWS-managed offerings?
Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities. There are two ways to use Amazon SageMaker Ground Truth: a self-service offering and an AWS-managed offering. In the self-service offering, your data annotators, content creators, and prompt engineers (in-house, vendor-managed, or drawn from the public crowd) can use our low-code user interface to accelerate human-in-the-loop tasks, while you keep the flexibility to build and manage your own custom workflows. In the AWS-managed offering (SageMaker Ground Truth Plus), we handle the heavy lifting for you, including selecting and managing the right workforce for your use case. SageMaker Ground Truth Plus designs and customizes an end-to-end workflow (including detailed workforce training and quality assurance steps) and provides a skilled AWS-managed team that is trained on your specific tasks and meets your data quality, security, and compliance requirements.
Q: How can human-in-the-loop capabilities be used for generative AI applications powered by FMs?
Human-in-the-loop capabilities play an important role in creating and improving generative AI applications powered by FMs. A highly skilled human workforce trained on the task guidelines can provide feedback, guidance, inputs, and assessments in activities such as generating demonstration data to train FMs, correcting and improving sample responses, fine-tuning a model on company and industry data, and acting as a safeguard against toxicity and bias. Human-in-the-loop capabilities can therefore improve model accuracy and performance.
Q: How do I get started with Amazon SageMaker Ground Truth?
To get started with Amazon SageMaker Ground Truth Plus (AWS-managed offering), please complete the project requirement form. Our team will reach out to you to discuss your human-in-the-loop project.
To get started with Amazon SageMaker Ground Truth (self-service offering), sign in to the AWS Management Console and navigate to the SageMaker console. From there, select Labeling jobs under Ground Truth, where you can create a labeling job. First, as part of the labeling job creation flow, you provide the location in Amazon S3 of the dataset to be labeled. Ground Truth offers templates for common labeling tasks, where you only need to select a few options and provide brief instructions on how your data should be labeled. Alternatively, you can create your own custom template. As the last step of creating a labeling job, you select one of three human workforce options: (1) a public crowdsourced workforce, (2) a curated set of third-party data labeling service providers, or (3) your own workers. You also have the option to enable automated data labeling.
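The console steps above correspond to the SageMaker CreateLabelingJob API. The following Python sketch only assembles the request arguments and does not call AWS; the helper name build_labeling_job_request and every ARN and S3 URI passed to it are hypothetical placeholders. It assumes your data is described by an input manifest file in S3 and that you look up the Region-specific pre-annotation and annotation consolidation Lambda ARNs for your task type in the Ground Truth documentation. Jobs sent to the public crowd also need a per-task price (the PublicWorkforceTaskPrice field), which this sketch omits.

```python
from typing import Optional


def build_labeling_job_request(
    job_name: str,
    manifest_s3_uri: str,
    output_s3_path: str,
    role_arn: str,
    workteam_arn: str,
    ui_template_s3_uri: str,
    pre_human_task_lambda_arn: str,
    consolidation_lambda_arn: str,
    task_title: str,
    task_description: str,
    workers_per_object: int = 1,
    task_time_limit_seconds: int = 300,
    auto_labeling_algorithm_arn: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build keyword arguments for CreateLabelingJob."""
    request = {
        "LabelingJobName": job_name,
        # Attribute name under which labels are written to the output manifest
        # (reusing the job name here is an assumption, not a requirement).
        "LabelAttributeName": job_name,
        # Step 1: point Ground Truth at the data to label via an S3 manifest.
        "InputConfig": {
            "DataSource": {"S3DataSource": {"ManifestS3Uri": manifest_s3_uri}}
        },
        "OutputConfig": {"S3OutputPath": output_s3_path},
        # IAM role that Ground Truth assumes to read inputs and write outputs.
        "RoleArn": role_arn,
        "HumanTaskConfig": {
            # Last step: the work team ARN selects the workforce
            # (public crowd, vendor, or your own private team).
            "WorkteamArn": workteam_arn,
            # Step 2: the labeling UI template (built-in or custom).
            "UiConfig": {"UiTemplateS3Uri": ui_template_s3_uri},
            "PreHumanTaskLambdaArn": pre_human_task_lambda_arn,
            "TaskTitle": task_title,
            "TaskDescription": task_description,
            "NumberOfHumanWorkersPerDataObject": workers_per_object,
            "TaskTimeLimitInSeconds": task_time_limit_seconds,
            "AnnotationConsolidationConfig": {
                "AnnotationConsolidationLambdaArn": consolidation_lambda_arn
            },
        },
    }
    if auto_labeling_algorithm_arn is not None:
        # Optional: enable automated data labeling.
        request["LabelingJobAlgorithmsConfig"] = {
            "LabelingJobAlgorithmSpecificationArn": auto_labeling_algorithm_arn
        }
    return request
```

With real values filled in, the resulting dictionary can be passed to `boto3.client("sagemaker").create_labeling_job(**request)`. Keeping request construction separate from the API call makes it easy to review the job configuration before any labeling work is started.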
Q: How does Amazon SageMaker Ground Truth protect and secure my data?
By default, Amazon SageMaker Ground Truth encrypts your data at rest in Amazon S3 and in transit. In addition, access to your data is controlled using AWS Identity and Access Management (IAM). SageMaker Ground Truth does not store or make copies of your data outside of your AWS environment (whether created by you or through an AWS managed service), and your data remains under your control. Further, Ground Truth supports compliance standards such as the General Data Protection Regulation (GDPR), and logs and audits all access to your data using Amazon CloudWatch and AWS CloudTrail. Visit the Amazon SageMaker Ground Truth documentation for more information.
Q: How does Amazon SageMaker Ground Truth find the right workforce for my project?
With Amazon SageMaker Ground Truth Plus (AWS-managed offering), you can access an expert, on-demand workforce that is trained on your specific AI/ML tasks, can dynamically scale your workflows up or down based on your project requirements, and can help meet your data quality, security, and compliance requirements. Our team will work with you to understand the skills your project requires and staff it with the appropriate workforce.
Q: How much does Amazon SageMaker Ground Truth cost, and in which AWS Regions is it available?
Please see the SageMaker Ground Truth pricing page for the current pricing information. SageMaker Ground Truth Plus projects are priced individually and our team will review pricing options with you after you submit a project requirement form.
The AWS Region Table lists all the AWS Regions where Amazon SageMaker Ground Truth is currently available.