Posted On: Nov 29, 2023
Model Evaluation on Amazon Bedrock allows you to evaluate, compare, and select the best foundation models for your use case. Amazon Bedrock offers a choice of automatic evaluation and human evaluation. You can use automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity. For subjective or custom metrics, such as friendliness, style, and alignment to brand voice, you can set up a human evaluation workflow with a few clicks. Human evaluation workflows can leverage your own employees or an AWS-managed team as reviewers. Model evaluation provides built-in curated datasets, or you can bring your own.
Amazon Bedrock’s interactive interface guides you through model evaluation. You simply choose human or automatic evaluation, select the task type and metrics, and upload prompt datasets. Amazon Bedrock then runs evaluations and generates a report, so you can easily understand how the model performed against the metrics you selected, and choose the right one for your use case.
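Beyond the console, evaluation jobs can also be created programmatically. The sketch below shows, in outline, what an automatic evaluation request might look like using the AWS SDK for Python (boto3). The job name, role ARN, dataset name, model identifier, and S3 bucket are illustrative placeholders, and the exact request shape should be confirmed against the Amazon Bedrock API reference.

```python
# Hypothetical request payload for an automatic model evaluation job.
# All names and ARNs below are placeholders, not real resources.
request = {
    "jobName": "my-summarization-eval",  # placeholder job name
    "roleArn": "arn:aws:iam::111122223333:role/BedrockEvalRole",  # placeholder IAM role
    "evaluationConfig": {
        "automated": {
            "datasetMetricConfigs": [
                {
                    # Task type and predefined metrics selected for the evaluation
                    "taskType": "Summarization",
                    "dataset": {"name": "Builtin.Gigaword"},  # assumed built-in dataset name
                    "metricNames": [
                        "Builtin.Accuracy",
                        "Builtin.Robustness",
                        "Builtin.Toxicity",
                    ],
                }
            ]
        }
    },
    "inferenceConfig": {
        # The foundation model to evaluate (placeholder model identifier)
        "models": [{"bedrockModel": {"modelIdentifier": "anthropic.claude-v2"}}]
    },
    # Where the evaluation report is written (placeholder bucket)
    "outputDataConfig": {"s3Uri": "s3://my-eval-results/"},
}

# With AWS credentials configured and the service available in your Region,
# the job would be submitted like this:
#   import boto3
#   bedrock = boto3.client("bedrock", region_name="us-east-1")
#   response = bedrock.create_evaluation_job(**request)
```

Once submitted, the job runs the selected metrics over the prompt dataset and writes the generated report to the configured S3 location.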
Model Evaluation on Amazon Bedrock is available in preview in AWS Regions US East (N. Virginia) and US West (Oregon). For more information, see the AWS Region table.
To learn more about Model Evaluation on Amazon Bedrock, see the Amazon Bedrock developer experience web page. To get started, sign in to Amazon Bedrock on the AWS Management Console.