AWS Solutions Library

Guidance for Evaluating Generative AI Applications using OSS on AWS

Overview

This Guidance demonstrates how to implement automated quality assurance pipelines for AI applications on AWS. Automating prompt evaluation and testing reduces manual testing overhead and helps keep AI responses consistent and high quality. The solution uses serverless infrastructure and Amazon Bedrock foundation models to provide continuous testing, quality gates, and version control for AI prompts. This systematic approach helps organizations scale their AI applications more reliably, reduce deployment risk, and maintain quality standards. You can customize the evaluation criteria and CI/CD pipelines to match your organization's needs, at any stage of AI adoption.

Benefits

Deploy a CI/CD pipeline that automates testing and evaluation of AI prompts using Promptfoo and Amazon Bedrock. This approach helps development teams implement test-driven development practices for prompt engineering, reducing manual effort and accelerating delivery.

Implement systematic evaluation of prompts against predefined test cases and datasets before deployment. By leveraging Amazon Bedrock models as both generators and judges, you can objectively measure prompt performance and ensure only high-quality prompts reach production.

Establish a secure, auditable workflow for prompt management with integrated IAM roles and AWS Secrets Manager. This structured approach helps maintain version control, enables proper access management, and creates a traceable history of prompt development and deployment.
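To make the evaluation and quality-gate idea above concrete, here is a minimal Python sketch. It is not the Guidance's implementation, which relies on Promptfoo and Amazon Bedrock. Every name in it (`PromptCase`, `evaluate_prompt`, `quality_gate`, `fake_generate`, `fake_judge`) is hypothetical, and the two stub functions stand in for calls to a generator model and a judge model so the example runs offline. The 0.9 threshold is likewise an assumed value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PromptCase:
    """One predefined test case: template variables plus a grading rubric."""
    variables: Dict[str, str]
    rubric: str


def evaluate_prompt(
    template: str,
    cases: List[PromptCase],
    generate: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> float:
    """Return the fraction of test cases whose output the judge accepts."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        prompt = template.format(**case.variables)
        output = generate(prompt)        # generator model call (stubbed below)
        if judge(output, case.rubric):   # judge model call (stubbed below)
            passed += 1
    return passed / len(cases)


def quality_gate(pass_rate: float, threshold: float = 0.9) -> int:
    """Map a pass rate to a CI exit code: 0 lets the prompt through, 1 blocks it."""
    return 0 if pass_rate >= threshold else 1


# Hypothetical stand-ins so the sketch runs without AWS credentials.
# In a real pipeline these would call foundation models instead.
def fake_generate(prompt: str) -> str:
    return prompt.upper()


def fake_judge(output: str, rubric: str) -> bool:
    return rubric.upper() in output


if __name__ == "__main__":
    cases = [
        PromptCase({"topic": "refunds"}, "refunds"),
        PromptCase({"topic": "shipping"}, "billing"),
    ]
    rate = evaluate_prompt(
        "Summarize our {topic} policy.", cases, fake_generate, fake_judge
    )
    print(f"pass rate: {rate:.2f}, exit code: {quality_gate(rate)}")
```

In a CI/CD pipeline, the value returned by `quality_gate` would typically become the exit status of the evaluation step, so a prompt that falls below the threshold fails the build before deployment.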

How it works

The architecture diagram for this Guidance shows the key components and how they interact, walking through the solution's structure and functionality step by step.

Deploy with confidence

Ready to deploy? The sample code on GitHub includes detailed deployment instructions. Deploy it as-is or customize it to fit your needs.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
