AWS Partner Network (APN) Blog

Optimize Production Workloads with New Resources in the Generative AI Center of Excellence for AWS Partners

By Jacob Newton-Gladstein, Generative AI Center of Excellence Global Field Enablement Lead – AWS

In 2023, generative AI captured the public’s imagination, and 2024 is shaping up to be the year when generative AI workloads move from proof of concept to production. As these workloads make their way into production environments, cost and performance optimization become crucial, just as they do for any other technology.

In the current generative AI landscape, it’s no longer enough to simply deploy in production. Companies must focus on building generative AI solutions that deliver optimal end-user performance at the lowest possible cost in order to win customers’ workloads.

“Amazon Bedrock gives us the ability to switch between models with ease, helping us to experiment with those different models on the fly,” says Tom Bomer, Data Scientist at Firemind. “However, equally important as model choice is model optimization: making sure the model you are using is paired with the most efficient supporting architecture, leading to improved cost performance and query accuracy. Having the right process and framework for optimizing our choice of large language models allows us to win more customer workloads.”

At the Generative AI Center of Excellence for AWS Partners, we listened to feedback from customers and partners. We then collaborated with technical and business leaders across Amazon Web Services (AWS) to create a set of resources designed to help AWS Partners transform their generative AI solutions and offerings from good to great.

Today, we’re excited to announce the inaugural assets for each of the Center of Excellence’s workstreams, and we look forward to rapidly building them out over the coming months to help our AWS Partners continue to create world-class generative AI solutions.

The first resource is the State of Data and AI: Emerging Trends in Generative AI whitepaper. This comprehensive paper, authored by AWS Generative AI Solutions Architects with extensive domain expertise, provides strategic guidance for partner business and technology leaders to accelerate the development of generative AI solutions on AWS. Leverage this report to understand the art of the possible, gain insights into adoptable AWS solutions for customers in different contexts, and strategically prepare for what’s around the corner.

One of the biggest learnings from compiling the “State of Data and AI” is that in today’s generative AI landscape, Foundation Model Ops (FMOps), model selection, model benchmarking, and effective model steering through prompt engineering or retrieval-augmented generation (RAG) workflows are major drivers of generative AI deployment costs and end-user satisfaction. The goal is for the “State of Data and AI” to empower you as you help your customers prioritize investment decisions in evolving data and AI technologies.

Through this report, we present best practice patterns for industry use cases, fill knowledge gaps in building with evolving data and AI technology, and discuss anticipated trends within a 3-month horizon while positioning AWS solutions to enable them.
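To make model benchmarking concrete: a lightweight harness that records latency and token usage for the same prompt across candidate models is often enough to ground an initial selection decision. Below is a minimal sketch, assuming access to Amazon Bedrock through boto3 (a version recent enough to include the Converse API); the model IDs, region, and prompt are illustrative, not recommendations.

```python
import time
import boto3

# Bedrock runtime client; region and credentials come from your AWS configuration.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative model IDs -- substitute the models enabled in your account.
CANDIDATE_MODELS = [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "anthropic.claude-3-sonnet-20240229-v1:0",
]

def benchmark(model_id: str, prompt: str) -> dict:
    """Send one prompt and record client-observed latency and token usage."""
    start = time.perf_counter()
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.0},
    )
    latency = time.perf_counter() - start
    usage = response["usage"]  # token counts reported by Bedrock
    return {
        "model": model_id,
        "latency_s": round(latency, 2),
        "input_tokens": usage["inputTokens"],
        "output_tokens": usage["outputTokens"],
    }

if __name__ == "__main__":
    prompt = "Summarize the benefits of retrieval-augmented generation in two sentences."
    for model_id in CANDIDATE_MODELS:
        print(benchmark(model_id, prompt))
```

Pairing the reported token counts with each model’s per-token pricing turns the same output into a rough cost comparison across candidates.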

We’re also building on this initial report by creating two new learning paths within the Generative AI Center of Excellence: the Prompt Engineering Academy and the Model Portability Optimization and Performance Framework (login required).

The Prompt Engineering Academy contains both technical and non-technical content that guides AWS Partners on strategies to extend the functionality of any large language model, reduce retrieval cost, and enhance response accuracy. We will discuss topics such as simple prompt engineering, prompt chaining, and how to use and optimize RAG workflows within a generative AI deployment.

Figure 1 – Value of prompt engineering 
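As one illustration of the Academy’s topics, prompt chaining splits a task into focused steps and feeds each step’s output into the next, which tends to keep individual prompts short and outputs predictable. Here is a minimal sketch of a two-step chain on Amazon Bedrock, assuming boto3 access to the Converse API; the model ID, prompts, and sample ticket are illustrative.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # illustrative model ID

TICKET = (
    "My nightly data ingestion job has failed twice this week and I need it "
    "fixed before Friday's reporting deadline."
)

def ask(prompt: str) -> str:
    """Single Converse API call; returns the model's text reply."""
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

# Step 1: constrain the model to extraction only, which keeps the prompt
# short and the intermediate output structured.
facts = ask(
    "Extract the customer's issue, urgency, and desired outcome as a bulleted "
    "list from this support ticket:\n" + TICKET
)

# Step 2: feed the structured output of step 1 into a focused drafting prompt.
reply = ask(
    "Using only these facts, draft a concise, empathetic support response:\n" + facts
)
print(reply)
```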

The Model Portability Optimization and Performance Framework will cover topics such as AI Optimized Architecture, Model Security and Responsibility, Cost-Effective Inferentia and Trainium Architecture, and AI at the Edge: An Overview of the Edge Computing Landscape for Generative AI. The goal of this framework is to share a pluggable FMOps architecture, based on input from AWS generative AI experts, that AWS Partners can use at scale in their own generative AI solutions and offerings. The framework can be utilized either by builders looking to adopt best practices or by sellers looking to ensure they offer the lowest-price, highest-performance solution to clients.

Figure 2 – Model Portability Optimization and Performance Framework
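The framework itself covers far more ground, but the core portability idea can be sketched simply: keep model invocation behind one narrow interface so the model ID becomes configuration rather than a code dependency, which is what makes swapping or A/B-testing models inexpensive. A minimal sketch follows, again assuming Bedrock’s Converse API via boto3; the BEDROCK_MODEL_ID environment variable is an illustrative convention of this example, not an AWS standard.

```python
import os
import boto3

class PortableModel:
    """Thin wrapper that keeps the model ID out of application code.

    Swapping models (or A/B-testing a cheaper one) then only requires a
    configuration change, not edits at every call site.
    """

    def __init__(self, model_id: str | None = None):
        # BEDROCK_MODEL_ID is an illustrative variable name for this sketch.
        self.model_id = model_id or os.environ["BEDROCK_MODEL_ID"]
        self.client = boto3.client("bedrock-runtime")

    def complete(self, prompt: str, max_tokens: int = 256) -> str:
        response = self.client.converse(
            modelId=self.model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": max_tokens},
        )
        return response["output"]["message"]["content"][0]["text"]

# Application code stays identical whichever model is configured:
# model = PortableModel()
# print(model.complete("Classify this ticket's urgency: ..."))
```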

To learn how AWS Partners are building their own prompt engineering frameworks on AWS, register for our upcoming AWS Marketplace Spotlight Series: Key prompt engineering strategies to balance cost, performance, and accuracy. We’ll discuss related topics with speakers from Rackspace Technology, an AWS Premier Tier Partner and AWS Generative AI Competency Launch Partner.

Finally, be sure to leverage the Generative AI Center of Excellence for AWS Partners in AWS Partner Central (login required) for resources to support building and executing your generative AI strategy. Navigate to Resources > Guides > Generative AI Center of Excellence.