Guidance for Text Generation using Embeddings from Enterprise Data on AWS

Overview

This Guidance demonstrates question answering using Retrieval Augmented Generation (RAG) with foundation models in Amazon SageMaker JumpStart. Generative AI is powered by large language models (LLMs), commonly referred to as foundation models, that are pre-trained on vast amounts of data. This Guidance shows how to solve a question answering task with Amazon SageMaker LLMs and embedding endpoints so you can build models that generate text based on specific, enterprise data rather than generic data. This can help you automate tasks, enhance your applications, and improve information retrieval.
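The retrieval step of RAG can be sketched in a few lines. This is a minimal illustration and not the Guidance's sample code: the three-dimensional vectors below are made-up stand-ins for the output of an embedding endpoint, and the function names `cosine_similarity`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question_vector, indexed_chunks, top_k=2):
    """Return the text of the top_k chunks most similar to the question.

    indexed_chunks is a list of (text, vector) pairs.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(question_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Combine retrieved enterprise text and the question into one LLM prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy data: in a real deployment these vectors would come from an embedding endpoint.
chunks = [
    ("Refunds are issued within 14 days.", [1.0, 0.0, 0.0]),
    ("The office is closed on public holidays.", [0.0, 1.0, 0.0]),
    ("Support is available by email.", [0.0, 0.0, 1.0]),
]
question_vector = [0.9, 0.1, 0.0]  # assumed embedding of the question

top_chunks = retrieve(question_vector, chunks, top_k=2)
prompt = build_prompt("How long do refunds take?", top_chunks)
```

The resulting `prompt` is what would be sent to the text-generation endpoint, so the model answers from the retrieved enterprise text rather than from its generic training data.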

How it works

This architecture diagram shows a secure, generative AI-based application that generates text from enterprise data.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

The services in this Guidance collectively support operational excellence by automating tasks, improving security, enhancing scalability, and streamlining management and operations of the generative AI application. For example, SageMaker JumpStart simplifies machine learning (ML) model deployment, API Gateway provides secure and scalable API access, Lambda automates processing and response formatting, OpenSearch Service improves data retrieval, and Fargate automates resource provisioning for indexing jobs.
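As a rough illustration of the Lambda role described above, the following sketch validates a request and formats a response in the shape that an API Gateway Lambda proxy integration expects (`statusCode`, `headers`, and a string `body`). The request field `question` and the helper `answer_question` are assumptions made for this example; the helper is a placeholder for the retrieval and generation steps.

```python
import json


def answer_question(question):
    """Hypothetical placeholder for the retrieval and text-generation steps."""
    return f"(placeholder answer for: {question})"


def _response(status_code, payload):
    """Build a response in the Lambda proxy integration format."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    """Parse the request body, validate it, and return a formatted answer."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "Request body must be valid JSON."})

    question = body.get("question") if isinstance(body, dict) else None
    if not isinstance(question, str) or not question.strip():
        return _response(400, {"error": "Field 'question' is required."})

    question = question.strip()
    return _response(200, {"question": question, "answer": answer_question(question)})
```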

Read the Operational Excellence whitepaper

Security

Amazon Cognito helps ensure that only authenticated and authorized users can access the application. It manages user identities and supports multi-factor authentication (MFA). Amazon Virtual Private Cloud (Amazon VPC) isolates resources, such as SageMaker endpoints and Lambda functions, within a private network. This isolation protects communication between components of the application, enhancing data privacy and security. Amazon VPC also lets you implement network security measures, such as security groups and network access control lists (NACLs). Together, these services help you safeguard sensitive data and maintain the confidentiality, integrity, and availability of the application.

Read the Security whitepaper

Reliability

SageMaker JumpStart simplifies the deployment and management of ML models, including model versioning and monitoring. This simplification reduces the risk of model deployment errors and helps ensure that models are consistently available and reliable for inference. Additionally, Lambda functions process user input and invoke SageMaker endpoints. Lambda is serverless and automatically handles scaling and availability so that the application can reliably process user requests without the need for manual scaling or managing servers.
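A minimal sketch of how a Lambda function might invoke a SageMaker endpoint follows. It assumes a SageMaker Runtime client such as the one returned by `boto3.client("sagemaker-runtime")` is passed in, and that the model accepts a JSON payload of the form `{"inputs": ...}`. That payload shape is an assumption; the actual request and response schema depends on the JumpStart model you deploy.

```python
import json


def invoke_text_endpoint(runtime_client, endpoint_name, prompt):
    """Send a prompt to a SageMaker endpoint and return the parsed JSON reply.

    runtime_client is expected to expose invoke_endpoint(), as the boto3
    "sagemaker-runtime" client does. The {"inputs": prompt} payload is an
    assumed schema and may differ for your model.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}).encode("utf-8"),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())


# Example wiring inside a Lambda function (endpoint name is hypothetical):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   result = invoke_text_endpoint(runtime, "my-llm-endpoint", "Hello")
```

Passing the client in as a parameter keeps the function easy to test with a stub and lets the Lambda function reuse one client across invocations.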

Fargate initiates indexing jobs for embeddings and automates resource provisioning and container management, so that indexing jobs are completed reliably and at scale. This automation reduces the risk of resource limitations or failures during indexing processes.
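The core of such an indexing job can be sketched as two steps: split each document into overlapping chunks, then build a newline-delimited body for the OpenSearch `_bulk` API. This is an illustrative sketch only. The chunk sizes, the field names `text` and `embedding`, and the `embed` callable (a stand-in for a call to the embedding endpoint) are all assumptions.

```python
import json


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size.")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def build_bulk_body(index_name, doc_id, chunks, embed):
    """Build an NDJSON body for the _bulk API: one action line, then one document line."""
    lines = []
    for n, chunk in enumerate(chunks):
        lines.append(json.dumps({"index": {"_index": index_name, "_id": f"{doc_id}-{n}"}}))
        lines.append(json.dumps({"text": chunk, "embedding": embed(chunk)}))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

For example, `build_bulk_body("docs", "policy", chunk_text(policy_text), embed)` produces a body that an indexing task could send to the cluster in one request, where `policy_text` and `embed` are supplied by the job.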

Read the Reliability whitepaper

Performance Efficiency

In a generative AI application where tasks may involve complex ML inference, data processing, and retrieval, efficiency is crucial to delivering a responsive, high-performing user experience. By using SageMaker JumpStart, Lambda, OpenSearch Service, and Fargate, this Guidance manages workloads efficiently, responds quickly, and scales to meet performance demands.

SageMaker JumpStart optimizes model deployment and monitoring so that ML inferences are initiated efficiently, leading to faster response times and better performance for users. Lambda functions automatically scale to handle concurrent requests so the application can maintain performance efficiency, even during periods of high user demand. OpenSearch Service indexes and searches embeddings, enhancing the application's information retrieval capabilities and enabling users to quickly access the information they need. Fargate invokes indexing jobs for embeddings. It automates resource provisioning, allowing the application to efficiently process and index large amounts of data without manual intervention.
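To make the embedding search concrete, the sketch below builds a k-NN query body in the OpenSearch query DSL and pulls the matching passages out of a search response. The field names `embedding` and `text` are assumptions about how the index was defined, and both function names are hypothetical.

```python
def build_knn_query(question_vector, k=3, vector_field="embedding"):
    """Build a k-NN search body that returns the k nearest indexed passages."""
    return {
        "size": k,
        "_source": ["text"],  # return only the passage text, not the stored vector
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(question_vector),
                    "k": k,
                }
            }
        },
    }


def extract_passages(search_response):
    """Return (score, text) pairs from a search response, best match first."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [(hit["_score"], hit["_source"]["text"]) for hit in hits]
```

The query body would be sent to the index's `_search` endpoint. The extracted passages then become the context that is placed in the prompt for the text-generation model.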

Read the Performance Efficiency whitepaper

Cost Optimization

SageMaker JumpStart provides pre-built ML models and workflows, reducing the time and resources required to develop and train models from scratch. This can lead to cost savings by accelerating the development cycle. Lambda follows a pay-as-you-go pricing model, meaning you only pay for the compute time used when your function is invoked. OpenSearch Service allows you to easily scale your cluster based on your search and analytics workloads. You can optimize costs by adjusting the resources to match your actual usage. Fargate automatically manages the underlying infrastructure, which means you don't need to provision or manage servers. This eliminates the need to pay for unused server capacity, resulting in cost savings.

Read the Cost Optimization whitepaper

Sustainability

Services such as Lambda, SageMaker, and Fargate contribute to sustainability by optimizing resource usage. They automatically scale resources based on workload demand, reducing unnecessary energy consumption during periods of low activity. For example, as a serverless compute engine, Fargate runs containerized application workloads and minimizes your overall resource footprint. Similarly, SageMaker JumpStart helps prevent idle, overprovisioned resources by automatically adjusting computing resources to match workload needs.

Read the Sustainability whitepaper

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open sample code on GitHub

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

Related Content

Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart
