Guidance for Text Generation using Embeddings from Enterprise Data on AWS
Overview
How it works
This architecture diagram shows a secure, generative AI-based application that generates text from enterprise data.
Well-Architected Pillars
The architecture diagram above shows an example of a solution built with Well-Architected best practices in mind. Being fully Well-Architected requires following as many of those best practices as possible.
Operational Excellence
The services in this Guidance collectively support operational excellence by automating tasks, improving security, enhancing scalability, and streamlining management and operations of the generative AI application. For example, SageMaker JumpStart simplifies machine learning (ML) model deployment, API Gateway provides secure and scalable API access, Lambda automates processing and response formatting, OpenSearch Service improves data retrieval, and Fargate automates resource provisioning for indexing jobs.
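The Lambda-to-SageMaker flow described above can be sketched as a small handler that forwards a user question to a text-generation endpoint. This is a minimal illustration, not the Guidance's actual implementation: the endpoint name, payload schema, and event fields are assumptions (JumpStart models each define their own input format).

```python
import json

# Hypothetical endpoint name; a real deployment would read this from an
# environment variable set when the endpoint is provisioned.
ENDPOINT_NAME = "jumpstart-text-generation-endpoint"

def build_inference_payload(question: str, context_passages: list) -> str:
    """Assemble the JSON payload sent to the text-generation endpoint.

    The payload shape here is an assumption for illustration; check the
    input schema of the specific JumpStart model you deploy.
    """
    prompt = "Context:\n" + "\n".join(context_passages) + f"\n\nQuestion: {question}"
    return json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}})

def lambda_handler(event, context):
    """Invoke the SageMaker endpoint and return an API Gateway proxy response."""
    import boto3  # deferred import so the pure helper above runs without the SDK
    body = json.loads(event["body"])
    payload = build_inference_payload(body["question"], body.get("passages", []))
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload,
    )
    return {"statusCode": 200, "body": response["Body"].read().decode("utf-8")}
```

In practice the handler would also validate input and format the model's raw output before returning it to the caller, as the response-formatting role of Lambda above suggests.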
Security
Amazon Cognito helps ensure that only authenticated and authorized users can access the application. It manages user identities and supports multi-factor authentication (MFA). Amazon Virtual Private Cloud (Amazon VPC) isolates resources, such as SageMaker endpoints and Lambda functions, within a private network. This isolation protects communication between components of the application, enhancing data privacy and security. Amazon VPC also allows for the implementation of network security measures, such as security groups and network access control lists (NACLs). These services help you safeguard sensitive data and maintain the confidentiality, integrity, and availability of the application.
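One way to express the security-group controls mentioned above is an ingress rule that admits HTTPS traffic only from inside the VPC. This is a hedged sketch, not part of the Guidance itself: the CIDR block, port, and function names are hypothetical placeholders.

```python
def https_ingress_rule(vpc_cidr: str) -> dict:
    """Return an EC2 IpPermissions entry allowing HTTPS only from inside the VPC.

    The CIDR block passed in is a placeholder; use your VPC's actual range.
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [
            {"CidrIp": vpc_cidr, "Description": "HTTPS from within the VPC only"}
        ],
    }

def apply_rule(group_id: str, vpc_cidr: str) -> None:
    """Attach the rule to a security group (requires AWS credentials at runtime)."""
    import boto3  # deferred import; not executed in this sketch
    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[https_ingress_rule(vpc_cidr)],
    )
```

NACLs would add a second, subnet-level layer of allow/deny rules on top of security groups, which operate per network interface.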
Reliability
SageMaker JumpStart simplifies the deployment and management of ML models, including model versioning and monitoring. This simplification reduces the risk of model deployment errors and helps ensure that models are consistently available and reliable for inference. Additionally, Lambda functions process user input and invoke SageMaker endpoints. Lambda is serverless and automatically handles scaling and availability so that the application can reliably process user requests without the need for manual scaling or managing servers.
Fargate initiates indexing jobs for embeddings and automates resource provisioning and container management, so that indexing jobs are completed reliably and at scale. This automation reduces the risk of resource limitations or failures during indexing processes.
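The indexing job described above can be sketched as building an OpenSearch `_bulk` request body from embedding documents. This is a minimal illustration under assumptions: the index name and the `text`/`embedding` field names are not prescribed by the Guidance.

```python
import json

def build_bulk_body(index_name: str, docs: list) -> str:
    """Build an OpenSearch _bulk request body (NDJSON) for embedding documents.

    Each doc is assumed to carry its source text and embedding vector; the
    field names are illustrative.
    """
    lines = []
    for i, doc in enumerate(docs):
        # Action line, then the document itself, on alternating lines.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(i)}}))
        lines.append(json.dumps({"text": doc["text"], "embedding": doc["embedding"]}))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
```

In the Fargate task, a body like this would be POSTed to the domain's `_bulk` endpoint with `Content-Type: application/x-ndjson`; batching documents this way is what lets the job index large volumes of embeddings efficiently.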
Performance Efficiency
In a generative AI application, tasks may involve complex ML inference, data processing, and retrieval, so efficiency is crucial to a responsive, high-performing user experience. By using SageMaker JumpStart, Lambda, OpenSearch Service, and Fargate, this Guidance manages workloads efficiently, enables quick response times, and scales to meet performance demands.
SageMaker JumpStart optimizes model deployment and monitoring so that ML inferences are initiated efficiently, leading to faster response times and better performance for users. Lambda functions automatically scale to handle concurrent requests so the application can maintain performance efficiency, even during periods of high user demand. OpenSearch Service indexes and searches embeddings, enhancing the application's information retrieval capabilities and enabling users to quickly access the information they need. Fargate invokes indexing jobs for embeddings. It automates resource provisioning, allowing the application to efficiently process and index large amounts of data without manual intervention.
Cost Optimization
SageMaker JumpStart provides pre-built ML models and workflows, reducing the time and resources required to develop and train models from scratch. This can lead to cost savings by accelerating the development cycle. Lambda follows a pay-as-you-go pricing model, meaning you only pay for the compute time used when your function is invoked. OpenSearch Service allows you to easily scale your cluster based on your search and analytics workloads. You can optimize costs by adjusting the resources to match your actual usage. Fargate automatically manages the underlying infrastructure, which means you don't need to provision or manage servers. This eliminates the need to pay for unused server capacity, resulting in cost savings.
Sustainability
Services such as Lambda, SageMaker, and Fargate contribute to sustainability by optimizing resource usage. They automatically scale resources based on workload demand, reducing unnecessary energy consumption during periods of low activity. For example, as a serverless compute infrastructure, Fargate runs containerized application workloads and minimizes your overall resource footprint. Similarly, SageMaker JumpStart helps prevent idle, overprovisioned resources by adjusting computing resources to match workload needs.
Implementation Resources
Disclaimer