Guidance for Generative AI Deployments using Amazon SageMaker JumpStart
Overview
How it works
These technical details feature an architecture diagram to illustrate how to effectively use this solution. The architecture diagram shows the key components and their interactions, providing an overview of the architecture's structure and functionality step-by-step.
Deploy with confidence
Ready to deploy? Review the sample code on GitHub for detailed deployment instructions to deploy as-is or customize to fit your needs.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
This Guidance uses SageMaker endpoints, Lambda, CodePipeline, CodeCommit, and AWS CDK to enhance operational excellence. SageMaker provides standard logging and metrics for monitoring and analyzing the performance of deployed machine learning models, helping you gain insight into operational health and make data-driven decisions for continuous improvement. Lambda offers built-in logging and metrics, and this Guidance uses Lambda Powertools to keep logging consistent across the Lambda functions. CodePipeline and CodeCommit let you deploy changes as code for repeatability, consistency, and traceability, with rollbacks and controlled change management to minimize disruptions and errors. Finally, the infrastructure-as-code approach with AWS CDK accelerates cloud development by letting you model applications in common programming languages. Together, these services give you better visibility into how the deployed functions behave and make troubleshooting easier.
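The consistent logging described above can be illustrated with a minimal sketch. It uses only the Python standard library as a stand-in and is not the Lambda Powertools API; the service name `image-generation`, the `context` extra key, and the `request_id` event field are hypothetical names chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Keys passed via extra={"context": {...}} are merged into the output.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(service: str) -> logging.Logger:
    """Return a logger that emits one JSON line per record."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers on warm invocations
        stream = logging.StreamHandler()
        stream.setFormatter(JsonFormatter(service))
        logger.addHandler(stream)
    return logger


def handler(event: dict, context: object) -> dict:
    """Hypothetical Lambda handler that logs every request the same way."""
    logger = build_logger("image-generation")  # hypothetical service name
    logger.info(
        "request received",
        extra={"context": {"request_id": event.get("request_id")}},
    )
    return {"statusCode": 200}
```

Because every function shares one formatter, each log line carries the same keys, which makes the logs easier to filter and query.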
Security
The services selected for this Guidance, coupled with the security measures integrated within this Guidance, support the goals of maintaining a secure environment, protecting sensitive data, and adhering to security best practices. AWS Identity and Access Management (IAM) helps enhance security by enabling you to manage user identities, roles, and permissions, ensuring that users only have the necessary access to AWS resources. The Amazon S3 bucket used in the codebase is encrypted by default, helping maintain the confidentiality and integrity of the stored data. The Amazon SNS topics only accept encrypted communication, ensuring secure transmission of messages.
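One common way to make a topic accept only encrypted communication is a resource policy that denies publishing over non-TLS connections. The sketch below builds such a policy document as a plain dictionary. It is an assumption about how the restriction could be expressed, not the policy shipped with this Guidance, and the topic ARN is a placeholder.

```python
import json


def deny_insecure_publish_statement(topic_arn: str) -> dict:
    """Deny sns:Publish whenever the request does not use TLS."""
    return {
        "Sid": "DenyPublishWithoutTLS",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "sns:Publish",
        "Resource": topic_arn,
        # aws:SecureTransport is "false" for requests sent without TLS.
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }


def topic_policy(topic_arn: str) -> dict:
    """Wrap the statement in a complete policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [deny_insecure_publish_statement(topic_arn)],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:sns:us-east-1:111122223333:example-topic"
    print(json.dumps(topic_policy(arn), indent=2))
```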
Permissions for accessing other resources, such as Amazon S3 buckets, are set up according to AWS CDK standards. The code follows the principle of least privilege, granting only the minimum level of access required for specific operations, which helps reduce the attack surface and mitigate the potential impact of any compromised credentials.
These measures help protect sensitive information and mitigate risks associated with unauthorized access or data breaches.
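As an illustration of the least-privilege principle mentioned above, the following sketch builds an IAM policy document that allows reading and writing objects under a single prefix of one bucket and nothing else. The bucket name, prefix, and choice of actions are hypothetical; the permissions the Guidance actually grants are defined in its AWS CDK code.

```python
def least_privilege_s3_policy(bucket_name: str, prefix: str) -> dict:
    """Allow only GetObject and PutObject, and only under one prefix."""
    # Normalize the prefix so "/outputs/" and "outputs" yield the same ARN.
    object_arn = f"arn:aws:s3:::{bucket_name}/{prefix.strip('/')}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadWriteOnePrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": object_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix names.
    print(least_privilege_s3_policy("example-bucket", "outputs"))
```

Scoping the resource to a prefix rather than the whole bucket means that a leaked credential exposes only the objects that role genuinely needs.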
Reliability
This Guidance enhances reliability through scalable resources, efficient troubleshooting, automated deployments, and leveraging standard AWS functionality, promoting a stable and high-performing infrastructure that handles varying workloads and reduces downtime.
The services used to enhance reliability in this Guidance include SageMaker, Lambda, Amazon SNS, Amazon S3, and AWS CDK pipelines. SageMaker provides scalability for the asynchronous endpoint, allowing you to configure the maximum instance count to meet your specific needs. Lambda functions use Lambda Powertools to ensure a proper logging format, which improves troubleshooting and reduces mean time to resolution. With Amazon SNS and Amazon S3 for logging and storage, you can capture and store data reliably, supporting compliance requirements and reliable operations. The AWS CDK pipelines, CodeCommit, and CodePipeline enable automated and controlled deployments, ensure consistency, reduce the risk of errors, and offer rollbacks for a reliable architecture.
Performance Efficiency
SageMaker, Lambda, Amazon SNS, and Amazon S3 are used to support optimal model hosting, serverless orchestration, efficient storage, flexibility, and adaptability. This leads to cost-effective scaling, reduced latency, and improved overall performance. SageMaker hosts the machine learning model, and the asynchronous endpoint's autoscaling functionality offers efficient inference and scalability by dynamically adjusting resources based on demand. Lambda and Amazon SNS orchestrate the logic in a serverless, scalable, and cost-effective manner, helping you avoid manual infrastructure management. Amazon S3 is used for storage because it is purpose-built for object storage, performs well, and offers features such as built-in object lifecycle management, which supports efficient data access for image generation and reduced latency.
The TOML configuration file used in the AWS CDK project allows for easy adjustment of deployment parameters, enabling rapid redeployment and resource calibration for specific performance requirements.
Cost Optimization
SageMaker (with the auto-scaling endpoint), Amazon SNS, Lambda, and Amazon S3 help you optimize costs by minimizing idle resources, managing storage, and leveraging pay-per-use services, so you can accurately estimate expenses and maximize the value obtained from AWS. The asynchronous SageMaker endpoint is set up with a scalable target, so that no compute resources keep running when the endpoint is not in use. Lambda functions handle the conversion process and are pay-per-use. Amazon SNS is also a pay-per-use service, charging only for completion and message deliveries. Amazon S3 offers scalable, pay-as-you-go pricing, and you can configure an Amazon S3 Lifecycle policy to remove older objects that are no longer of interest and reduce ongoing costs.
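The lifecycle policy mentioned above can be sketched as a configuration dictionary in the shape accepted by the boto3 S3 client's `put_bucket_lifecycle_configuration` call. The prefix, retention period, and bucket name are hypothetical, and the function that applies the rule requires `boto3` and AWS credentials, so it is defined but not invoked here.

```python
def expire_old_objects_rule(prefix: str, days: int) -> dict:
    """Build a lifecycle configuration that deletes objects after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(bucket_name: str, lifecycle: dict) -> None:
    """Apply the configuration to a bucket (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=lifecycle,
    )


if __name__ == "__main__":
    # Hypothetical prefix and retention period.
    print(expire_old_objects_rule("outputs/", 30))
```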
Sustainability
Lambda, Amazon SNS, and Amazon S3 are serverless services and support on-demand resource consumption, with a scalable and flexible architecture and high resource utilization. Serverless services activate resources only when needed, reducing energy consumption and carbon footprint. Dynamic resource allocation optimizes utilization and sustainability. The SageMaker asynchronous endpoint is set up as a scalable target with a minimum instance count of 0, allowing for flexible scaling. By utilizing these services, you can minimize the overall resource consumption and maximize resource utilization, leading to improved sustainability.
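To make the scalable target with a minimum instance count of 0 concrete, here is a sketch of the request that Application Auto Scaling expects for a SageMaker endpoint variant. The endpoint and variant names are placeholders, and this Guidance configures the target through AWS CDK rather than through a direct call like this one. The registering function requires `boto3` and AWS credentials, so it is defined but not invoked.

```python
def scalable_target_request(
    endpoint_name: str, variant_name: str, max_capacity: int
) -> dict:
    """Build register_scalable_target arguments with a minimum of zero instances."""
    if max_capacity < 1:
        raise ValueError("max_capacity must be at least 1")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # lets the endpoint scale in to zero when idle
        "MaxCapacity": max_capacity,
    }


def register(request: dict) -> None:
    """Register the scalable target (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("application-autoscaling").register_scalable_target(**request)


if __name__ == "__main__":
    # Placeholder endpoint and variant names.
    print(scalable_target_request("example-endpoint", "AllTraffic", 2))
```

A scaling policy is still needed alongside the target to decide when instances are added or removed; that part is omitted from this sketch.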