This Guidance demonstrates how to integrate Stable Diffusion from Stability AI with Amazon SageMaker to build and scale generative artificial intelligence (AI) applications. It shows you how to decouple monolithic generative AI applications, which are often restrictive and time-consuming to modify, and how to scale automatically for tasks such as image inference and model training. With enhanced platform management functions, such as resource access control, and API support for backend integration, you can use this Guidance to adapt generative AI to the specific needs of your organization.

Please note: [Disclaimer]

Architecture Diagram

[Architecture diagram description]

Download the architecture diagram PDF 

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • AWS CloudFormation helps you automate, test, and deploy infrastructure as code templates with continuous integration and continuous delivery (CI/CD) automations. Amazon CloudWatch, through CloudWatch Logs, lets you monitor, store, and access log files from Lambda and SageMaker, helping you record requests and visualize the state of your underlying services. Monitoring and storing log files with CloudWatch Logs helps you analyze and troubleshoot requests quickly.

    You can use versioning in Lambda to save your function's code and configuration as you develop it. Together with aliases, you can use versioning to perform blue/green and rolling deployments. Additionally, CloudFormation templates give you reproducible production, test, and sandbox development environments for increasing levels of operations control.

    Read the Operational Excellence whitepaper 
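The alias-based blue/green deployment mentioned above can be sketched with Lambda's weighted alias routing. The helper below builds the parameters for boto3's `update_alias` call; the function and alias names are hypothetical examples, not names from this Guidance.

```python
def alias_shift_params(function_name, alias_name, stable_version,
                       new_version, new_weight):
    """Build kwargs for boto3 lambda update_alias that route a fraction of
    traffic to a newly published version (weighted alias routing)."""
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0 and 1")
    return {
        "FunctionName": function_name,
        "Name": alias_name,
        "FunctionVersion": stable_version,  # alias still points at the stable version
        "RoutingConfig": {
            # Lambda sends this fraction of invocations to the new version
            "AdditionalVersionWeights": {new_version: new_weight}
        },
    }

# Shift 10% of traffic to version "5" during a canary rollout
# (function/alias names are illustrative):
params = alias_shift_params("sd-inference-handler", "live", "4", "5", 0.10)
# boto3.client("lambda").update_alias(**params)
```

Raising `new_weight` in steps (and rolling back by setting it to 0) gives you a gradual blue/green cutover without redeploying.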
  • API Gateway uses a resource policy to control whether a specified principal, typically an AWS Identity and Access Management (IAM) role or group, can invoke the API. All IAM policies are scoped down to the minimum permissions required for Lambda and SageMaker to function properly. By scoping API Gateway resources and IAM policies to the minimum permissions required, you limit unauthorized access to applications and resources.

    Read the Security whitepaper 
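A minimal sketch of the resource-policy scoping described above: the function builds an API Gateway resource policy that allows only one IAM role to invoke a single stage, implicitly denying everyone else. The account ID, API ID, and role name are placeholder values.

```python
import json

def invoke_policy(account_id, api_id, region, role_name, stage="prod"):
    """Least-privilege API Gateway resource policy: only the given IAM role
    may invoke this API's stage; all other principals are implicitly denied."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"},
            "Action": "execute-api:Invoke",
            # Scoped to one stage, all methods and paths within it
            "Resource": f"arn:aws:execute-api:{region}:{account_id}:{api_id}/{stage}/*/*",
        }],
    }

# Placeholder identifiers; attach the JSON to the API as its resource policy:
policy_json = json.dumps(invoke_policy("123456789012", "a1b2c3", "us-east-1",
                                       "sd-app-caller"))
```

Narrowing the `Resource` ARN further (to a specific method and path) tightens the scope again if the caller only needs one route.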
  • Lambda runs functions in multiple Availability Zones to ensure that it is available to process events in case of a service interruption in a single zone. Also, for asynchronous invocations, Lambda automatically retries errors, with delays between attempts.

    Amazon S3 provides 99.999999999% (11 nines) durability and 99.99% availability of objects over a given year, which can help you store model and data resources with high reliability.

    API Gateway sets a limit on a steady-state rate and a burst of request submissions against all APIs in your account. You can configure custom throttling for your APIs. By limiting the number of requests per second or per minute, you can prevent your backend systems from being overwhelmed and maintain the reliability of your API.

    Lastly, SageMaker combined with Amazon S3 helps support your data resiliency and backup needs. SageMaker manages the underlying infrastructure required for training and deploying machine learning (ML) models, including the compute instances, storage, and networking components, which helps ensure high availability and resilience.

    Read the Reliability whitepaper 
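Lambda's automatic retries with delays are managed for you, but the same pattern is useful on the client side when calling the API. A minimal sketch of exponential backoff (the helper name and defaults are illustrative, not from this Guidance):

```python
import time

def retry_with_backoff(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry a call that may fail transiently, doubling the delay between
    attempts (0.5s, 1s, 2s, ...), similar to how Lambda spaces out its
    automatic retries. Re-raises the error once attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Example: retry_with_backoff(lambda: invoke_inference_api(payload))
# where invoke_inference_api is your own (hypothetical) API client call.
```

Injecting `sleep` as a parameter keeps the helper testable without real delays; adding random jitter on top of the doubling is a common refinement to avoid synchronized retry storms.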
  • Lambda is engineered to provide managed scaling automatically. When your function receives a request while it's processing a previous request, Lambda launches another instance of your function to handle the increased load. As traffic increases, Lambda increases the number of concurrent executions of your functions.

    Read the Performance Efficiency whitepaper 
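The concurrency scaling described above can be estimated up front with Little's law: steady-state concurrent executions roughly equal the request arrival rate times the average function duration. A small sketch (the traffic numbers are illustrative assumptions):

```python
import math

def required_concurrency(requests_per_second, avg_duration_seconds):
    """Estimate steady-state Lambda concurrency via Little's law:
    concurrent executions ~= arrival rate x average duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)

# E.g., 50 image-generation requests/s at ~8 s each needs
# roughly 400 concurrent executions:
required_concurrency(50, 8)  # -> 400
```

Comparing this estimate against your account's concurrency quota tells you whether to request a limit increase before load arrives.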
  • Lambda uses a pay-per-use billing model, where you are billed only for the time your functions are running.

    SageMaker manages your ML infrastructure by automatically provisioning and scaling compute resources according to workload requirements.

    Read the Cost Optimization whitepaper 
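The pay-per-use model above can be made concrete with a back-of-the-envelope estimate: Lambda compute is billed in GB-seconds (duration times allocated memory) plus a per-request charge. The prices are parameters here, not hard-coded facts; look up current pricing for your region.

```python
def lambda_compute_cost(invocations, avg_duration_s, memory_mb,
                        price_per_gb_second, price_per_million_requests):
    """Estimate a Lambda bill: GB-seconds of compute plus a per-request
    charge. Prices are inputs because they vary by region and over time."""
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    compute = gb_seconds * price_per_gb_second
    requests = (invocations / 1_000_000) * price_per_million_requests
    return compute + requests

# Illustrative numbers only: 1M invocations, 1 s each, 1024 MB memory.
cost = lambda_compute_cost(1_000_000, 1.0, 1024, 0.0000166667, 0.20)
```

Because the bill scales with actual invocations, idle periods cost nothing, which is the contrast with always-on servers that the pillar highlights.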
  • Lambda is a serverless computing service, which means you don't have to provision or manage servers. It automatically scales your code in response to incoming events, and you only pay for the compute time used. This serverless architecture eliminates the need for idle servers, resulting in reduced energy consumption compared to traditional server-based architectures.

    With Lambda, you can break your application into individual functions that scale independently. This fine-grained scaling allocates resources to a function only while it is actively processing requests, eliminating the need to over-provision and leading to better resource utilization and reduced waste.

    Read the Sustainability whitepaper 

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

[Content Type]


This [blog post/e-book/Guidance/sample code] demonstrates how [insert short description].


The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
