An elastic deployment of Stable Diffusion with Discord on AWS

Stable Diffusion is a state-of-the-art text-to-image model that generates images from text. Deploying text-to-image models such as Stable Diffusion can be difficult. Currently, Stable Diffusion requires specific computer hardware known as graphical processing units (GPUs). You can lower the bar to entry by offloading the text-to-image generation onto Amazon Web Services (AWS).

Discord is a popular voice, video, and text communication service. It provides a user interface that people can use to make text-to-image requests. When deployed, all members of a Discord server can create images by using Discord Slash Commands.

In this post, we discuss how to deploy a highly available solution on AWS. This solution will perform text-to-image generation with Stable Diffusion and use Discord as the user interface.

Solution architecture

Many of the services selected are serverless, which will offer many benefits. At the time of writing, Stable Diffusion requires a GPU for inference. Amazon Elastic Compute Cloud (Amazon EC2) was selected because it provides GPUs. The solution architecture is shown in the Figure 1.

Figure 1. Solution architecture diagram

Let us walk through the architecture of this solution.

Auto scaling with custom metrics

To properly scale the system, a custom Amazon CloudWatch metric is created. This custom CloudWatch metric calculates the number of Amazon Elastic Container Service (Amazon ECS) tasks required to adequately handle the amount of Amazon Simple Queue Service (Amazon SQS) messages. You should have a high-resolution CloudWatch metric to scale up quickly. For this use case, a high-resolution CloudWatch metric of every 10 seconds was implemented.

Next, let’s create the custom CW metric. Amazon EventBridge rules provide a serverless solution for starting actions on a schedule. Here we use an Amazon EventBridge rule, which initiates an AWS Step Function Express Workflow every minute. With the Express Workflow, we can create serverless workflows that take less than five minutes, which helps us avoid long running AWS Lambda functions. The Express Workflow runs a Lambda function every 10 seconds over a one-minute period, which generates the custom CloudWatch metric.

Two high-resolution CloudWatch alarms scale the system up and down, and are initiated by the custom CloudWatch metric. One CloudWatch alarm increases the ECS tasks and EC2 machines. The other alarm decreases the ECS tasks and EC2 machines.

Handling Discord requests

Someone on Discord sends a request. The Amazon API Gateway HTTP API receives the request and passes the information to an AWS Lambda function. The HTTP API provides a cost-effective option compared to REST APIs and provides tools for authentication and authorization. The HTTP API uses cross-origin resource sharing (CORS), which provides security because it only allows discord.com as an origin.

The AWS Lambda function provides a serverless solution for responding to the HTTP API requests. It transforms the HTTP API request and sends a message to the SQS First-In-First-Out (FIFO) queue. SQS seamelessly decouples the architecture between user requests and backend processing. A FIFO queue ensures that user requests are processed in the order they were requested. The AWS Lambda function sends a response back to the HTTP API within three seconds, which is a requirement of Discord Slash Commands.

When scaling up, an EC2 instance is registered with the ECS cluster. The EC2 instance type was selected because it provides GPU instances. ECS provides a repeatable solution to deploy the container across a variety of instance types. This solution currently only uses the g4dn.xlarge instance type. The ECS service will then place an ECS task onto the eligible EC2 instance. The ECS task will use the Amazon Elastic Container Registry (Amazon ECR) private registry to pull the image, perform text-to-image processing, and respond to the Discord request. The ECR private registry is a managed container registry that manages the image.

Once there is an ECS task running on an Amazon EC2 instance, the ECS task will consume messages from the queue using long polling. This reduces the amount of ReceiveMessage requests the ECS task needs to send. When the ECS task receives a message from the queue, it will then processes the request.

Estimated monthly cost

The example assumes 1,000 requests per month and each request takes 16 seconds to complete. Extra EC2 time was added for the time to begin processing messages (seven minutes) and auto scaling cooldown time (30 minutes). You can adjust the pricing calculations with the AWS Pricing Calculator to reflect your usage and estimated cost.

Prerequisites

This blog assumes familiarity with Terraform, Docker, Discord, Amazon EC2, Amazon Elastic Block Store (Amazon EBS), Amazon Virtual Private Cloud (Amazon VPC), AWS Identity and Access Management (IAM), Amazon API Gateway, AWS Lambda, Amazon SQS, Amazon Elastic Container Registry (Amazon ECR), Amazon ECS, Amazon EventBridge, AWS Step Functions, and Amazon CloudWatch.

For this walkthrough, you should have the following prerequisites:

Access to an AWS account, with permissions to create the resources described in the installation steps section
A virtual private cloud (VPC) with public subnets that is associated with an internet gateway in the region you are deploying into
We suggest using the default VPC. The subnets will need the tag of key: Tier and value: Public and be attached to the VPC. If you decided to create your own VPC with subnets, make sure that auto-assign IP settings is enabled.
An IAM user with the required permissions to deploy the infrastructure
A new Discord application that is registered to a Discord server you own with the scope applications.command. Use this tutorial if you need a starting point on creating a Discord application.
- Discord Bot token
- Discord Application ID
- Discord Public Key
A Hugging Face account
A computer with the following packages installed:
- Terraform >=1.3.6
- git
- AWS CLI
- docker

Walkthrough

Complete the following steps to deploy this solution on AWS.

Increase EC2 limits

This solution uses the g4dn.xlarge instance type, which might require you to request an EC2 limit increase. Check your current limit of Running On-Demand All G and VT instances. Make sure you have more than 4 vCPU; a single g4dn.xlarge requires 4 vCPU. We suggest requesting 8 vCPU so that you can access 2 g4dn.xlarge instances.

Deploy the infrastructure

Ensure you have at least 60 GB of storage available and you’re running on a 64-bit x86 architecture system.
Open a command line on the machine you will be deploying from.
Log in as your AWS user through the AWS CLI with the command aws configure. If you are using an EC2 instance, create and use an instance profile rather than using the AWS CLI.
The region you select will be the one you will deploy into.
Clone the Terraform repository:
git clone https://github.com/aws-samples/amazon-scalable-infra-discord-diffusion.git
Navigate into the Terraform repository:
cd amazon-scalable-infra-discord-diffusion
Customize the variables in terraform.tfvars to match your deployment.
Export the following secrets to the command line:
- export TF_VAR_discord_bot_secret='DISCORD_BOT_SECRET_HERE'
- export TF_VAR_huggingface_password='HUGGINGFACE_PASSWORD_HERE'
Initialize the repository:
terraform init
Apply the infrastructure (this takes about 2 minutes):
terraform apply
Save the outputs for future use.

Set up Discord

This setup adds the Discord interactions URL to your Discord application. After terraform apply comes back successfully, move onto these steps.

Open Discord Application Page -> General Information.
Copy and paste the value from discord_interactions_endpoint_url into the Interactions Endpoint URL, and then save the changes.

If successful, there should be a green box with All your edits have been carefully recorded.

Docker image and Amazon Elastic Container Registry

In this section, you will create a docker image with the Stable Diffusion model.

Exit the terraform repository:
cd ..
Clone the Docker build repository:
git clone https://github.com/aws-samples/amazon-scalable-discord-diffusion.git
Navigate to the Docker repository:
cd amazon-scalable-discord-diffusion
Build and push the docker image to ECR. This requires docker to be installed on the machine and actively running.
You can find the commands for your deployment from the Amazon ECR repository.

Figure 2. View push commands for Amazon ECR

This is a large image (10GB) and can take over 20 minutes to push depending on your machine’s internet connection.

Request an image with Discord Slash Commands

This section will describe how to request a text to image response with Discord.

Log in to Discord and navigate to the server with your Discord application deployed.
Navigate to a text channel.
Type the command /sparkle.
A box with COMMANDS MATCHING /sparkle will appear. Select the /sparkle command box.
Depending on how you customized your Discord Application, the avatar image shown in Figure 3 might be different from what you have.

Figure 3. Writing a Discord Slash Command
Type in a prompt such as a corgi, style of monet.
A response from YourBotName should appear with the response Submitted to Sparkle: YourPromptHere, as shown in Figure 4.

Figure 4. First response from AWS Lambda function

It will take 10 minutes for an EC2 instance to start with an ECS Task running on the instance. Once an ECS Task is running on the instance, inference times should reduce to under 30 seconds, depending on the request.
When an ECS Task is running your request, you will see a Processing your Sparkle message, as shown in Figure 5.

Figure 5. Amazon ECS task processing a request

The message is complete when it says Completed your Sparkle! as shown in Figure 6.

Figure 6. Amazon ECS task returning the final response

Cleaning up

To avoid incurring future charges, delete the resources created by the Terraform script.

Return to the directory where you deployed your terraform script.
To destroy the infrastructure in AWS, run the command terraform destroy.
When prompted to confirm that you want to destroy the infrastructure, type yes and press Enter.

Conclusion

In summary, we created a solution that allows members of a Discord server to create images from text with a Stable Diffusion model. With this implementation, the deployment can scale to many Discord Servers and handle over one hundred requests per second.

Create projects on AWS that lower the bar to entry for people wanting to try text to image models.

AWS Architecture Blog