Build Generative AI apps on Amazon ECS for SageMaker JumpStart

Introduction

The rise in popularity of Generative AI (GenAI) reflects a broader shift toward intelligent automation in the business landscape, which enables enterprises to innovate at an unprecedented scale while adapting to dynamic market demands. While the promise of GenAI is exciting, the initial steps toward its adoption can be overwhelming. This post aims to demystify the complexities and offer guidance for getting started.

Amazon SageMaker JumpStart provides a convenient way to start your GenAI journey on AWS. It offers foundation models such as Stable Diffusion, FLAN-T5, and Llama 2, which are pretrained on massive amounts of data. Foundation models can be adapted to a wide variety of workloads across different domains, such as content creation and text summarization. Amazon SageMaker Studio provides managed Jupyter notebooks, an interactive web-based interface for running live code and data analysis. Furthermore, you can fine-tune and deploy foundation models to Amazon SageMaker endpoints for inference from SageMaker Studio.

However, business users who are responsible for verifying the effectiveness of foundation models may not be familiar with Jupyter or writing code. It is easier for business users to access foundation models in the context of an application. This is where Streamlit shines. Streamlit is an open-source Python library that allows data scientists and engineers to easily build and deploy web applications for machine learning and data science projects with minimal coding. The web-based user interface makes it ideal for business users to interact with. With Streamlit applications, business users can easily explore or verify the foundation models and collaborate effectively with data science teams.

Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestrator, which makes it easy to run containerized applications in a scalable and secure manner. AWS Fargate is a serverless compute engine for containers. It can simplify the management and scaling of cloud applications by shifting undifferentiated operational tasks to AWS. With Amazon ECS and AWS Fargate, you can alleviate operational burdens, empowering you to concentrate on innovation and swiftly develop GenAI applications with Streamlit. Additionally, by setting up a Continuous Integration/Continuous Delivery (CI/CD) mechanism through AWS CodePipeline, you can efficiently iterate on the feedback. In this post, we’ll discuss how you can build a GenAI application with Amazon ECS, AWS Fargate, Amazon SageMaker JumpStart, and AWS CodePipeline.

Solution overview

Figure 1. Architectural diagram showing Amazon ECS task with Streamlit app accessing Amazon SageMaker endpoint for a Foundation Model

Figure 1 shows the architecture of an Amazon ECS cluster with tasks for a GenAI application with Streamlit. The application can be accessed through an Application Load Balancer, which is associated with an Amazon ECS service. The Amazon ECS service ensures that the required number of tasks is always running. You can additionally configure the Amazon ECS service to auto scale your tasks as the load increases. You can share the Domain Name System (DNS) address of the load balancer with your business users as is, so they can use the model. Alternatively, you may use a custom DNS name for the application with Amazon Route 53 or your preferred DNS service.

Normally, your Amazon SageMaker endpoints and your Amazon ECS cluster with the Streamlit application live in the same AWS account. This gives you a self-contained setup for GenAI model access, fine-tuning, and inference testing. However, if your Amazon SageMaker endpoint must be in a different AWS account, then you can use Amazon API Gateway to allow access to the Amazon SageMaker inference endpoint from a client outside that AWS account. You may refer to this linked post for more information. This example assumes your Amazon SageMaker endpoints are in the same AWS account as your Amazon ECS cluster.

Your Amazon ECS task must have the permissions required to invoke the Amazon SageMaker endpoints for inference. You can further restrict the AWS Identity and Access Management (AWS IAM) policies in the Amazon ECS task IAM role to the specific Amazon Resource Names (ARNs) of your Amazon SageMaker endpoints. By scoping the AWS IAM policy to a specific ARN, you ensure that the policy allows access only when the request is made to that endpoint. This helps you follow the principle of least privilege. Your AWS Fargate task also needs access to read the Amazon SageMaker endpoint addresses from AWS Systems Manager Parameter Store. Using Parameter Store decouples your Amazon SageMaker endpoint addresses from your application.
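To make the least-privilege idea concrete, here is a minimal sketch of a task role policy document scoped to named resources instead of a wildcard. The account ID, endpoint name, and parameter name are hypothetical placeholders, and the sketch assumes the parameter holds the endpoint name; adapt the values to your own deployment.

```python
import json


def build_task_role_policy(region, account_id, endpoint_names, parameter_names):
    """Return an IAM policy document limited to the given endpoints and parameters."""
    endpoint_arns = [
        f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{name}"
        for name in endpoint_names
    ]
    # A parameter ARN omits the leading slash of a hierarchical parameter name.
    parameter_arns = [
        f"arn:aws:ssm:{region}:{account_id}:parameter/{name.lstrip('/')}"
        for name in parameter_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": endpoint_arns,
            },
            {
                "Effect": "Allow",
                "Action": "ssm:GetParameter",
                "Resource": parameter_arns,
            },
        ],
    }


# All values below are hypothetical examples.
policy = build_task_role_policy(
    region="us-west-2",
    account_id="111122223333",
    endpoint_names=["txt2img-endpoint"],
    parameter_names=["/genai/txt2img_endpoint"],
)
print(json.dumps(policy, indent=2))
```

With a policy like this, a request to any other endpoint or parameter in the account is denied by default.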

The solution also includes a continuous deployment setup. AWS CodePipeline can detect any changes to your application, triggering AWS CodeBuild to build a new container image, which is pushed to Amazon Elastic Container Registry (Amazon ECR). The pipeline modifies the Amazon ECS task definition with the new container image version and updates the Amazon ECS service to replace the tasks with the new version of your application.
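As an illustration of how a new image version can reach the service, the sketch below builds the content of an imagedefinitions.json file, which the standard Amazon ECS deploy action in AWS CodePipeline reads to update the container image. Whether this blueprint's buildspec uses exactly this mechanism is an assumption, and the repository URI and tag are hypothetical.

```python
import json


def image_definitions(container_name, repository_uri, image_tag):
    """Build the JSON content of an imagedefinitions.json file."""
    return json.dumps(
        [{"name": container_name, "imageUri": f"{repository_uri}:{image_tag}"}]
    )


# Hypothetical values: the URI and tag depend on your account and build.
content = image_definitions(
    container_name="web-container",
    repository_uri="111122223333.dkr.ecr.us-west-2.amazonaws.com/generative-ai-service",
    image_tag="a1b2c3d",
)
print(content)
```

The container name must match the name used in the task definition, so that the deploy action knows which container to update.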

Walkthrough

You can follow these steps to configure a GenAI serving application with Amazon SageMaker JumpStart and AWS Fargate:

  1. Configure the prerequisites
  2. Clone and set up the AWS Cloud Development Kit (AWS CDK) application
  3. Deploy the Amazon SageMaker environment
  4. Deploy the CI/CD environment
  5. Explore the image generation AI model
  6. Explore the text generation AI model

Prerequisites

Clone and set up the GitHub repository for GenAI application

Configure the AWS credentials on the host you are using for your setup. To start, fork the Amazon ECS Blueprints GitHub repository and clone your fork to your local machine.

git clone https://github.com/<repository_owner>/ecs-blueprints.git
cd ecs-blueprints/cdk/examples/generative_ai_service/

Set up the AWS Account and AWS Region environment variables to match your environment. This post uses the Oregon Region (us-west-2) for the example. You’ll generate a .env file to be used by the AWS CDK template. The variables in this file are read when you deploy the backend service.

export AWS_ACCOUNT=$(aws sts get-caller-identity --query 'Account' --output text)
export AWS_REGION=${AWS_REGION:=us-west-2}

sed -e "s/<ACCOUNT_NUMBER>/$AWS_ACCOUNT/g" \
  -e "s/<REGION>/$AWS_REGION/g" sample.env > .env
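For readers less familiar with sed, the substitution above amounts to two literal string replacements. Here is a minimal Python sketch of the same step; the two-line template is a made-up stand-in, not the actual contents of sample.env.

```python
def render_env(template_text, account_number, region):
    """Replace the <ACCOUNT_NUMBER> and <REGION> placeholders, as the sed command does."""
    return (
        template_text
        .replace("<ACCOUNT_NUMBER>", account_number)
        .replace("<REGION>", region)
    )


# Hypothetical template content and account number, for illustration only.
sample = 'account_number="<ACCOUNT_NUMBER>"\naws_region="<REGION>"\n'
rendered = render_env(sample, "111122223333", "us-west-2")
print(rendered)
```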

You can create a Python virtual environment to isolate Python installs and associated pip packages from your local environment. After this, you’ll install the required packages:

# manually create a virtualenv: 
python3 -m venv .venv

# activate your virtualenv:
source .venv/bin/activate

# install the required dependencies: 
python -m pip install -r requirements.txt

If you have not previously used the AWS CDK in your AWS environment, which is a combination of an AWS Account and AWS Region, you must run the bootstrap command:

cdk bootstrap aws://${AWS_ACCOUNT}/${AWS_REGION}

List the stacks in the application. In this Amazon ECS Blueprint, you’ll see four stacks.

cdk ls

Deploy the Amazon SageMaker environment

After you have the above setup in place, you are now ready to create the solution components. First, you’ll create the Amazon SageMaker environment and SageMaker inference endpoint with the GenAITxt2ImgSageMakerStack AWS CDK stack.

cdk deploy GenAITxt2ImgSageMakerStack --require-approval never

Once the stack deployment is complete, deploy the Amazon SageMaker environment for the text to text generation model with the GenAITxt2TxtSageMakerStack AWS CDK stack.

cdk deploy GenAITxt2TxtSageMakerStack --require-approval never

The text to image example uses Stability AI’s Stable Diffusion v2.1 base foundation model. The text to text example uses the Hugging Face FLAN-T5-XL foundation model. Both foundation models use ml.g4dn.2xlarge instances in Amazon SageMaker to host the inference endpoints. These are the default settings in the .env configuration. You can modify the .env values to use alternative models and inference instance types.

Deploy the CI/CD environment

Next, you’ll establish the CI/CD environment for easy updates to your running application. The CI/CD stack makes use of AWS CodePipeline as the release pipeline. It pulls the updated source code from your GitHub repository and uses AWS CodeBuild to build the new version of the container image for your application. The new version of the container image is used to update the running application in Amazon ECS.

Change the working directory to cicd_service to create the CI/CD pipeline.

cd ../cicd_service

Create a GitHub access token for the forked repository and store it as a secret in AWS Secrets Manager. You must create the secret in the same Region where the GenAI services are deployed.

aws secretsmanager create-secret --name ecs-github-token --secret-string "<your-github-access-token>"

As before, set up the AWS Account and AWS Region environment variables to match your environment.

export AWS_ACCOUNT=$(aws sts get-caller-identity --query 'Account' --output text)
export AWS_REGION=${AWS_REGION:=us-west-2}

sed -e "s/<ACCOUNT_NUMBER>/$AWS_ACCOUNT/g" \
  -e "s/<REGION>/$AWS_REGION/g" sample.env > .env

In the .env file, you’ll update some environment variables.

  • Essential Props
    • repository_owner: GitHub repository owner (use your GitHub username here)
  • CICD Service Props
    • ecr_repository_name: generative-ai-service
    • container_name: web-container
    • task_cpu: 2048
    • task_memory: 4096
    • service_name: gen-ai-web-service-new
  • Repository props
    • folder_path: ./cdk/examples/generative_ai_service/web-app/.

The resulting .env file should look like this:

deploy_core_stack="True"

# Essential Props
account_number="${AWS_ACCOUNT}"
aws_region="${AWS_REGION}"
repository_owner="<REPO_OWNER>"

# Core Stack Props
vpc_cidr="10.0.0.0/16"
ecs_cluster_name="ecs-blueprint-infra"
namespaces="default"
enable_nat_gw="True"
az_count="3"

# CICD Service Props
buildspec_path="./application-code/ecsdemo-cicd/buildspec.yml"
ecr_repository_name="generative-ai-service"
container_image="nginx"
container_name="web-container"
container_port="80"
task_cpu="2048"
task_memory="4096"
desired_count="3"
service_name="gen-ai-web-service-new"

## Repository props
folder_path="./cdk/examples/generative_ai_service/web-app/."
repository_name="ecs-blueprints"
repository_branch="main"
github_token_secret_name="ecs-github-token"

# ECS cluster Props
ecs_task_execution_role_arn="<TASK-EXECUTION-ROLE-ARN>"
vpc_name="ecs-blueprint-infra-vpc"

# Service discovery Props
namespace_name="default.ecs-blueprint-infra.local"
namespace_arn="<NAMESPACE-ARN>"
namespace_id="<NAMESPACE-ID>"
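Before deploying, it can help to confirm that the values you were asked to edit no longer contain a template placeholder. The following is a minimal sketch of such a check, assuming the simple KEY="value" layout shown above; the choice of which keys to check is yours, and the helper names are hypothetical.

```python
import re

# Matches values that are still a template token such as <REPO_OWNER>.
PLACEHOLDER = re.compile(r"^<[A-Z][A-Z0-9_-]*>$")


def parse_env(text):
    """Parse KEY="value" lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def unresolved(values, keys):
    """Return the keys that are missing or still hold a <PLACEHOLDER> token."""
    return [k for k in keys if k not in values or PLACEHOLDER.match(values[k])]


env_text = '''
# Essential Props
repository_owner="<REPO_OWNER>"
service_name="gen-ai-web-service-new"
'''
problems = unresolved(parse_env(env_text), ["repository_owner", "service_name"])
print(problems)
```

In this example only repository_owner is reported, because service_name already holds a real value.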

Because our web application requires the “ssm:GetParameter” and “sagemaker:InvokeEndpoint” permissions to run inference against the foundation models through the Amazon SageMaker endpoint, we must also add the following code to lib/cicd_service_stack.py.

Add the imports of these Python modules:

from aws_cdk.aws_iam import (
    Role,
    PolicyStatement,
    Effect
)

Also, add the following code block after the line that defines the Amazon ECS service in cicd_service_stack.py. This code adds the required permissions to the Amazon ECS task IAM role.

# Allow the ECS task IAM role to read parameters and invoke SageMaker endpoints
self.fargate_service.task_definition.add_to_task_role_policy(
    PolicyStatement(
        effect=Effect.ALLOW,
        actions=["ssm:GetParameter"],
        resources=["*"],
    )
)

self.fargate_service.task_definition.add_to_task_role_policy(
    PolicyStatement(
        effect=Effect.ALLOW,
        actions=["sagemaker:InvokeEndpoint"],
        resources=["*"],
    )
)
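To show where these two permissions come into play, here is a hedged sketch of how the web application might look up an endpoint and call it. The parameter name, the assumption that the parameter stores the endpoint name, and the "text_inputs" payload key are illustrative and may differ from the actual application code. The two stub classes stand in for boto3 clients so that the sketch runs without AWS access.

```python
import io
import json


def generate_text(ssm_client, runtime_client, parameter_name, prompt):
    """Look up the endpoint name in Parameter Store, then invoke the endpoint."""
    # Requires ssm:GetParameter on the task role.
    parameter = ssm_client.get_parameter(Name=parameter_name)
    endpoint_name = parameter["Parameter"]["Value"]
    # Requires sagemaker:InvokeEndpoint on the task role.
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"text_inputs": prompt}).encode("utf-8"),  # assumed payload shape
    )
    # The response body is a byte stream containing JSON.
    return json.loads(response["Body"].read())


class StubSSM:
    """Stand-in for the SSM client; returns a hypothetical endpoint name."""

    def get_parameter(self, Name):
        return {"Parameter": {"Name": Name, "Value": "txt2txt-endpoint"}}


class StubRuntime:
    """Stand-in for the SageMaker runtime client; echoes the request back."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        request = json.loads(Body)
        reply = {"endpoint": EndpointName, "echo": request["text_inputs"]}
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


result = generate_text(
    StubSSM(), StubRuntime(), "/genai/txt2txt_endpoint", "Translate to German: Hello"
)
print(result)
```

In the deployed application you would pass boto3.client("ssm") and boto3.client("sagemaker-runtime") instead of the stubs; the task role then supplies the credentials.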

In the last step, you’ll deploy the core infrastructure which includes Amazon Virtual Private Cloud (Amazon VPC), required AWS IAM policies and roles, Amazon ECS cluster, and GenAI serving ECS service, which will host your Streamlit application.

cdk deploy CoreInfraStack CICDService --require-approval never

Explore the image generation foundation model

You can use the Application Load Balancer URL from the AWS CDK output to access the load balanced service. Select the image generation model in the sidebar on the left side. When you enter an image description, the application generates an image based on the text in the input image description section.

Streamlit app showing Stability AI output

Explore the text generation foundation model

Next, select the text generation model in the sidebar. You can enter context, provide a relevant prompt, and select the generate response button. The application generates a text response for the prompt in the input query section.

Streamlit app showing Flan T5 output

Cleaning up

You can delete the solution stacks either from the AWS CloudFormation console or with the AWS CDK destroy command, run from the directories where you deployed your CDK stacks. This step is important to stop incurring costs after you explore the foundation models. In production, you could either leave your inference endpoints active for continuous inference or periodically schedule deletion and recreation of the inference endpoints, based on your inference needs.

cdk destroy --all --force
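The periodic deletion and recreation of inference endpoints mentioned above could be sketched as follows. This is only an illustration: the weekday business-hours schedule, the endpoint and configuration names, and the helper functions are hypothetical. The create_endpoint and delete_endpoint calls mirror the boto3 SageMaker client, while a recording stub is used here so the sketch runs without AWS access. Note that changing endpoints outside of the AWS CDK causes the deployed stacks to drift from their templates.

```python
from datetime import datetime, time


def endpoint_should_run(now, start=time(8, 0), end=time(18, 0)):
    """Return True on weekdays between start and end (hypothetical schedule)."""
    return now.weekday() < 5 and start <= now.time() < end


def reconcile_endpoint(client, endpoint_name, endpoint_config_name, exists, now):
    """Create or delete the endpoint so that it matches the schedule."""
    wanted = endpoint_should_run(now)
    if wanted and not exists:
        client.create_endpoint(
            EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
        )
        return "created"
    if not wanted and exists:
        # Deleting the endpoint leaves its endpoint configuration in place,
        # so the endpoint can be recreated from it later.
        client.delete_endpoint(EndpointName=endpoint_name)
        return "deleted"
    return "unchanged"


class RecordingClient:
    """Stand-in for the SageMaker client; records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def create_endpoint(self, **kwargs):
        self.calls.append(("create_endpoint", kwargs))

    def delete_endpoint(self, **kwargs):
        self.calls.append(("delete_endpoint", kwargs))


client = RecordingClient()
saturday_morning = datetime(2024, 1, 6, 10, 0)
action = reconcile_endpoint(
    client, "txt2img-endpoint", "txt2img-config", exists=True, now=saturday_morning
)
print(action, client.calls)
```

A scheduled job could call reconcile_endpoint at regular intervals with a real SageMaker client.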

Conclusion

In this post, we showed you how you can use Amazon ECS with AWS Fargate to deploy GenAI applications. With AWS Fargate, you can deploy your apps without the overhead of managing your compute. You learned how Streamlit applications can be configured to access Generative AI foundation models on Amazon SageMaker JumpStart. Foundation models provide a starting point to help build your own generative AI solutions. With serverless containers, your data science team can focus more on effective solutions for your use cases and less on the underlying infrastructure. Your business users can collaborate with data science teams using the user-friendly web interface of Streamlit apps and provide feedback. This can help your organization be more agile in adopting generative AI for your use cases. The resources referenced below provide more information about the topics we discussed in this post.

Further reading

SageMaker JumpStart foundation models

Amazon ECS Blueprints

Amazon ECS best practices

Generative AI with Serverless Workshop

Sushanth Mangalore

Sushanth Mangalore is a Sr. Solutions Architect at Amazon Web Services, based in Chicago, IL. He is a technologist helping businesses build solutions on AWS to achieve their strategic objectives. He specializes in container services like Amazon ECS and Amazon EKS. Prior to AWS, Sushanth built IT solutions for multiple industries, using a wide range of technologies.

Jooyoung Kim

Jooyoung is a Containers Specialist Solutions Architect at AWS, working on the go-to-market strategy for AWS Container Services to help customers on their AWS journeys. In her spare time, she enjoys cooking and swimming somewhere nearby in San Francisco. You can connect with her on LinkedIn at linkedin.com/in/joozero/.