AWS Cloud Operations & Migrations Blog

Monitor and scale your Amazon ECS on AWS Fargate application using Prometheus metrics

If you’ve ever run a containerized workload, you know that it can be tricky to check what’s happening in your container. In this blog post, I show how you can monitor and scale your Amazon Elastic Container Service (Amazon ECS) on AWS Fargate application using Prometheus metrics. Although plenty of information about Prometheus is already available, it can be difficult to get started if you’ve come to containers through Amazon ECS on AWS Fargate. By the end of this post, you’ll be able to use metrics gathered by Prometheus to perform automatic scaling actions on your Amazon ECS on AWS Fargate application.

CloudWatch Container Insights gives you visibility into what is happening inside your container. By using its performance monitoring features, available in the Amazon CloudWatch console, you can see a container’s CPU, memory, network, and storage (bytes read/written) usage. The CloudWatch Container Insights dashboard shows some of the processing that is happening inside your container. You can use this information to tune your application’s performance or decide when you should scale.

The olden days of monitoring containerized applications

Many applications are not directly CPU- or memory-bound. They have scaling characteristics that are correlated with CPU or memory, but those metrics might not be the best indicator of application performance. For example, in Java applications, JVM memory usage or connection count might be a better indicator of performance. When scaling is required, you need a better indicator than just monitoring available container memory.

What do we do if we need to better monitor our containerized applications? Are we limited to using CPU and memory metrics for all our monitoring, or is there a better way? Fortunately, in September 2020, the CloudWatch team announced the general availability of Prometheus metrics monitoring for Amazon ECS, Amazon EKS, AWS Fargate, and Kubernetes clusters.

Prometheus is already widely used to monitor Kubernetes workloads. It works by scraping metrics from the endpoints it is monitoring and sending those metrics to a database for viewing. It’s common to integrate an exporter with your application to accept requests from Prometheus and return the requested data. Typically, an exporter is used for data that you don’t have full control over (like JVM data). Exporters generally listen on a specific port and path for Prometheus requests (like :9404/metrics/).
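For example, when the JMX exporter used later in this post is attached to a JVM, a scrape of its metrics endpoint returns plain-text samples similar to the following (output abridged and illustrative; the exact metric names and values depend on the exporter and its configuration). Prometheus periodically requests this endpoint and stores the returned samples as time series.

# Query the exporter's metrics endpoint (illustrative output below)
curl http://localhost:9404/metrics

# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 32.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="PS Old Gen"} 1.6384E7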

Set up a Java-based, Prometheus-enabled container

The first step to using Prometheus with Amazon ECS on AWS Fargate is to set up a container that is configured to provide metrics to Prometheus. For this demo, I deploy a Java WAR file as the application.

After you’ve found an application to deploy, do the following:

  1. If you don’t already have a WAR file to deploy, download the Tomcat sample application from Apache. This sample is just a vanilla Java application that runs on Tomcat.
  2. Install Docker.
  3. Download an exporter for Java Tomcat, which you can get from the official Maven repo. If you want to look at the source code, it’s available in the official Prometheus GitHub repository.
  4. Define the configuration for this exporter. The jmx_exporter repo has a few example configurations. For this demo, I use their yml example. Download their example and name it config.yaml.
  5. Create a file named setenv.sh with the following content:
    export JAVA_OPTS="-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent-0.14.0.jar=9404:/opt/jmx_exporter/config.yaml $JAVA_OPTS"
  6. Create a new Dockerfile with the following content:
    # From Tomcat 9.0 JDK8 OpenJDK
    FROM tomcat:9.0-jdk8-openjdk
    
    RUN mkdir -p /opt/jmx_exporter
    
    COPY ./jmx_prometheus_javaagent-0.14.0.jar /opt/jmx_exporter
    COPY ./config.yaml /opt/jmx_exporter
    COPY ./setenv.sh /usr/local/tomcat/bin
    COPY sample.war /usr/local/tomcat/webapps/
    
    RUN chmod o+x /usr/local/tomcat/bin/setenv.sh
    
    ENTRYPOINT ["catalina.sh", "run"]
    

Now build the container image and push it to Amazon ECR.

  1. Open the Amazon Elastic Container Registry (Amazon ECR) console, and choose Create repository.
  2. Select the repository you created, choose View push commands, and follow the instructions for your operating system. In this demo, I named my container hello-prometheus.

On the hello-prometheus page, under Images, there are columns for Image tag, Pushed at, Size (MB), Image URI, Scan status, and Vulnerabilities.

Figure 1: hello-prometheus container in the Amazon ECR console

  1. After you’ve built your container and pushed it to Amazon ECR, under Image URI, choose Copy URI. You need this URI in the following procedure.
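Optionally, before wiring the image into a task definition, you can run it locally to confirm that both the sample application and the JMX exporter respond. This is a quick sanity check; the image tag below assumes the name used in this demo.

# Run the image locally, exposing the application and exporter ports
docker run --rm -p 8080:8080 -p 9404:9404 hello-prometheus

# In another terminal, check the application and the Prometheus metrics endpoint
curl http://localhost:8080/sample/
curl http://localhost:9404/metrics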

Create a Prometheus-enabled AWS Fargate task

Now that you’ve pushed your container, create a task. A task is the core unit of work in AWS Fargate. When you create a Fargate task definition, you define the compute and configuration associated with this unit of work.

  1. In the Amazon ECS console, choose Task Definitions, and then choose Create new task definition.
  2. On the Select compatibilities page, select FARGATE as the launch type that your task should use and choose Next step.
  3. For Task Definition Name, enter a name for your task definition. I use hello-prometheus-task.
  4. For Task Role, choose an AWS Identity and Access Management (IAM) role that provides permissions for containers in your task to make calls to AWS API operations on your behalf.
  5. For Task execution IAM role, I use the ecsTaskExecutionRole. If you don’t have this role already created, follow the steps in Creating the task execution IAM role.
  6. Because this task doesn’t need much from a memory and CPU perspective, for Task memory (GB), choose 0.5 GB. For Task CPU (vCPU), choose 0.25 vCPU.
  7. To define the container used for this task, choose Add container.
    1. Enter a name for the container. I use hello-prometheus.
    2. For Image, paste in the image URI you copied after you pushed your container to Amazon ECR.
    3. Under Port mappings, enter TCP ports 8080 and 9404 as shown in Figure 2. The application is designed to be viewed by users over port 8080, and Prometheus uses port 9404 to scrape metrics. The task serves the website on :8080/sample/ and responds to Prometheus with metrics on :9404/metrics/.
      hello-prometheus is displayed in the Container name field. The Image field displays the URI. Under Port mappings, there are entries of 8080 and 9404.
      Figure 2: Port mappings for the application and Prometheus
    4. In Docker Labels, add two key-value pairs, as shown in Figure 3. Prometheus uses these labels (Java_EMF_Metrics with a value of true and ECS_PROMETHEUS_EXPORTER_PORT with a value of 9404) to auto-discover your tasks.

      The Docker labels for the Fargate task are Java_EMF_Metrics with a value of true and ECS_PROMETHEUS_EXPORTER_PORT with a value of 9404.
      Figure 3: Key-value pairs
    5. Choose Add to save the container configuration.
  8. At the bottom of the task page, choose Create to create the task.
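If you prefer to script this step instead of using the console, the configuration above corresponds roughly to the following task definition (a sketch; the account ID, Region, image URI, and role ARN are placeholders you must replace with your own values):

cat > hello-prometheus-task.json <<'EOF'
{
  "family": "hello-prometheus-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::111122223333:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "hello-prometheus",
      "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/hello-prometheus:latest",
      "essential": true,
      "portMappings": [
        { "containerPort": 8080, "protocol": "tcp" },
        { "containerPort": 9404, "protocol": "tcp" }
      ],
      "dockerLabels": {
        "Java_EMF_Metrics": "true",
        "ECS_PROMETHEUS_EXPORTER_PORT": "9404"
      }
    }
  ]
}
EOF
aws ecs register-task-definition --cli-input-json file://hello-prometheus-task.json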

Deploy the task in AWS Fargate

You can deploy this AWS Fargate task the same way you’d deploy any other Fargate task. If you already know how to do that, skip to the next section.

To complete this step, you need a VPC and an Application Load Balancer. To make building the infrastructure easy, you can deploy this CloudFormation template, which creates an Amazon Virtual Private Cloud (Amazon VPC) with two public subnets, an Application Load Balancer, and a security group for the Application Load Balancer and your Fargate task.

To deploy the template:

  1. Copy the CloudFormation template file listed below, and save it to your computer. I named the template setup-networking.yml.
  2. From a command prompt, type:
    aws cloudformation deploy --template-file setup-networking.yml \
    --stack-name hello-prometheus

    Running this command requires the AWS CLI to be configured and IAM permissions sufficient to launch a CloudFormation stack.

  3. Wait for the stack to be deployed, which should take less than 5 minutes.

setup-networking.yml

AWSTemplateFormatVersion: "2010-09-09"
Description: This template sets up a simple VPC with 2 public subnets, ingress on tcp port 80, and an Application Load Balancer

Outputs:
  VPCId:
    Value: !Ref VPC
  PublicSubnet1:
    Value: !Ref Subnet1
  PublicSubnet2:
    Value: !Ref Subnet2
  PublicSecurityGroup:
    Value: !Ref PublicSecurityGroup

Resources:
  VPC:
    Type: "AWS::EC2::VPC"
    Properties:
      CidrBlock: "192.168.0.0/16"
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
      - Key: Name
        Value: !Ref "AWS::StackName"
  IG:
    Type: "AWS::EC2::InternetGateway"
    Properties:
      Tags:
      - Key: Name
        Value: !Ref "AWS::StackName"
  IGAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref IG
      VpcId: !Ref VPC
  IGVPCRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
      - Key: Name
        Value: !Ref "AWS::StackName"
  IGVPCRoute:
    Type: AWS::EC2::Route
    DependsOn: IGAttachment
    Properties:
      DestinationCidrBlock: "0.0.0.0/0"
      RouteTableId: !Ref IGVPCRouteTable
      GatewayId: !Ref IG
  Subnet1ToRouteTable:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref IGVPCRouteTable
      SubnetId: !Ref Subnet1
  Subnet2ToRouteTable:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref IGVPCRouteTable
      SubnetId: !Ref Subnet2
  Subnet1:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: "192.168.1.0/26"
      AvailabilityZone: !Select [0, !GetAZs '' ]
      MapPublicIpOnLaunch: true
      Tags:
      - Key: Name
        Value: !Ref "AWS::StackName"
      - Key: SubnetType
        Value: Public Subnet
      VpcId: !Ref VPC
  Subnet2:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: "192.168.2.0/26"
      AvailabilityZone: !Select [1, !GetAZs '' ]
      MapPublicIpOnLaunch: true
      Tags:
      - Key: Name
        Value: !Ref "AWS::StackName"
      - Key: SubnetType
        Value: Public Subnet
      VpcId: !Ref VPC
  PublicSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId: !Ref VPC
      GroupName: !Sub "${AWS::StackName}-public-sg"
      GroupDescription: Public security group
      SecurityGroupIngress:
        - IpProtocol: tcp
          ToPort: 80
          FromPort: 80
          CidrIp: "0.0.0.0/0"
      Tags:
      - Key: Name
        Value: !Ref "AWS::StackName"
  PublicSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref PublicSecurityGroup
      IpProtocol: tcp
      FromPort: 0
      ToPort: 65535
      SourceSecurityGroupId: !Ref PublicSecurityGroup
  LoadBalancer:
    Type: "AWS::ElasticLoadBalancingV2::LoadBalancer"
    Properties:
      Name: !Sub "${AWS::StackName}-alb"
      SecurityGroups:
      - !Ref PublicSecurityGroup
      Subnets:
      - !Ref Subnet1
      - !Ref Subnet2
      Tags:
      - Key: Name
        Value: !Ref "AWS::StackName"

Now that the infrastructure has been built, deploy your task!

First, create an ECS cluster. In ECS, a cluster is simply a mechanism to logically group tasks or services.

  1. In the Amazon ECS console, choose Clusters, and then choose Create Cluster.
  2. Choose Networking only to create an AWS Fargate cluster, and then choose Next step.
  3. Enter a name for the cluster. I use hello-prometheus-cluster.
  4. Choose Create to create the cluster.
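If you prefer the command line, the same Fargate (networking only) cluster can be created with a single AWS CLI command:

aws ecs create-cluster --cluster-name hello-prometheus-cluster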

Now, deploy your AWS Fargate task to the new cluster. You can deploy an AWS Fargate task by creating an Amazon ECS service, which runs and maintains your desired number of copies of a task definition simultaneously in an ECS cluster.

  1. On the Clusters page, choose Services, and then choose Create.
  2. For the launch type, choose FARGATE.
  3. From Task definition, choose hello-prometheus-task. The revision should be updated to the latest version, which is 1 (latest).
  4. For Service name, I use hello-prometheus-service. For the number of tasks, I enter 2 to ensure that I’ll always have two replicas of this task running, and then choose Next step.
  5. In the Networking section, choose the VPC created by the CloudFormation template. For Subnets, choose the public subnets. For Security group, choose Edit, and then choose the appropriate security group (in my case, hello-prometheus-public-sg). Choose Save.
  6. Under Load balancing, for Load balancer type, choose Application Load Balancer.
    1. Choose Use an existing load balancer, and then choose your Application Load Balancer (in my case, hello-prometheus-alb).
    2. Because the container is designed to serve public traffic over port 8080, in the Container name : port list, choose the entry for port 8080, and then choose Add to load balancer.
    3. For Listener, enter 80.
    4. For Target group name, enter hello-prometheus-target.
    5. For Health check path, enter /sample/. (Be sure to include the trailing slash.)
    6. Choose Next step, choose Next step again, and then choose Create service.
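For reference, the console steps above map roughly to the following CLI calls (a sketch; the VPC ID, subnet IDs, security group ID, and ARNs are placeholders you take from the CloudFormation stack outputs and the previous commands):

# Create a target group for the container port and a listener on port 80
aws elbv2 create-target-group \
    --name hello-prometheus-target \
    --protocol HTTP --port 8080 --target-type ip \
    --health-check-path /sample/ \
    --vpc-id <vpc-id>

aws elbv2 create-listener \
    --load-balancer-arn <load-balancer-arn> \
    --protocol HTTP --port 80 \
    --default-actions Type=forward,TargetGroupArn=<target-group-arn>

# Create the service with two replicas behind the load balancer
aws ecs create-service \
    --cluster hello-prometheus-cluster \
    --service-name hello-prometheus-service \
    --task-definition hello-prometheus-task \
    --desired-count 2 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[<subnet-1>,<subnet-2>],securityGroups=[<security-group-id>],assignPublicIp=ENABLED}" \
    --load-balancers "targetGroupArn=<target-group-arn>,containerName=hello-prometheus,containerPort=8080"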

You have now deployed a Prometheus-enabled application on AWS Fargate. After the containers enter a RUNNING state and pass health checks, you can view the website. To get the application’s DNS name, go to the Amazon Elastic Compute Cloud (Amazon EC2) console, choose your load balancer, and then view its details on the Description tab.

The Description tab displays details for the Application Load Balancer, including ARN, DNS name, state, type, IP address type, VPC, Availability Zones, and more.

Figure 4: Description tab displays details, including DNS name

To view the new application, go to /sample/ at the DNS address of your Application Load Balancer. You should see the following webpage:

The webpage for the “Hello, World” application says, “This is the home page for a sample application used to illustrate the source directory organization of a web application utilizing the principles in the Application Developer’s Guide.”

Figure 5: Tomcat “Hello, World” webpage
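You can also fetch the DNS name and check the page from the command line (assuming the load balancer name created by the CloudFormation template, hello-prometheus-alb):

ALB_DNS=$(aws elbv2 describe-load-balancers --names hello-prometheus-alb \
    --query "LoadBalancers[0].DNSName" --output text)
curl "http://${ALB_DNS}/sample/"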

View Prometheus metrics in CloudWatch

To this point, you’ve deployed a Prometheus-enabled Fargate task, but the data is not yet being ingested into CloudWatch. Remember that the Prometheus-enabled Fargate task is producing metrics that can be viewed on port 9404 from the /metrics path. Set up a new Fargate task that scrapes these metrics from that path and sends them to CloudWatch. This is easy to do with the CloudWatch agent with Prometheus support.

The CloudWatch agent with Prometheus needs two configuration files to work correctly. The first configuration file is the standard Prometheus configuration defined in the Prometheus documentation. The second configuration is for the CloudWatch agent. For more information about these files, see Scraping additional Prometheus sources and importing those metrics in the Amazon CloudWatch User Guide.

You can use the CloudFormation template in AWS samples for Amazon CloudWatch Container Insights as a starting point to set up and configure the CloudWatch agent. I edited the CWAgentConfigSSMParameter configuration in this template to control the application metrics being ingested into CloudWatch. Copy the following CloudFormation template and save it as install-prometheus-collector.yaml.

install-prometheus-collector.yaml

AWSTemplateFormatVersion: "2010-09-09"
Description: This template sets up the configuration and tasks necessary to collect Prometheus metrics via the CloudWatch agent for Prometheus 

Parameters:
  ECSClusterName:
    Type: String
    Description: Enter the name of your ECS cluster from which you want to collect Prometheus metrics
  CreateIAMRoles:
    Type: String
    AllowedValues:
      - 'True'
      - 'False'
    Description: Whether to create new IAM roles or use existing IAM roles for the ECS tasks
    ConstraintDescription: Must be specified as either True or False
  ECSLaunchType:
    Type: String
    AllowedValues:
      - 'EC2'
      - 'FARGATE'
    Description: ECS launch type for the ECS cluster
    ConstraintDescription: Must be specified as either EC2 or FARGATE
  TaskRoleName:
    Type: String
    Description: Enter the CloudWatch agent ECS task role name
  ExecutionRoleName:
    Type: String
    Description: Enter the CloudWatch agent ECS execution role name
  SecurityGroupID:
    Type: String
    Description: Enter the security group ID for running the CloudWatch agent ECS task
  SubnetID:
    Type: String
    Description: Enter the subnet ID for running the CloudWatch agent ECS task
Conditions:
  CreateRoles:
    !Equals [!Ref CreateIAMRoles, 'True']
  AssignPublicIp:
    !Equals [!Ref ECSLaunchType, 'FARGATE']
Resources:
  PrometheusConfigSSMParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Sub 'AmazonCloudWatch-PrometheusConfigName-${ECSClusterName}-${ECSLaunchType}-awsvpc'
      Type: String
      Tier: Standard
      Description: !Sub 'Prometheus Scraping SSM Parameter for ECS Cluster: ${ECSClusterName}'
      Value: |-
        global:
          scrape_interval: 1m
          scrape_timeout: 10s
        scrape_configs:
          - job_name: cwagent-ecs-file-sd-config
            sample_limit: 10000
            file_sd_configs:
              - files: [ "/tmp/cwagent_ecs_auto_sd.yaml" ]
  CWAgentConfigSSMParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Sub 'AmazonCloudWatch-CWAgentConfig-${ECSClusterName}-${ECSLaunchType}-awsvpc'
      Type: String
      Tier: Intelligent-Tiering
      Description: !Sub 'CWAgent SSM Parameter with App Mesh and Java EMF Definition for ECS Cluster: ${ECSClusterName}'
      Value: |-
        {
          "agent": {
            "debug": true
          },
          "logs": {
            "metrics_collected": {
              "prometheus": {
                "prometheus_config_path": "env:PROMETHEUS_CONFIG_CONTENT",
                "ecs_service_discovery": {
                  "sd_frequency": "1m",
                  "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
                  "docker_label": {
                    "sd_port_label": "ECS_PROMETHEUS_EXPORTER_PORT",
                    "sd_job_name_label": "ECS_PROMETHEUS_JOB_NAME"
                  },
                  "task_definition_list": [
                    {
                      "sd_job_name": "bugbash-workload-java-ec2-awsvpc-task-def-sd",
                      "sd_metrics_ports": "9404;9406",
                      "sd_task_definition_arn_pattern": ".*:task-definition/hello-prometheus*"
                    }
                  ]
                },
                "emf_processor": {
                  "metric_declaration_dedup": true,
                  "metric_declaration": [
                    {
                      "source_labels": ["Java_EMF_Metrics"],
                      "label_matcher": "^true$",
                      "dimensions": [["ClusterName","TaskDefinitionFamily"]],
                      "metric_selectors": [
                        "^jvm_threads_(current|daemon)$",
                        "^jvm_classes_loaded$",
                        "^java_lang_operatingsystem_(freephysicalmemorysize|totalphysicalmemorysize|freeswapspacesize|totalswapspacesize|systemcpuload|processcpuload|availableprocessors|openfiledescriptorcount)$",
                        "^catalina_manager_(rejectedsessions|activesessions)$",
                        "^jvm_gc_collection_seconds_(count|sum)$",
                        "^catalina_globalrequestprocessor_(bytesreceived|bytessent|requestcount|errorcount|processingtime)$"
                      ]
                    },
                    {
                      "source_labels": ["Java_EMF_Metrics"],
                      "label_matcher": "^true$",
                      "dimensions": [["ClusterName","TaskDefinitionFamily","pool"]],
                      "metric_selectors": [
                        "^jvm_memory_pool_bytes_used$"
                      ]
                    },
                    {
                      "source_labels": ["Java_EMF_Metrics"],
                      "label_matcher": "^true$",
                      "dimensions": [["ClusterName","TaskDefinitionFamily","port","protocol"]],
                      "metric_selectors": [
                        "^tomcat_threadpool_connectioncount$"
                      ]
                    }
                  ]
                }
              }
            },
            "force_flush_interval": 5
          }
        }
  CWAgentECSExecutionRole:
    Type: AWS::IAM::Role
    Condition: CreateRoles
    Properties:
      RoleName: !Sub ${ExecutionRoleName}-${AWS::Region}
      Description: Allows ECS container agent to make calls to the Amazon ECS API on your behalf.
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
      Policies:
        - PolicyName: ECSSSMInlinePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ssm:GetParameters
                Resource: 
                  !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/AmazonCloudWatch-*'
  CWAgentECSTaskRole:
    Type: AWS::IAM::Role
    Condition: CreateRoles
    DependsOn: CWAgentECSExecutionRole
    Properties:
      RoleName: !Sub ${TaskRoleName}-${AWS::Region}
      Description: Allows ECS tasks to call AWS services on your behalf.
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
      Policies:
        - PolicyName: ECSServiceDiscoveryInlinePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ecs:DescribeTasks
                  - ecs:ListTasks
                  - ecs:DescribeContainerInstances
                Resource: "*"
                Condition:
                  ArnEquals:
                    ecs:cluster:
                      !Sub 'arn:${AWS::Partition}:ecs:${AWS::Region}:${AWS::AccountId}:cluster/${ECSClusterName}'
              - Effect: Allow
                Action:
                  - ec2:DescribeInstances
                  - ecs:DescribeTaskDefinition
                Resource: "*"
  ECSCWAgentTaskDefinition:
    Type: 'AWS::ECS::TaskDefinition'
    DependsOn:
      - PrometheusConfigSSMParameter
      - CWAgentConfigSSMParameter
    Properties:
      Family: !Sub 'cwagent-prometheus-${ECSClusterName}-${ECSLaunchType}-awsvpc'
      TaskRoleArn: !If [CreateRoles, !GetAtt CWAgentECSTaskRole.Arn, !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:role/${TaskRoleName}']
      ExecutionRoleArn: !If [CreateRoles, !GetAtt CWAgentECSExecutionRole.Arn, !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:role/${ExecutionRoleName}']
      NetworkMode: awsvpc
      ContainerDefinitions:
        - Name: cloudwatch-agent-prometheus
          Image: amazon/cloudwatch-agent:1.248913.0-prometheus
          Essential: true
          MountPoints: []
          PortMappings: []
          Environment: []
          Secrets:
            - Name: PROMETHEUS_CONFIG_CONTENT
              ValueFrom: !Sub 'AmazonCloudWatch-PrometheusConfigName-${ECSClusterName}-${ECSLaunchType}-awsvpc'
            - Name: CW_CONFIG_CONTENT
              ValueFrom: !Sub 'AmazonCloudWatch-CWAgentConfig-${ECSClusterName}-${ECSLaunchType}-awsvpc'
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-create-group: 'True'
              awslogs-group: "/ecs/ecs-cwagent-prometheus"
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Sub 'ecs-${ECSLaunchType}-awsvpc'
      RequiresCompatibilities:
        - !Ref ECSLaunchType
      Cpu: '512'
      Memory: '1024'
  ECSCWAgentService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSClusterName
      DesiredCount: 1
      LaunchType: !Ref ECSLaunchType
      SchedulingStrategy: REPLICA
      ServiceName: !Sub 'cwagent-prometheus-replica-service-${ECSLaunchType}-awsvpc'
      TaskDefinition: !Ref ECSCWAgentTaskDefinition
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: !If [AssignPublicIp, ENABLED, DISABLED]
          SecurityGroups:
            - !Ref SecurityGroupID
          Subnets:
            - !Ref SubnetID

Copy the prometheus-install.sh script below to a file. Edit the first four lines of the script to correspond to your environment.

prometheus-install.sh

export AWS_DEFAULT_REGION=us-east-1
export ECS_CLUSTER_NAME=hello-prometheus-cluster
export ECS_CLUSTER_SECURITY_GROUP=sg-0000000000000
export ECS_CLUSTER_SUBNET=subnet-00000000000000000
export ECS_LAUNCH_TYPE=FARGATE
export CREATE_IAM_ROLES=True

aws cloudformation deploy --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-${ECS_LAUNCH_TYPE}-awsvpc \
    --template-file install-prometheus-collector.yaml \
    --parameter-overrides ECSClusterName=${ECS_CLUSTER_NAME} \
                CreateIAMRoles=${CREATE_IAM_ROLES} \
                ECSLaunchType=${ECS_LAUNCH_TYPE} \
                SecurityGroupID=${ECS_CLUSTER_SECURITY_GROUP} \
                SubnetID=${ECS_CLUSTER_SUBNET} \
                TaskRoleName=Prometheus-TaskRole-${ECS_CLUSTER_NAME} \
                ExecutionRoleName=Prometheus-ExecutionRole-${ECS_CLUSTER_NAME} \
    --capabilities CAPABILITY_NAMED_IAM \
    --region ${AWS_DEFAULT_REGION} 

AWS_DEFAULT_REGION is the AWS Region where your application is running (for example, us-east-1). ECS_CLUSTER_NAME is the name of the ECS cluster you created for this demo (in my case, hello-prometheus-cluster). ECS_CLUSTER_SECURITY_GROUP is the security group ID associated with the AWS Fargate tasks. If you used the CloudFormation template to build your VPC, you can get the security group ID by running this command:

aws cloudformation describe-stacks --stack-name hello-prometheus \
--query "Stacks[0].Outputs[?OutputKey=='PublicSecurityGroup'].OutputValue" \
--output text

You can get the ID of a public subnet in your VPC (the ECS_CLUSTER_SUBNET value) by running this command:

aws cloudformation describe-stacks --stack-name hello-prometheus \
--query "Stacks[0].Outputs[?OutputKey=='PublicSubnet1'].OutputValue" \
--output text

Save the file and run it from the command line. You might need to run chmod +x on the file to make it executable. In my case, I’ve named this script prometheus-install.sh.

./prometheus-install.sh

This script does the following:

  • Creates a new Fargate task using the CloudWatch agent with Prometheus.
  • Adds the task to a service in the ECS cluster you created.
  • Sets up the appropriate IAM roles for this task to work properly.

This script also creates two values in AWS Systems Manager Parameter Store:

  • The value that starts with AmazonCloudWatch-CWAgentConfig- contains the configuration for the CloudWatch agent.
  • The value that starts with AmazonCloudWatch-PrometheusConfigName- contains the Prometheus scraping configuration.

If you review the configurations in Parameter Store, you’ll find a few references to the Docker labels that were added to the Fargate task definition. The Prometheus scraper uses those Docker labels to detect and pick up your application—or even new applications you add—and starts sending the metrics it scrapes from these applications to CloudWatch.
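You can review both parameters from the command line as well; the parameter names below assume the cluster name and launch type used in this demo:

aws ssm get-parameter \
    --name AmazonCloudWatch-CWAgentConfig-hello-prometheus-cluster-FARGATE-awsvpc \
    --query Parameter.Value --output text

aws ssm get-parameter \
    --name AmazonCloudWatch-PrometheusConfigName-hello-prometheus-cluster-FARGATE-awsvpc \
    --query Parameter.Value --output text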

After the CloudFormation stack is created, you should see a new task running in your cluster. This task uses the configuration from the two Parameter Store parameters to determine which metrics to scrape from hello-prometheus-task and how often.

hello-prometheus-cluster page displays three tasks in the RUNNING state.

Figure 6: Three tasks running in the Fargate cluster

After a few minutes, new metrics appear in CloudWatch under the ECS/ContainerInsights/Prometheus namespace. These metrics have been gathered from your application and delivered to CloudWatch through Prometheus. These metrics are like any other metrics you’ll find in the CloudWatch console.

If you look at the metrics gathered from Prometheus, you’ll find that I am collecting JVM information, which gives me better observability into my Java application. I can use this information to make better decisions about the health of the application.
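To confirm that the metrics are arriving, you can also list the namespace from the CLI:

aws cloudwatch list-metrics --namespace "ECS/ContainerInsights/Prometheus"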

Use metrics collected from Prometheus for automatic scaling

I want to scale my application based on connection count. As the number of connections to my application increases, I want to scale out. As the number of connections drops, the application should scale back in. It’s simple to use a Prometheus-gathered metric for Fargate scaling actions.

First, choose the metric you want to scale on. I’m going to use the tomcat_threadpool_connectioncount metric, which is in CloudWatch under ECS/ContainerInsights/Prometheus > [ClusterName, TaskDefinitionFamily, port, protocol]. I also want to see how the number of Fargate tasks increases as the scaling rule is applied. To find the number of on-demand Fargate resources, open the CloudWatch console, and then choose Metrics. Under Usage, choose By AWS Resource. Select the ResourceCount metric associated with the Fargate Service and OnDemand Resource.

On the Graphed metrics tab, the tomcat_threadpool_connectioncount and Fargate Task Instances are selected.

Figure 7: tomcat_threadpool_connectioncount metric

Fargate scales the number of running tasks based on alarms that fire. Again, for this demo, the application should scale out as the number of connections to the application increases and scale in as the number of connections to the application drops. To set up these alarms, choose the alarm icon (bell) next to the tomcat_threadpool_connectioncount metric.

To create the alarm that Fargate can use to perform scale-out activities:

  1. In the Create alarm wizard, keep the defaults for the metric. Select Greater. Under than, enter 150. Choose Next.
  2. Choose Create a new topic and enter a name for the topic. I use hello-prometheus-scaleup. In Email endpoints that will receive the notification, enter your email address. Choose Create topic, and then choose Next.
  3. In the Alarm name field enter a name for the alarm. I use hello-prometheus-scaleup. Choose Next, and then Create alarm.

Now create the alarm that can be used by Fargate for scale-in activities by navigating to the Alarms page of the CloudWatch console:

  1. On the Alarms page of the CloudWatch console, choose Create alarm.
  2. Choose Select metric, and then choose ECS/ContainerInsights/Prometheus > [ClusterName, TaskDefinitionFamily, port, protocol].
  3. Select tomcat_threadpool_connectioncount, and then choose Select metric.
  4. Use the defaults. Select Lower. Under than, enter 100. Choose Next.
  5. Choose Create a new topic, and then enter a topic name. I use hello-prometheus-scaledown. In the Email endpoints that will receive the notification, enter your email address.
  6. Choose Create topic, and then choose Next.
  7. In the Alarm name field enter a name for the alarm. I use hello-prometheus-scaledown.
  8. Choose Next, and then choose Create alarm.
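If you’d rather script the alarms, the two console flows map roughly to put-metric-alarm calls like the following (a sketch; the port and protocol dimension values are placeholders that must match the values shown for the metric in the CloudWatch console, and the SNS notification actions are omitted):

# Scale-out alarm: more than 150 connections
aws cloudwatch put-metric-alarm \
    --alarm-name hello-prometheus-scaleup \
    --namespace "ECS/ContainerInsights/Prometheus" \
    --metric-name tomcat_threadpool_connectioncount \
    --dimensions Name=ClusterName,Value=hello-prometheus-cluster \
                 Name=TaskDefinitionFamily,Value=hello-prometheus-task \
                 Name=port,Value=<port> Name=protocol,Value=<protocol> \
    --statistic Average --period 60 --evaluation-periods 1 \
    --threshold 150 --comparison-operator GreaterThanThreshold

# Scale-in alarm: fewer than 100 connections
aws cloudwatch put-metric-alarm \
    --alarm-name hello-prometheus-scaledown \
    --namespace "ECS/ContainerInsights/Prometheus" \
    --metric-name tomcat_threadpool_connectioncount \
    --dimensions Name=ClusterName,Value=hello-prometheus-cluster \
                 Name=TaskDefinitionFamily,Value=hello-prometheus-task \
                 Name=port,Value=<port> Name=protocol,Value=<protocol> \
    --statistic Average --period 60 --evaluation-periods 1 \
    --threshold 100 --comparison-operator LessThanThreshold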

After a few minutes, both alarms should move from “insufficient data” and enter either an In alarm or an OK state.

 

On the Alarms page, the state of hello-prometheus-scaledown is In alarm. The state of hello-prometheus-scaleup is OK.

Figure 8: Alarms page in the CloudWatch console

The hello-prometheus-scaledown alarm is in the In alarm state because there is no traffic on the site currently. Now configure AWS Fargate to scale in whenever this alarm fires, until the minimum number of tasks that should be running is reached. When enough traffic hits the site, the hello-prometheus-scaleup alarm should fire, which causes AWS Fargate to scale out the tasks, until the maximum number of tasks that should be running is reached.

To define the AWS Fargate scale-out policy, begin by navigating to the Amazon ECS console:

  1. In the Amazon ECS console, choose the AWS Fargate service that you want to scale, and then choose Update. My AWS Fargate service is hello-prometheus-service.
  2. Choose Next step, and then choose Next step.
  3. On the Set Auto Scaling (optional) page, select Configure Service Auto Scaling to adjust your service’s desired count.
  4. For Minimum number of tasks, enter 2.
  5. For Maximum number of tasks, enter 10.
  6. Choose Add scaling policy.
  7. On the Add policy page, change Scaling policy type to Step scaling.
  8. For Policy name, I use hello-prometheus-scaleup-policy.
  9. Choose Use an existing alarm, and then choose the scale-out alarm you just created. My alarm is hello-prometheus-scaleup.
  10. Under Scaling action, choose Add, and then enter 1.
  11. Choose Save.

You have now defined the policy that tells AWS Fargate how to scale out. Repeat the process to define the policy that tells AWS Fargate how to scale in.

  1. Choose Add scaling policy.
  2. On the Add policy page, choose Step scaling.
  3. For Policy name, I use hello-prometheus-scaledown-policy.
  4. Choose Use an existing alarm, and then choose the scale-in alarm you created. My alarm is hello-prometheus-scaledown.
  5. Under Scaling action, choose Remove, and then enter 1.
  6. Choose Save.
  7. Choose Next step, and then choose Update Service.
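The equivalent scaling configuration can also be applied with the Application Auto Scaling CLI (a sketch using the names from this demo; the policy ARNs returned by put-scaling-policy must then be added as alarm actions on the matching alarms):

# Register the service as a scalable target with 2-10 tasks
aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --resource-id service/hello-prometheus-cluster/hello-prometheus-service \
    --scalable-dimension ecs:service:DesiredCount \
    --min-capacity 2 --max-capacity 10

# Step scaling policy that adds one task; the command returns a policy ARN
aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --resource-id service/hello-prometheus-cluster/hello-prometheus-service \
    --scalable-dimension ecs:service:DesiredCount \
    --policy-name hello-prometheus-scaleup-policy \
    --policy-type StepScaling \
    --step-scaling-policy-configuration '{"AdjustmentType":"ChangeInCapacity","Cooldown":60,"StepAdjustments":[{"MetricIntervalLowerBound":0,"ScalingAdjustment":1}]}'

# Repeat with "ScalingAdjustment": -1 and "MetricIntervalUpperBound": 0 for the
# scale-in policy, then attach each policy ARN to its alarm with --alarm-actions.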

The cluster is now updated to auto scale based on the custom metrics gathered through Prometheus! After sending traffic to your application, you can see your scale-out and scale-in alarms in action. The blue line on the following chart is the number of connections to the application. The orange line is the number of running tasks. You can see that as traffic increased, the application scaled out, and as traffic decreased, the application slowly scaled itself back in.
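If you want to generate this traffic yourself, a simple request loop against the sample page is usually enough to push the connection count past the scale-out threshold (a rough sketch; a dedicated load-testing tool works equally well):

# Look up the load balancer DNS name, then fire a burst of concurrent requests
ALB_DNS=$(aws elbv2 describe-load-balancers --names hello-prometheus-alb \
    --query "LoadBalancers[0].DNSName" --output text)

for i in $(seq 1 500); do
  curl -s -o /dev/null "http://${ALB_DNS}/sample/" &
  sleep 0.05
done
wait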

 

The graph shows lines for the tomcat_threadpool_connection metric and Fargate Task Instances.

Figure 9: Fargate scaling graph

Cleanup

To avoid ongoing charges to your AWS account, remove the resources you created.

  • Delete the scale-out and scale-in CloudWatch alarms.
  • Delete all AWS Fargate services and running tasks in the ECS cluster.
  • Use the AWS CloudFormation console to delete the two stacks created as part of this demo. Choose each stack, choose Delete, and then choose Delete stack.
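If you prefer the command line, the following commands cover the same cleanup steps (the names assume the values used throughout this demo; wait for each deletion to finish before moving on):

# Delete the scaling alarms
aws cloudwatch delete-alarms \
    --alarm-names hello-prometheus-scaleup hello-prometheus-scaledown

# Delete the application service, the CloudWatch agent stack, and the cluster
aws ecs delete-service --cluster hello-prometheus-cluster \
    --service hello-prometheus-service --force
aws cloudformation delete-stack \
    --stack-name CWAgent-Prometheus-ECS-hello-prometheus-cluster-FARGATE-awsvpc
aws ecs delete-cluster --cluster hello-prometheus-cluster

# Delete the networking stack (VPC, subnets, security group, load balancer)
aws cloudformation delete-stack --stack-name hello-prometheus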

Cost considerations

With Amazon CloudWatch Container Insights, you pay only for what you use. For pricing examples, see the Amazon CloudWatch pricing page. You’ll find pricing for CloudWatch alarms on the Alarms tab.

AWS Fargate pricing is calculated based on the vCPU and memory resources used from the time you start to download your container image until the AWS Fargate task shuts down, rounded up to the nearest second. For current pricing and supported configurations, see the AWS Fargate pricing page.

Although this demo uses a VPC created in Amazon VPC, it does not use Amazon VPC features that incur a cost. For current pricing, see the Amazon VPC pricing page.

You are charged for each hour or partial hour that an Application Load Balancer is running and the number of Load Balancer Capacity Units (LCU) used per hour. An LCU consists of new connections, active connections, processed bytes, and the number of rules processed by the load balancer. For current pricing and examples, see the Elastic Load Balancing pricing page.

Conclusion

In this blog post, I showed you how using Prometheus metrics makes it possible to get better insight into container workloads. I showed you how to:

  • Set up and deploy a Prometheus-enabled Amazon ECS on AWS Fargate task.
  • Deliver metrics from the task to CloudWatch.
  • Use a custom metric gathered through Prometheus to scale out and scale in an Amazon ECS on AWS Fargate application.

Using the CloudWatch agent with Prometheus, you have a powerful new way to ingest performance metrics from workloads into Amazon CloudWatch. Application metrics gathered through the CloudWatch agent with Prometheus provide better visibility into container performance and application health. You can use what you’ve learned in this blog post to gather custom metrics via Prometheus, and you can use those metrics to configure auto scaling on your Amazon ECS on AWS Fargate applications.

For more information, read about how to set up and configure Prometheus metrics collection on Amazon ECS clusters.

 

About the Author

Mike George photo

Mike is a Senior Solutions Architect based out of Salt Lake City, Utah. He enjoys helping customers solve their technology problems. His interests include software engineering, security, and AI/ML.