Observing Kubernetes workloads on AWS Fargate with AWS managed open-source services
AWS constantly innovates on its customers’ behalf and strives to improve the customer experience by reducing complexity. With AWS, customers can spend their time solving business problems instead of operating infrastructure. Amazon Elastic Kubernetes Service (Amazon EKS) on AWS Fargate lets customers run Kubernetes pods without creating or managing the lifecycle of the underlying worker nodes. With Fargate, you pay only for the compute resources you use, with no upfront expenses, and you avoid having to size EC2 instances for your workload: you allocate the resources your application requires and pay only for those. An EKS cluster can also mix compute types, with some pods running on EC2 while others run on Fargate. You can run all the pods in a namespace on Fargate, or use a label to select the pods you want to run on Fargate.
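For example, with eksctl you can create a Fargate profile that selects pods either by namespace alone or by namespace plus label; the cluster, profile, and namespace names below are illustrative, and the commands require an existing EKS cluster and AWS credentials:

```shell
# Schedule every pod in the "prod" namespace on Fargate
eksctl create fargateprofile \
  --cluster my-cluster \
  --name fp-prod \
  --namespace prod

# Or schedule only pods in "prod" that carry the label compute=fargate
eksctl create fargateprofile \
  --cluster my-cluster \
  --name fp-prod-labeled \
  --namespace prod \
  --labels compute=fargate
```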
Customers are now prioritizing workload gravity and turning to managed services such as Amazon Managed Service for Prometheus and Amazon Managed Grafana to monitor their workloads. Establishing a monitoring system helps visualize essential performance metrics for the cluster and its workloads. Metrics such as vCPU utilization, memory utilization, and network usage offer valuable insight into how resources are being used and help identify potential problems ahead of time.
In this post, we’ll explore how to use the AWS CDK Observability Accelerator to quickly build observability for monitoring Amazon EKS on AWS Fargate with AWS-managed open-source services. We will demonstrate infrastructure monitoring along with monitoring a Java workload and an NGINX ingress on AWS Fargate.
Solution Overview
The pattern we are going to deploy will provision the following components:
An Amazon EKS cluster powered by Fargate providing on-demand compute capacity for our container pods
AWS for Fluent Bit to capture and ingest logs into Amazon CloudWatch
Amazon Managed Service for Prometheus configured with rules to collect observability data generated by the EKS Fargate cluster
External Secrets Operator to retrieve and sync the Grafana API key from AWS Systems Manager Parameter Store
Grafana Operator to add AWS data sources and create Grafana dashboards in Amazon Managed Grafana
Flux to perform a GitOps sync to the EKS cluster from a Git repository hosting the configuration of Grafana dashboards and AWS data sources. Check the GitOps with Amazon Managed Grafana module in the One Observability Workshop to learn more about this topic.
In the following diagram, you can see the metrics from source to destination through the main components of the pattern:
Figure 1: Architecture Diagram for monitoring infrastructure and workloads on Amazon EKS on AWS Fargate with AWS-managed open-source services.
Prerequisites
You will need the following to complete the steps in this post:
Next, create a Grafana API key in your Amazon Managed Grafana workspace and store it as a secret in AWS Systems Manager Parameter Store. The secret will be accessed by the External Secrets add-on and made available as a native Kubernetes secret in the Amazon EKS cluster.
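A sketch of those two steps with the AWS CLI; the workspace ID, key TTL, and the parameter name /cdk-accelerator/grafana-api-key are assumptions, so align them with your workspace and with what the pattern’s External Secrets configuration expects:

```shell
# Assumed workspace ID -- substitute your own
export AMG_WORKSPACE_ID=g-xxxxxxxxxx

# Create a Grafana API key (TTL here is 5 days, in seconds)
export AMG_API_KEY=$(aws grafana create-workspace-api-key \
  --workspace-id $AMG_WORKSPACE_ID \
  --key-name "grafana-operator-key" \
  --key-role "ADMIN" \
  --seconds-to-live 432000 \
  --query key --output text)

# Store it as a SecureString parameter for the External Secrets add-on
aws ssm put-parameter \
  --name "/cdk-accelerator/grafana-api-key" \
  --type SecureString \
  --value $AMG_API_KEY \
  --region $AWS_REGION
```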
The first step of any AWS Cloud Development Kit (AWS CDK) deployment is bootstrapping the environment. cdk bootstrap is a command in the AWS CDK command-line interface (AWS CDK CLI) that prepares an environment (a combination of AWS account and AWS Region) with the resources the CDK requires to perform deployments into it. Bootstrapping is needed once per account/Region combination, so you can skip this step if you have already bootstrapped the CDK in your Region.
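The bootstrap command in the next step expects the ACCOUNT_ID and AWS_REGION variables to be set; one way to set them (requires working AWS credentials), shown here with an example Region:

```shell
# Resolve the current account ID from your credentials
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Substitute the Region you are deploying to
export AWS_REGION=us-west-2
```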
Execute the command below to bootstrap the AWS environment in your region:
cdk bootstrap aws://$ACCOUNT_ID/$AWS_REGION
Deploying the Fargate open-source observability pattern
Clone the cdk-aws-observability-accelerator repository and install the dependency packages. This repository contains CDK v2 code written in TypeScript.
git clone https://github.com/aws-observability/cdk-aws-observability-accelerator.git
cd cdk-aws-observability-accelerator
make deps
make build
make list
The settings for the Grafana dashboard JSON files are expected to be specified in the CDK context. Such settings are generally defined in the cdk.context.json file of the current directory or in ~/.cdk.json in your home directory. You will need to update the context in the cdk.json file located in the cdk-aws-observability-accelerator directory.
To validate the status of the resources created by our deployment, check the status of the pods by running the command below:
kubectl get pods -A
Figure 2: Resources deployed by the Fargate Open Source CDK Observability Accelerator Pattern
We can confirm that these pods run on separate Fargate nodes by fetching the nodes. Fargate node names have the prefix fargate-ip.
kubectl get nodes
Next, confirm whether the Grafana dashboards are deployed as expected.
Note: If you do not see the Grafana dashboards, check whether AMG_API_KEY and the AWS Systems Manager parameter have been created as described above.
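A quick way to perform that check; the parameter name shown is an assumption, so use the name you chose when storing the API key:

```shell
# Confirm the Systems Manager parameter exists and decrypts
aws ssm get-parameter \
  --name "/cdk-accelerator/grafana-api-key" \
  --with-decryption \
  --region $AWS_REGION

# Confirm the External Secrets Operator synced it into the cluster
kubectl get externalsecrets -A
```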
kubectl get grafanadashboards -A
Figure 3: Grafana dashboards deployed via FluxCD
EKS Fargate Infrastructure Monitoring
Log in to your Amazon Managed Grafana workspace and navigate to the Dashboards panel. You should see a list of dashboards under Observability Accelerator Dashboards. Grafana Operator and Flux work together to synchronize your dashboards with Git, so if you delete a Grafana dashboard by accident, it will be re-provisioned automatically.
Feel free to explore the other dashboards developed as part of the AWS Observability Accelerator. Open the Kubelet dashboard and you should see a visualization like the one below:
Figure 5: Kubelet dashboard from CDK Observability Accelerator
A fully managed EKS Fargate cluster with AWS-managed open-source observability is now ready for our workloads. Next, let’s deploy additional components that reflect real-world applications you would run in production: an NGINX ingress controller, then a Java Tomcat application, both monitored through the Grafana dashboards.
NGINX Ingress monitoring
EKS clusters often run multiple services in the same cluster, and incoming requests must be routed to the appropriate service. Ingress-nginx is a Kubernetes ingress controller that uses NGINX as a reverse proxy and load balancer to do just that. It is a production-grade ingress controller that supports multi-tenancy and segregation of workload ingresses based on hostname (host-based routing) and/or URL path (path-based routing). It lets you expose your applications to end users while providing basic capabilities such as URL redirection, routing, and load balancing.
To deploy the ingress controller, we will be using the NginxAddOn, which supports Classic Load Balancer (CLB), Network Load Balancer (NLB), or Application Load Balancer (ALB). We must modify our CDK code for the EKS Fargate OSS pattern to include the ingress controller.
Navigate to lib/single-new-eks-fargate-opensource-observability-pattern/index.ts in your cloned git repository.
const addOns: Array<blueprints.ClusterAddOn> = [
    new blueprints.addons.CoreDnsAddOn({
        version: "v1.10.1-eksbuild.6",
        configurationValues: { computeType: "Fargate" }
    }),
    ...
    ...
    ...
    new blueprints.addons.AmpAddOn(ampAddOnProps),
    // Add the below NginxAddOn to the end of the array
    new blueprints.addons.NginxAddOn({
        name: "ingress-nginx",
        chart: "ingress-nginx",
        repository: "https://kubernetes.github.io/ingress-nginx",
        version: "4.7.2",
        namespace: "nginx-ingress-sample",
        values: {
            controller: {
                image: {
                    allowPrivilegeEscalation: false
                },
                metrics: {
                    enabled: true,
                    service: {
                        annotations: {
                            "prometheus.io/port": "10254",
                            "prometheus.io/scrape": "true"
                        }
                    }
                }
            }
        }
    })
];
Add another Fargate profile with the appropriate namespaces in the same file so the pods can be scheduled.
const fargateProfiles: Map<string, eks.FargateProfileOptions> = new Map([
    ["MyProfile", {
        selectors: [
            { namespace: "cert-manager" },
            { namespace: "opentelemetry-operator-system" },
            { namespace: "external-secrets" },
            { namespace: "grafana-operator" },
            { namespace: "flux-system" },
        ]
    }],
    // Add a new profile named Nginx to the array
    ["Nginx", {
        selectors: [
            { namespace: "nginx-ingress-sample" },
            { namespace: "nginx-sample-traffic" }
        ]
    }],
]);
Finally, we must update the OpenTelemetry Collector to scrape NGINX metrics and add a new Grafana dashboard to visualize the data. This is done by updating cdk.json; pay particular attention to nginx.pattern.enabled, which enables scraping and alerting on NGINX metrics via the OpenTelemetry Collector. Once updated, redeploy the pattern:
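For illustration, the relevant cdk.json context entries might look like the fragment below. The nginx.pattern.enabled flag comes from the description above; the dashboard URL key name and its value are assumptions, so check the repository’s cdk.json for the exact keys:

```json
{
  "context": {
    "nginx.pattern.enabled": true,
    "nginx.dashboard.url": "<url-of-nginx-dashboard-json>"
  }
}
```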
make pattern single-new-eks-fargate-opensource-observability deploy
Verify that the ingress controller is running. You will also see a load balancer deployed in the EC2 console of your AWS account.
kubectl get pods -n nginx-ingress-sample
Figure 6: NGINX ingress from our update
Figure 7: Load balancer provisioning by NGINX ingress
Let’s generate some sample traffic before we visualize it on our new dashboard. The following snippet deploys a manifest consisting of two services generating HTTP and HTTPS (SSL via curl) traffic.
EXTERNAL_IP=$(kubectl get svc blueprints-addon-nginx-ingress-nginx-controller -n nginx-ingress-sample --output jsonpath='{.status.loadBalancer.ingress[0].hostname}')
SAMPLE_TRAFFIC_NAMESPACE=nginx-sample-traffic
curl https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/k8s-deployment-manifest-templates/nginx/nginx-traffic-sample.yaml |
sed "s/{{external_ip}}/$EXTERNAL_IP/g" |
sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" |
kubectl apply -f -
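As a purely local sanity check (no cluster needed), you can verify the template substitution the pipeline performs; the template string below is a stand-in for the downloaded manifest:

```shell
# Minimal stand-in for the manifest's templated fields
TEMPLATE='host: {{external_ip}}, namespace: {{namespace}}'
EXTERNAL_IP="example.elb.amazonaws.com"
NS="nginx-sample-traffic"

# Same sed substitutions as the deployment pipeline
echo "$TEMPLATE" \
  | sed "s/{{external_ip}}/$EXTERNAL_IP/g" \
  | sed "s/{{namespace}}/$NS/g"
# prints: host: example.elb.amazonaws.com, namespace: nginx-sample-traffic
```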
Verify that the application is running and wait for it to generate traffic.
kubectl get pods -n nginx-sample-traffic
Figure 8: Sample application generating traffic for NGINX ingress
Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a new dashboard named NGINX under Observability Accelerator Dashboards.
Figure 9: NGINX dashboard from CDK Observability Accelerator
As we can see, the EKS Fargate cluster provides detailed monitoring of latency, connections, memory usage, network I/O pressure, and errors from the NGINX ingress.
Java Workload Monitoring
For customers running Java-based workloads on EKS clusters on Fargate, this section describes how to configure and deploy monitoring using this pattern. ADOT can be configured to collect Prometheus metrics from the Java Virtual Machine (JVM), Java applications, and Tomcat (Catalina) on an EKS Fargate cluster. We will use Docker to build our image and Amazon ECR as the repository for our Tomcat sample application’s Docker image.
First, we must include the appropriate Fargate profile in our CDK construct. In this example, we’re creating a namespace javajmx-sample and attaching it to the Fargate profile named Java.
const fargateProfiles: Map<string, eks.FargateProfileOptions> = new Map([
    ...
    ...
    // Add a new profile to the array
    ["Java", {
        selectors: [
            { namespace: "javajmx-sample" }
        ]
    }]
]);
We must update the OpenTelemetry Collector to scrape Java Management Extensions (JMX) metrics and add a new Grafana dashboard to visualize the data. This is done by updating cdk.json.
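By analogy with the NGINX flag shown earlier, the cdk.json context fragment might look like the following; the key names here are assumptions, so check the repository’s cdk.json for the exact flags:

```json
{
  "context": {
    "java.pattern.enabled": true,
    "java.dashboard.url": "<url-of-java-jmx-dashboard-json>"
  }
}
```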
Deploy the Docker image to the EKS Fargate cluster.
SAMPLE_TRAFFIC_NAMESPACE=javajmx-sample
curl https://raw.githubusercontent.com/aws-observability/aws-otel-test-framework/terraform/sample-apps/jmx/examples/prometheus-metrics-sample.yaml |
sed "s/{{aws_account_id}}/$AWS_ACCOUNT_ID/g" |
sed "s/{{region}}/$AWS_REGION/g" |
sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" |
kubectl apply -f -
Validate the deployment.
Note: If you use an M1 Mac to build and publish the Tomcat container, the deployment may fail with CrashLoopBackOff. In that case, build your Docker image using AWS CloudShell, repeating steps 5 to 7.
kubectl get pods -n javajmx-sample
Figure 10: Sample application generating traffic for Java workload dashboard
Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a new dashboard, Java/JMX, under Observability Accelerator Dashboards.
Figure 11: Java/JMX dashboard from CDK Observability Accelerator
Using the above visualizations, you can monitor the following key metrics exposed by the workload:
Number of threads used by the application
Heap vs. non-heap memory distribution
CPU usage
These metrics help platform and application development teams troubleshoot issues and gain visibility into the performance and behavior of their workloads.
Clean up
Some of the components deployed in this post incur costs. To clean them up, tear down the whole CDK stack with the following command:
make pattern single-new-eks-fargate-opensource-observability destroy
If you followed along with the Java monitoring demo, be sure to also delete the ECR repository you created for the sample image.
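A sketch of that cleanup step; the repository name tomcat-jmx-sample is an assumption, so substitute the name you actually used when pushing the image:

```shell
# --force deletes the repository even if it still contains images
aws ecr delete-repository \
  --repository-name tomcat-jmx-sample \
  --force \
  --region $AWS_REGION
```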
Navigate to the AWS CloudFormation console to make sure your stacks are deleted cleanly. If you see any failures, delete those stacks manually.
Conclusion
In this post, we showed how to use the AWS CDK Observability Accelerator to quickly build observability for monitoring Amazon EKS on AWS Fargate with AWS-managed open-source services. We started by demonstrating AWS Fargate infrastructure monitoring, visualizing Fargate infrastructure metrics with out-of-the-box Grafana dashboards deployed to Amazon Managed Grafana using GitOps. We then updated the solution to monitor an NGINX ingress on AWS Fargate, visualizing metrics such as latency, connections, memory usage, network I/O pressure, and errors on Amazon Managed Grafana. Finally, we added Java workload monitoring with a sample Java application on AWS Fargate, visualizing metrics such as the number of threads used by the application, heap vs. non-heap memory distribution, and CPU usage. We encourage you to try out all our patterns as we release them, and to support and contribute to the AWS CDK Observability Accelerator open-source project.
For more information, see the following references: