Observing Kubernetes workloads on AWS Fargate with AWS managed open-source services
AWS constantly innovates on its customers’ behalf and strives to improve the customer experience by reducing complexity. With AWS, customers can spend their time solving business problems instead of operating infrastructure. Amazon Elastic Kubernetes Service (Amazon EKS) on AWS Fargate lets customers run Kubernetes pods without creating or managing the lifecycle of the underlying worker nodes. With Fargate, you pay only for the compute resources you use, with no upfront expenses, and you avoid having to size EC2 instances for your workload: you allocate the resources your application requires and pay only for those. An EKS cluster can also mix compute types, with some pods running on EC2 while others run on Fargate. You can run all the pods in a namespace on Fargate, or use a label to select the pods you want to run on Fargate.
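For example, with eksctl you can create a Fargate profile that selects pods either by namespace alone or by namespace plus label; the cluster, profile, and namespace names below are illustrative, and the commands require an existing EKS cluster and AWS credentials:

```shell
# Schedule every pod in the "prod" namespace on Fargate
eksctl create fargateprofile \
  --cluster my-cluster \
  --name fp-prod \
  --namespace prod

# Or schedule only pods in "prod" that carry the label compute=fargate
eksctl create fargateprofile \
  --cluster my-cluster \
  --name fp-prod-labeled \
  --namespace prod \
  --labels compute=fargate
```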
Customers are now prioritizing workload gravity and turning to managed services such as Amazon Managed Service for Prometheus and Amazon Managed Grafana to monitor their workloads. Establishing a monitoring system helps visualize essential performance metrics for the cluster and its workloads. Metrics such as vCPU utilization, memory utilization, and network usage offer valuable insight into how resources are being used and help identify potential problems ahead of time.
In this post, we’ll explore how to use the AWS CDK Observability Accelerator to quickly build observability for monitoring Amazon EKS on AWS Fargate with AWS-managed open-source services. We will demonstrate infrastructure monitoring along with monitoring a Java workload and an NGINX ingress on AWS Fargate.
Solution Overview
The pattern we are going to deploy will provision the following components:
An Amazon EKS cluster powered by Fargate providing on-demand compute capacity for our container pods
AWS for Fluent Bit to capture and ingest logs into Amazon CloudWatch
Amazon Managed Service for Prometheus configured with rules to collect observability data generated by the EKS Fargate cluster
External Secrets Operator to retrieve and sync the Grafana API key from AWS Systems Manager Parameter Store
Grafana Operator to add AWS data sources and create Grafana dashboards in Amazon Managed Grafana
Flux to perform a GitOps sync to the EKS cluster from a Git repository hosting the configuration of Grafana dashboards and AWS data sources. Check the GitOps with Amazon Managed Grafana module in the One Observability Workshop to learn more about this topic.
In the following diagram, you can see the metrics from source to destination through the main components of the pattern:
Figure 1: Architecture Diagram for monitoring infrastructure and workloads on Amazon EKS on AWS Fargate with AWS-managed open-source services.
Prerequisites
You will need the following to complete the steps in this post:
Next, create a Grafana API key in your Amazon Managed Grafana workspace and store it as a secret in AWS Systems Manager Parameter Store. The secret will be accessed by the External Secrets add-on and made available as a native Kubernetes secret in the Amazon EKS cluster.
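A sketch of those two steps with the AWS CLI; the workspace ID, key TTL, and the parameter name /cdk-accelerator/grafana-api-key are assumptions, so align them with your workspace and with what the pattern’s External Secrets configuration expects:

```shell
# Assumed workspace ID -- substitute your own
export AMG_WORKSPACE_ID=g-xxxxxxxxxx

# Create a Grafana API key (TTL here is 5 days, in seconds)
export AMG_API_KEY=$(aws grafana create-workspace-api-key \
  --workspace-id $AMG_WORKSPACE_ID \
  --key-name "grafana-operator-key" \
  --key-role "ADMIN" \
  --seconds-to-live 432000 \
  --query key --output text)

# Store it as a SecureString parameter for the External Secrets add-on
aws ssm put-parameter \
  --name "/cdk-accelerator/grafana-api-key" \
  --type SecureString \
  --value $AMG_API_KEY \
  --region $AWS_REGION
```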
The first step of any AWS Cloud Development Kit (AWS CDK) deployment is bootstrapping the environment. cdk bootstrap is a command in the AWS CDK command-line interface (AWS CDK CLI) that prepares an environment (a combination of AWS account and AWS Region) with the resources the CDK requires to perform deployments into it. Bootstrapping is needed once per account/Region combination, so you can skip this step if you have already bootstrapped the CDK in your Region.
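The bootstrap command in the next step expects the ACCOUNT_ID and AWS_REGION variables to be set; one way to set them (requires working AWS credentials), shown here with an example Region:

```shell
# Resolve the current account ID from your credentials
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Substitute the Region you are deploying to
export AWS_REGION=us-west-2
```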
Execute the command below to bootstrap the AWS environment in your region:
cdk bootstrap aws://$ACCOUNT_ID/$AWS_REGION
Deploying the Fargate open-source observability pattern
Clone the cdk-aws-observability-accelerator repository and install the dependency packages. This repository contains CDK v2 code written in TypeScript.
git clone https://github.com/aws-observability/cdk-aws-observability-accelerator.git
cd cdk-aws-observability-accelerator
make deps
make build
make list
The settings for the Grafana dashboard JSON files are expected to be specified in the CDK context. Such settings are generally defined in the cdk.context.json file of the current directory or in ~/.cdk.json in your home directory. You will need to update the context in the cdk.json file located in the cdk-aws-observability-accelerator directory.
To validate the status of the resources created by our deployment, check the status of the pods by running the command below:
kubectl get pods -A
Figure 2: Resources deployed by the Fargate Open Source CDK Observability Accelerator Pattern
We can confirm that these pods run on separate Fargate nodes by fetching the nodes. Fargate node names have the prefix fargate-ip.
kubectl get nodes
Next, confirm whether the Grafana dashboards are deployed as expected.
Note: If you do not see the Grafana dashboards, check whether AMG_API_KEY and the AWS Systems Manager parameter have been created as described above.
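A quick way to perform that check; the parameter name shown is an assumption, so use the name you chose when storing the API key:

```shell
# Confirm the Systems Manager parameter exists and decrypts
aws ssm get-parameter \
  --name "/cdk-accelerator/grafana-api-key" \
  --with-decryption \
  --region $AWS_REGION

# Confirm the External Secrets Operator synced it into the cluster
kubectl get externalsecrets -A
```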
kubectl get grafanadashboards -A
Figure 3: Grafana dashboards deployed via FluxCD
EKS Fargate Infrastructure Monitoring
Log in to your Amazon Managed Grafana workspace and navigate to the Dashboards panel. You should see a list of dashboards under Observability Accelerator Dashboards. Grafana Operator and Flux work together to synchronize your dashboards with Git, so if you delete a Grafana dashboard by accident, it will be re-provisioned automatically.
Feel free to explore the other dashboards developed as part of the AWS Observability Accelerator. Open the Kubelet dashboard and you should see a visualization like the one below:
Figure 5: Kubelet dashboard from CDK Observability Accelerator
A fully managed EKS Fargate cluster with AWS-managed open-source observability is now ready for our workloads. Next, let’s deploy additional components that reflect real-world applications you would run in production: an NGINX ingress controller, then a Java Tomcat application, both monitored through the Grafana dashboards.
NGINX Ingress monitoring
EKS clusters often run multiple services in the same cluster, and incoming requests must be routed to the appropriate service. Ingress-nginx is a Kubernetes ingress controller that uses NGINX as a reverse proxy and load balancer to do just that. It is a production-grade ingress controller that supports multi-tenancy and segregation of workload ingresses based on hostname (host-based routing) and/or URL path (path-based routing). It lets you expose your applications to end users while providing basic capabilities such as URL redirection, routing, and load balancing.
To deploy the ingress controller, we will be using the NginxAddOn, which supports Classic Load Balancer (CLB), Network Load Balancer (NLB), or Application Load Balancer (ALB). We must modify our CDK code for the EKS Fargate OSS pattern to include the ingress controller.
Navigate to lib/single-new-eks-fargate-opensource-observability-pattern/index.ts in your cloned git repository.
const addOns: Array<blueprints.ClusterAddOn> = [
    new blueprints.addons.CoreDnsAddOn({
        version: "v1.10.1-eksbuild.6",
        configurationValues: { computeType: "Fargate" }
    }),
    ...
    ...
    ...
    new blueprints.addons.AmpAddOn(ampAddOnProps),
    // Add the below NginxAddOn to the end of the array
    new blueprints.addons.NginxAddOn({
        name: "ingress-nginx",
        chart: "ingress-nginx",
        repository: "https://kubernetes.github.io/ingress-nginx",
        version: "4.7.2",
        namespace: "nginx-ingress-sample",
        values: {
            controller: {
                image: {
                    allowPrivilegeEscalation: false
                },
                metrics: {
                    enabled: true,
                    service: {
                        annotations: {
                            "prometheus.io/port": "10254",
                            "prometheus.io/scrape": "true"
                        }
                    }
                }
            }
        }
    })
];
Add another Fargate profile with the appropriate namespaces in the same file so the pods can be scheduled.
const fargateProfiles: Map<string, eks.FargateProfileOptions> = new Map([
    ["MyProfile", {
        selectors: [
            { namespace: "cert-manager" },
            { namespace: "opentelemetry-operator-system" },
            { namespace: "external-secrets" },
            { namespace: "grafana-operator" },
            { namespace: "flux-system" },
        ]
    }],
    // Add a new profile named Nginx to the array
    ["Nginx", {
        selectors: [
            { namespace: "nginx-ingress-sample" },
            { namespace: "nginx-sample-traffic" }
        ]
    }],
]);
Finally, we must update the OpenTelemetry Collector to scrape NGINX metrics and add a new Grafana dashboard to visualize the data. This is done by updating cdk.json; pay particular attention to nginx.pattern.enabled, which enables scraping and alerting on NGINX metrics via the OpenTelemetry Collector. Once updated, redeploy the pattern:
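For illustration, the relevant cdk.json context entries might look like the fragment below. The nginx.pattern.enabled flag comes from the description above; the dashboard URL key name and its value are assumptions, so check the repository’s cdk.json for the exact keys:

```json
{
  "context": {
    "nginx.pattern.enabled": true,
    "nginx.dashboard.url": "<url-of-nginx-dashboard-json>"
  }
}
```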
make pattern single-new-eks-fargate-opensource-observability deploy
Verify that the ingress controller is running. You will also see a load balancer deployed in the EC2 console of your AWS account.
kubectl get pods -n nginx-ingress-sample
Figure 6: NGINX ingress from our update
Figure 7: Load balancer provisioning by NGINX ingress
Let’s generate some sample traffic before we visualize it on our new dashboard. The following snippet deploys a manifest consisting of two services generating HTTP and HTTPS (SSL via curl) traffic.
EXTERNAL_IP=$(kubectl get svc blueprints-addon-nginx-ingress-nginx-controller -n nginx-ingress-sample --output jsonpath='{.status.loadBalancer.ingress[0].hostname}')
SAMPLE_TRAFFIC_NAMESPACE=nginx-sample-traffic
curl https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/k8s-deployment-manifest-templates/nginx/nginx-traffic-sample.yaml |
sed "s/{{external_ip}}/$EXTERNAL_IP/g" |
sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" |
kubectl apply -f -
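As a purely local sanity check (no cluster needed), you can verify the template substitution the pipeline performs; the template string below is a stand-in for the downloaded manifest:

```shell
# Minimal stand-in for the manifest's templated fields
TEMPLATE='host: {{external_ip}}, namespace: {{namespace}}'
EXTERNAL_IP="example.elb.amazonaws.com"
NS="nginx-sample-traffic"

# Same sed substitutions as the deployment pipeline
echo "$TEMPLATE" \
  | sed "s/{{external_ip}}/$EXTERNAL_IP/g" \
  | sed "s/{{namespace}}/$NS/g"
# prints: host: example.elb.amazonaws.com, namespace: nginx-sample-traffic
```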
Verify that the application is running and wait for it to generate traffic.
kubectl get pods -n nginx-sample-traffic
Figure 8: Sample application generating traffic for NGINX ingress
Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a new dashboard named NGINX under Observability Accelerator Dashboards.
Figure 9: NGINX dashboard from CDK Observability Accelerator
As we can see, the EKS Fargate cluster provides detailed monitoring of latency, connections, memory usage, network I/O pressure, and errors from the NGINX ingress.
Java Workload Monitoring
For customers running Java-based workloads on EKS clusters on Fargate, this section describes how to configure and deploy monitoring using this pattern. ADOT can be configured to collect Prometheus metrics from the Java Virtual Machine (JVM), Java applications, and Tomcat (Catalina) on an EKS Fargate cluster. We will use Docker to build our image and Amazon ECR as the repository for our Tomcat sample application’s Docker image.
First, we must include the appropriate Fargate profile in our CDK construct. In this example, we’re creating a namespace javajmx-sample and attaching it to the Fargate profile named Java.
const fargateProfiles: Map<string, eks.FargateProfileOptions> = new Map([
    ...
    ...
    // Add a new profile to the array
    ["Java", {
        selectors: [
            { namespace: "javajmx-sample" }
        ]
    }]
]);
We must update the OpenTelemetry Collector to scrape Java Management Extensions (JMX) metrics and add a new Grafana dashboard to visualize the data. This is done by updating cdk.json.
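By analogy with the NGINX flag shown earlier, the cdk.json context fragment might look like the following; the key names here are assumptions, so check the repository’s cdk.json for the exact flags:

```json
{
  "context": {
    "java.pattern.enabled": true,
    "java.dashboard.url": "<url-of-java-jmx-dashboard-json>"
  }
}
```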
Deploy the Docker image to the EKS Fargate cluster.
SAMPLE_TRAFFIC_NAMESPACE=javajmx-sample
curl https://raw.githubusercontent.com/aws-observability/aws-otel-test-framework/terraform/sample-apps/jmx/examples/prometheus-metrics-sample.yaml |
sed "s/{{aws_account_id}}/$AWS_ACCOUNT_ID/g" |
sed "s/{{region}}/$AWS_REGION/g" |
sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" |
kubectl apply -f -
Validate the deployment.
Note: If you use an M1 Mac to build and publish the Tomcat container, the deployment may fail with CrashLoopBackOff. In that case, build your Docker image using AWS CloudShell, repeating steps 5 to 7.
kubectl get pods -n javajmx-sample
Figure 10: Sample application generating traffic for Java workload dashboard
Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a new dashboard, Java/JMX, under Observability Accelerator Dashboards.
Figure 11: Java/JMX dashboard from CDK Observability Accelerator
Using the above visualizations, you can monitor the following key metrics exposed by the workload:
Number of threads used by the application
Heap vs. non-heap memory distribution
CPU usage
These metrics help platform and application development teams troubleshoot issues and gain visibility into the performance and behavior of their workloads.
Clean up
Some of the components deployed in this post incur costs. To clean them up, tear down the whole CDK stack with the following command:
make pattern single-new-eks-fargate-opensource-observability destroy
If you followed along with the Java monitoring demo, be sure to also delete the ECR repository you created for the sample image.
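A sketch of that cleanup step; the repository name tomcat-jmx-sample is an assumption, so substitute the name you actually used when pushing the image:

```shell
# --force deletes the repository even if it still contains images
aws ecr delete-repository \
  --repository-name tomcat-jmx-sample \
  --force \
  --region $AWS_REGION
```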
Navigate to the AWS CloudFormation console to make sure your stacks are deleted cleanly. If you see any failures, delete those stacks manually.
Conclusion
In this post, we showed how to use the AWS CDK Observability Accelerator to quickly build observability for monitoring Amazon EKS on AWS Fargate with AWS-managed open-source services. We started by demonstrating AWS Fargate infrastructure monitoring, visualizing Fargate infrastructure metrics with out-of-the-box Grafana dashboards deployed to Amazon Managed Grafana using GitOps. We then updated the solution to monitor an NGINX ingress on AWS Fargate, visualizing metrics such as latency, connections, memory usage, network I/O pressure, and errors on Amazon Managed Grafana. Finally, we added Java workload monitoring with a sample Java application on AWS Fargate, visualizing metrics such as the number of threads used by the application, heap vs. non-heap memory distribution, and CPU usage. We encourage you to try out all our patterns as we release them, and to support and contribute to the AWS CDK Observability Accelerator open-source project.
For more information, see the following references: