Using Prometheus Metrics in Amazon CloudWatch
Imaya Kumar Jagannathan, Justin Gu, Marc Chéné, and Michael Hausenblas
Update 2020-09-08: The feature described in this post is now generally available; see the details in the "Amazon CloudWatch now monitors Prometheus metrics from Container environments" What's New item.
Earlier this week we announced the public beta of support for monitoring Prometheus metrics in CloudWatch Container Insights. With this post we want to show you how you can use this new Amazon CloudWatch feature for containerized workloads in Amazon Elastic Kubernetes Service (EKS) and in Kubernetes clusters on AWS that you provision yourself.
Prometheus is a popular open source monitoring tool that graduated as a Cloud Native Computing Foundation (CNCF) project, with a large and active community of practitioners. Amazon CloudWatch Container Insights automates the discovery and collection of Prometheus metrics from containerized applications. It automatically collects, filters, and creates aggregated custom CloudWatch metrics visualized in dashboards for workloads such as AWS App Mesh, NGINX, Java/JMX, Memcached, and HAProxy. By default, preselected services are scraped and pre-aggregated every 60 seconds and automatically enriched with metadata such as cluster and pod names.
We’re aiming at supporting any Prometheus exporters compatible with OpenMetrics, allowing you to scrape any containerized workload using one of the 150+ open source third party exporters.
How does it work? You need to run the CloudWatch agent in your Kubernetes cluster. The agent now supports Prometheus configuration, discovery, and metric pull features, enriching and publishing all high-fidelity Prometheus metrics and metadata as Embedded Metric Format (EMF) to CloudWatch Logs. Each event creates metric data points as CloudWatch custom metrics for a curated set of metric dimensions that is fully configurable. Publishing aggregated Prometheus metrics as CloudWatch custom metrics statistics reduces the number of metrics needed to monitor, alarm, and troubleshoot performance problems and failures. You can also analyze the high-fidelity Prometheus metrics using the CloudWatch Logs Insights query language to isolate specific pods and labels impacting the health and performance of your containerized environments.
With that said, let us now move on to the practical part, where we will show you how to use the CloudWatch Container Insights Prometheus metrics in two setups: we start with a simple example of scraping NGINX and then have a look at how to use custom metrics by instrumenting an ASP.NET Core app.
Out-of-the-box metrics from NGINX
In this first example we're using an EKS cluster as the runtime environment and deploy the CW Prometheus agent to ingest the scraped metrics as EMF events into CloudWatch. We use an NGINX Ingress controller as the scrape target and a dedicated app generating traffic for it. Overall, the setup looks as follows:
We have three namespaces in the EKS cluster: amazon-cloudwatch, which hosts the CW Prometheus agent; nginx-ingress-sample, where the NGINX Ingress controller is running; and nginx-sample-traffic, which hosts our sample app, including the traffic generator.
If you want to follow along, you will need eksctl installed to provision the EKS cluster as well as Helm 3 for the application installation.
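To quickly confirm the tooling is in place, you can check the versions (any recent eksctl release and any Helm 3.x release should do; kubectl is needed throughout as well):
eksctl version
helm version --short
kubectl version --client --short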
For the EKS cluster we're using the following cluster configuration (save it as clusterconfig.yaml and note that you may want to change the region to something geographically closer):
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cw-prom
  region: eu-west-1
iam:
  withOIDC: true
managedNodeGroups:
  - name: defaultng
    minSize: 1
    maxSize: 4
    desiredCapacity: 2
    labels: {role: mngworker}
    iam:
      withAddonPolicies:
        externalDNS: true
        certManager: true
        ebs: true
        albIngress: true
        cloudWatch: true
        appMesh: true
cloudWatch:
  clusterLogging:
    enableTypes: ["*"]
You can then provision the EKS cluster with the following command:
eksctl create cluster -f clusterconfig.yaml
Under the hood, eksctl uses CloudFormation, so you can follow the provisioning progress in the CloudFormation console. Expect the provisioning to take roughly 15 minutes end to end.
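Once the command returns, a quick sanity check confirms that the cluster exists and the worker nodes have joined it, for example:
kubectl get nodes
eksctl get cluster --name cw-prom --region eu-west-1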
Next, we install the NGINX Ingress controller in the dedicated Kubernetes namespace nginx-ingress-sample, using Helm:
kubectl create namespace nginx-ingress-sample
helm repo add stable https://charts.helm.sh/stable
helm install stable/nginx-ingress --generate-name --version 1.33.5 \
--namespace nginx-ingress-sample \
--set controller.metrics.enabled=true \
--set controller.metrics.service.annotations."prometheus\.io/port"="10254" \
--set controller.metrics.service.annotations."prometheus\.io/scrape"="true"
To point the traffic generator at the load balancer managed by the NGINX Ingress controller, we have to look up its public endpoint like so:
$ kubectl -n nginx-ingress-sample get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-ingress-1588245517-controller LoadBalancer 10.100.245.88 ac8cebb58959a4627a573fa5e5bd0937-2083146415.eu-west-1.elb.amazonaws.com 80:31881/TCP,443:32010/TCP 72s
nginx-ingress-1588245517-controller-metrics ClusterIP 10.100.32.79 <none> 9913/TCP 72s
nginx-ingress-1588245517-default-backend ClusterIP 10.100.75.112 <none> 80/TCP 72s
With that, we now have everything to set up the sample app and the traffic generator in the nginx-sample-traffic namespace (note that for EXTERNAL_IP you will have to supply the value you looked up in the previous step):
SAMPLE_TRAFFIC_NAMESPACE=nginx-sample-traffic
EXTERNAL_IP=ac8cebb58959a4627a573fa5e5bd0937-2083146415.eu-west-1.elb.amazonaws.com
curl https://cloudwatch-agent-k8s-yamls.s3.amazonaws.com/quick-start/nginx-traffic-sample.yaml | \
sed "s/{{external_ip}}/$EXTERNAL_IP/g" | \
sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" | \
kubectl apply -f -
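Before moving on, you can check that the sample app and the traffic generator pods came up, for example:
kubectl -n nginx-sample-traffic get pods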
Last but not least, we install the CW agent in the amazon-cloudwatch namespace, using:
kubectl create namespace amazon-cloudwatch
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/prometheus-beta/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks.yaml
We’re almost there but we need one more thing: we need to give the CW agent the permissions to write metrics to CloudWatch. For this we’re using IAM Roles for Service Accounts (IRSA), a EKS feature that allows for least-privileges access control, effectively restricting the access to CW via the CloudWatchAgentServerPolicy
directly to the pod running the CW agent:
eksctl create iamserviceaccount \
--name cwagent-prometheus \
--namespace amazon-cloudwatch \
--cluster cw-prom \
--attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
--override-existing-serviceaccounts \
--approve
Now we are in a position to verify the setup. First, we check if the service account that the CW agent deployment uses has been annotated properly (you should see an annotation with the key eks.amazonaws.com/role-arn here):
$ kubectl -n amazon-cloudwatch get sa cwagent-prometheus -o yaml |\
grep eks.amazon
eks.amazonaws.com/role-arn: arn:aws:iam::148658015984:role/eksctl-cw-prom-addon-iamserviceaccount-amazo-Role1-69WKQE6D9CG3
You should also verify that the CW agent is running properly with kubectl -n amazon-cloudwatch get pod, which should show it in the Running state.
Given we have everything deployed and running, we can now query the metrics from the CLI as follows:
aws logs start-query \
--log-group-name /aws/containerinsights/cw-prom/prometheus \
--start-time `date -v-1H +%s` \
--end-time `date +%s` \
--query-string "fields @timestamp, Service, CloudWatchMetrics.0.Metrics.0.Name as PrometheusMetricName, @message | sort @timestamp desc | limit 50 | filter CloudWatchMetrics.0.Namespace='ContainerInsights/Prometheus'"
aws logs get-query-results \
--query-id e69f2544-add0-4d14-98ff-0fadb54f27f1
The output of the above aws logs commands is something along the lines of the following (note the Prometheus metric encoded in the last value field shown here):
{
"results": [
[
{
"field": "@timestamp",
"value": "2020-04-30 11:40:38.230"
},
{
"field": "Service",
"value": "nginx-ingress-1588245517-controller-metrics"
},
{
"field": "PrometheusMetricName",
"value": "nginx_ingress_controller_nginx_process_connections"
},
{
"field": "@message",
"value": "{\"CloudWatchMetrics\":[{\"Metrics\":[{\"Name\":\"nginx_ingress_controller_nginx_process_connections\"}],\"Dimensions\":[[\"ClusterName\",\"Namespace\",\"Service\"]],\"Namespace\":\"ContainerInsights/Prometheus\"}],\"ClusterName\":\"cw-prom\",\"Namespace\":\"nginx-ingress-sample\",\"Service\":\"nginx-ingress-1588245517-controller-metrics\",\"Timestamp\":\"1588246838202\",\"Version\":\"0\",\"app\":\"nginx-ingress\",\"chart\":\"nginx-ingress-1.33.5\",\"component\":\"controller\",\"container_name\":\"nginx-ingress-controller\",\"controller_class\":\"nginx\",\"controller_namespace\":\"nginx-ingress-sample\",\"controller_pod\":\"nginx-ingress-1588245517-controller-56d5d786cd-xqwc2\",\"heritage\":\"Helm\",\"instance\":\"192.168.89.24:10254\",\"job\":\"kubernetes-service-endpoints\",\"kubernetes_node\":\"ip-192-168-73-163.eu-west-1.compute.internal\",\"nginx_ingress_controller_nginx_process_connections\":1,\"pod_name\":\"nginx-ingress-1588245517-controller-56d5d786cd-xqwc2\",\"prom_metric_type\":\"gauge\",\"release\":\"nginx-ingress-1588245517\",\"state\":\"active\"}"
},
...
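Note that the query ID passed to get-query-results above is just an example: start-query returns a queryId in its response, which you then feed into get-query-results. A minimal sketch chaining the two calls (same log group and query as above, macOS-style date flags):
QUERY_ID=$(aws logs start-query \
  --log-group-name /aws/containerinsights/cw-prom/prometheus \
  --start-time `date -v-1H +%s` \
  --end-time `date +%s` \
  --query-string "fields @timestamp, Service, @message | filter CloudWatchMetrics.0.Namespace='ContainerInsights/Prometheus' | sort @timestamp desc | limit 50" \
  --query queryId --output text)
# give the query a few seconds to complete before fetching the results
sleep 5
aws logs get-query-results --query-id "$QUERY_ID"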
Having seen how to scrape and use Prometheus metrics out of the box, using NGINX as an example, let's now move on to the topic of how to use custom metrics.
Custom metrics from ASP.NET Core app
In the following setup we will instrument an ASP.NET Core application using a Prometheus client library, with the goal of exposing custom metrics and ingesting them into CloudWatch. We will do this using the CloudWatch Prometheus agent with a custom configuration.
Instrumenting app to expose custom metrics
First, clone the sample application from aws-samples/amazon-cloudwatch-prometheus-metrics-sample and have a look at the HomeController.cs file:
// Prometheus metrics:
private static readonly Counter HomePageHitCounter = Metrics.CreateCounter("PrometheusDemo_HomePage_Hit_Count", "Count the number of hits to Home Page");
private static readonly Gauge SiteVisitorsCounter = Metrics.CreateGauge("PrometheusDemo_SiteVisitors_Gauge", "Site Visitors Gauge");
public IActionResult Index() {
    HomePageHitCounter.Inc();
    SiteVisitorsCounter.Set(rn.Next(1, 15));
    return View();
}
As well as the ProductsController.cs file:
// Prometheus metric:
private static readonly Counter ProductsPageHitCounter = Metrics.CreateCounter("PrometheusDemo_ProductsPage_Hit_Count", "Count the number of hits to Products Page");
public IActionResult Index() {
    ProductsPageHitCounter.Inc();
    return View();
}
The code snippets shown above use an open source Prometheus client library to instrument three different metrics, tracking the number of hits to each page as well as overall site visitors.
Next, for local testing and preview, navigate to the directory where the Dockerfile is located. Build the container image and run it using the following commands:
docker build . -t prometheusdemo
docker run -p 80:80 prometheusdemo
Now navigate to localhost, where you should be able to see a screen like the one below. Click on the Home and Products links a few times to generate some traffic:
Next, navigate to http://localhost/metrics, where you should see all the Prometheus metrics the app is exposing via the /metrics endpoint:
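You can also check the endpoint from the command line; the metric names are the ones defined in the controllers above:
curl -s http://localhost/metrics | grep PrometheusDemo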
Setting up CloudWatch agent for discovering Prometheus metrics
Open the prometheus-eks.yaml file under the /kubernetes folder in the repo. The following configuration under the cwagentconfig.json section in the YAML file shows the metrics that the CloudWatch agent will scrape from the application:
{
  "source_labels": ["job"],
  "label_matcher": "prometheusdemo-dotnet",
  "dimensions": [["ClusterName","Namespace"]],
  "metric_selectors": [
    "^process_cpu_seconds_total$",
    "^process_open_handles$",
    "^process_virtual_memory_bytes$",
    "^process_start_time_seconds$",
    "^process_private_memory_bytes$",
    "^process_working_set_bytes$",
    "^process_num_threads$"
  ]
},
{
  "source_labels": ["job"],
  "label_matcher": "^prometheusdemo-dotnet$",
  "dimensions": [["ClusterName","Namespace"]],
  "metric_selectors": [
    "^dotnet_total_memory_bytes$",
    "^dotnet_collection_count_total$",
    "^dotnet_gc_finalization_queue_length$",
    "^dotnet_jit_method_seconds_total$",
    "^dotnet_jit_method_total$",
    "^dotnet_threadpool_adjustments_total$",
    "^dotnet_threadpool_io_num_threads$",
    "^dotnet_threadpool_num_threads$",
    "^dotnet_gc_pinned_objects$"
  ]
},
{
  "source_labels": ["job"],
  "label_matcher": "^prometheusdemo-dotnet$",
  "dimensions": [["ClusterName","Namespace","gc_heap"]],
  "metric_selectors": [
    "^dotnet_gc_allocated_bytes_total$"
  ]
},
{
  "source_labels": ["job"],
  "label_matcher": "prometheusdemo-dotnet",
  "dimensions": [["ClusterName","Namespace","app"]],
  "metric_selectors": [
    "^PrometheusDemo_HomePage_Hit_Count$",
    "^PrometheusDemo_SiteVisitors_Gauge$",
    "^PrometheusDemo_ProductsPage_Hit_Count$"
  ]
}
In prometheus.yaml you will find the following section, instructing the CloudWatch agent about the Prometheus metric endpoint details, using the standard Prometheus configuration. Note that we have to make the regex for the __address__ source label match the endpoint and port number on which our sample application exposes the metrics:
- job_name: 'prometheusdemo-dotnet'
  sample_limit: 10000
  metrics_path: /metrics
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__address__]
      action: keep
      regex: '.*:80$'
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - action: replace
      source_labels:
        - __meta_kubernetes_namespace
      target_label: Namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod_name
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_container_name
      target_label: container_name
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_controller_name
      target_label: pod_controller_name
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_controller_kind
      target_label: pod_controller_kind
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_phase
      target_label: pod_phase
Deploying app and CloudWatch Prometheus agent
First, push the Docker container image of the ASP.NET Core app you built earlier to a container registry of your choice. Then, replace <YOUR_CONTAINER_REPO> in the deployment.yaml file with the value of the container repo you published the image to.
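If you don't have a registry at hand, Amazon ECR is one option; the repository name, account ID, and region below are only placeholders:
aws ecr create-repository --repository-name prometheusdemo
aws ecr get-login-password | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com
docker tag prometheusdemo:latest <account-id>.dkr.ecr.<region>.amazonaws.com/prometheusdemo:latest
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/prometheusdemo:latest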
Now that we have everything configured, we deploy the sample application and the CloudWatch Prometheus agent into the Kubernetes cluster, using the following command:
kubectl apply -f kubernetes/
Note that you can now optionally enable CloudWatch Container Insights in the Kubernetes cluster, allowing you to see the infrastructure map and automatic dashboard in the AWS Management Console.
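At the time of writing, one way to do that is the Container Insights quick start manifest (check the Container Insights documentation for the current URL); the cluster name and region below are placeholders you need to adjust to your environment:
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | \
  sed "s/{{cluster_name}}/<your-cluster-name>/;s/{{region_name}}/<your-region>/" | \
  kubectl apply -f -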
You can verify your setup as follows (you should see all the pods here in the Running state):
$ kubectl -n amazon-cloudwatch get pods
NAME READY STATUS RESTARTS AGE
cloudwatch-agent-785zq 1/1 Running 0 26h
cloudwatch-agent-tjxcj 1/1 Running 0 26h
cwagent-prometheus-75dfcd47d7-gtx58 1/1 Running 0 120m
fluentd-cloudwatch-7ttck 1/1 Running 0 26h
fluentd-cloudwatch-n2jvm 1/1 Running 0 26h
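If the metrics don't show up in CloudWatch, the agent logs are a good place to look; assuming the Deployment is named cwagent-prometheus, as suggested by the pod listing above:
kubectl -n amazon-cloudwatch logs deploy/cwagent-prometheus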
Creating custom dashboards
In order to create a custom dashboard called PrometheusDotNetApp in the us-east-2 AWS region, execute:
dashboardjson=$(<cloudwatch_dashboard.json)
aws cloudwatch put-dashboard \
  --dashboard-name PrometheusDotNetApp \
  --dashboard-body "$dashboardjson"
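To confirm the dashboard was created, you can fetch it back, for example:
aws cloudwatch get-dashboard --dashboard-name PrometheusDotNetApp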
If you want the dashboard to be created in another region, replace us-east-2 in the JSON config file with your desired value.
Now you can navigate to the CloudWatch dashboard you just created and you should be able to see something similar to the following:
CloudWatch Container Insights publishes automatic dashboards based on performance metrics from the Kubernetes cluster. Navigate to the Performance Monitoring page in CloudWatch Container Insights to see the automatic dashboard created by Container Insights:
With CloudWatch Container Insights enabled, you have access to a map view under Resources, showing you the Kubernetes cluster topology including its components:
With this we wrap up the custom metrics example scenario; let's have a quick look at what's up next.
Next steps
We’re excited to be able to open up the CloudWatch Container Insights support for Prometheus metric into a public beta. We would like to hear from you how you’re using this new feature and what you expect to see going forward. For example, PromQL support or native support for Prometheus histograms or summary metrics. Please share your experiences and keep an eye on this space, we keep improving and adding new features based on your feedback.