AWS Cloud Operations Blog
Using Open Source Grafana Operator on your Kubernetes cluster to manage Amazon Managed Grafana
Introduction
The Kubernetes API is robust, and its control loop mechanism allows us to control the state of resources, even those outside of Kubernetes environments. Customers have shifted their focus towards workload gravity and rely on Kubernetes-native controllers to deploy and manage the lifecycle of external resources such as cloud resources. We have seen customers install AWS Controllers for Kubernetes (ACK) to create, deploy, and manage AWS services. Many customers now offload their Prometheus and Grafana implementations to managed services; on AWS, these are Amazon Managed Service for Prometheus and Amazon Managed Grafana for monitoring their workloads. Fundamentally, what they need is a single API, the Kubernetes API, to control heterogeneous deployments.
The grafana-operator is a Kubernetes operator built to help you manage your Grafana instances inside Kubernetes. Grafana Operator enables you to create and manage Grafana resources such as dashboards and data sources declaratively across multiple instances in an easy and scalable way. Until recently, however, Grafana Operator could manage Grafana instances as code in a Kubernetes-native way but had no mechanism to integrate Grafana services deployed outside of the cluster, such as Amazon Managed Grafana.
The AWS team collaborated with the grafana-operator team and submitted a design proposal to support the integration of external Grafana instances. With this mechanism, it is possible to add external Prometheus-compatible data sources (such as Amazon Managed Service for Prometheus) and create Grafana dashboards in external Grafana instances (such as Amazon Managed Grafana) from your Kubernetes cluster. This enables us to use our Kubernetes cluster to create and manage the lifecycle of resources in Amazon Managed Grafana in a Kubernetes-native way. Ultimately, it also enables GitOps mechanisms, using Cloud Native Computing Foundation (CNCF) projects such as Flux, to create and manage the lifecycle of resources in Amazon Managed Grafana.
In this post, we demonstrate how to use Grafana Operator from your Kubernetes cluster to add Amazon Managed Service for Prometheus as a data source and create dashboards in Amazon Managed Grafana in a Kubernetes-native way.
Solution Architecture
The architecture diagram shows a Kubernetes cluster acting as a control plane: Grafana Operator sets up an identity with Amazon Managed Grafana (AMG), adds Amazon Managed Service for Prometheus as a data source, and creates dashboards on Amazon Managed Grafana from the Amazon EKS cluster in a Kubernetes-native way.
Solution Walkthrough
Prerequisites
You will need the following to complete the steps in this post:
- AWS CLI version 2
- AWS CDK version 2.66.0 or later
- Node version 18.12.1 or later
- NPM version 8.19.2 or later
- Kubectl
- Git
- jq
- Helm
- An existing Amazon Managed Grafana Workspace
Let’s start by setting a few environment variables:
Clone the sample repository, which contains the code for our solution:
Bootstrap the Environment
In this post, you will use Amazon EKS CDK Blueprints to provision your Amazon EKS cluster. The first step in any CDK deployment is bootstrapping the environment. cdk bootstrap is a command in the AWS CDK command-line interface (CDK CLI) responsible for preparing the environment (i.e., a combination of AWS account and AWS Region) with the resources CDK requires to perform deployments into that environment. If you have already bootstrapped CDK in this account and Region, you don’t need to repeat the bootstrapping process.
Run the commands below to bootstrap your environment and install all Node.js dependencies required to deploy the solution:
Next, let’s retrieve the workspace ID of your existing Amazon Managed Grafana workspace:
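The original commands for this step are not reproduced here. As a minimal sketch, assuming you identify the workspace by its name, the lookup could be wrapped in a small shell helper. The function name amg_workspace_id and the variable AMG_WORKSPACE_ID are hypothetical; aws grafana list-workspaces and the --query filter are standard AWS CLI features.

```shell
# Hypothetical helper: print the ID of the Amazon Managed Grafana workspace
# whose name matches the first argument (prints "None" if nothing matches).
amg_workspace_id() {
  aws grafana list-workspaces \
    --query "workspaces[?name=='$1'].id | [0]" \
    --output text
}
```

You could then capture the result with something like export AMG_WORKSPACE_ID="$(amg_workspace_id my-workspace)", where my-workspace is a placeholder for your own workspace name.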
Next, let’s create a Grafana API key in your Amazon Managed Grafana workspace and store it as a secret in AWS Secrets Manager. The External Secrets add-on on our Amazon EKS cluster will read this secret:
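As a rough sketch of this step (not the exact commands from the sample repository), the helper below creates an API key and stores it in AWS Secrets Manager. The function name store_amg_api_key, the key name grafana-operator-key, and the five-day lifetime are assumptions for illustration; the secret name and format that the blueprint's ExternalSecret expects are defined in the sample repository, so check them there.

```shell
# Hypothetical helper: create an ADMIN API key in the given Amazon Managed
# Grafana workspace and store it as a plain-string secret in Secrets Manager.
#   $1 = workspace ID, $2 = secret name (both supplied by the caller)
store_amg_api_key() {
  workspace_id="$1"
  secret_name="$2"

  # The key value is only returned at creation time, so capture it here.
  api_key="$(aws grafana create-workspace-api-key \
    --workspace-id "$workspace_id" \
    --key-name "grafana-operator-key" \
    --key-role "ADMIN" \
    --seconds-to-live 432000 \
    --query key --output text)" || return 1

  aws secretsmanager create-secret \
    --name "$secret_name" \
    --secret-string "$api_key"
}
```

Because the key expires after the chosen lifetime, this step has to be repeated (or automated) when the key is rotated.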
Navigate to bin/grafana-operator-amg.ts in the cloned repo to review the Amazon EKS CDK Blueprints stack, which deploys the EKS cluster with the day 2 operational add-ons required to run our solution. The snippet below from bin/grafana-operator-amg.ts shows our EKS CDK Blueprints stack:
Next, run the cdk list command, which lists the name of the stack that will be created.
If you would like to see the list of resources that this stack will create, you can view them using the cdk diff command.
Create the clusters and deploy the addons
Run the following command to deploy the Amazon EKS cluster with day 2 operational add-ons required to run our solution.
Deployment will take approximately 20-30 minutes to complete. Upon completion, you will have a fully functioning EKS cluster deployed in your account.
This blueprint will deploy the following:
- Amazon Virtual Private Cloud (Amazon VPC) with both Public and Private subnets
- An Amazon EKS cluster in the region and account you specify
- Amazon VPC CNI Add-on to your EKS cluster to support native VPC networking
- External Secrets Addon to integrate with AWS Secrets Manager and pull the Amazon Managed Grafana API key
- CoreDNS Addon is a flexible, extensible DNS server that can serve as the Kubernetes cluster DNS
- CertManager Addon to provision the TLS certificates required by the AWS Distro for OpenTelemetry (ADOT) Operator
- KubeStateMetrics Addon is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects
- PrometheusNodeExporter Addon enables you to measure various machine resources such as memory, disk and CPU utilization
- Adot Addon to install and manage the AWS Distro for OpenTelemetry (ADOT) Operator
- Amp Adot Addon deploys an AWS Distro for OpenTelemetry (ADOT) Collector for Amazon Managed Service for Prometheus, which receives transactional metrics from the application and Prometheus metrics scraped from pods on the cluster, and remote writes the metrics to the Amazon Managed Service for Prometheus remote write endpoint. This addon creates an Amazon Managed Service for Prometheus workspace with the name demo-amp-wokspace
- Creates a ClusterSecretStore which can be used by all external secrets from all namespaces
- Creates an ExternalSecret which can be used to fetch, transform and inject secret for Amazon Managed Grafana workspace API Key
Once the deployment is complete, you will see the following output in your terminal:
To update your Kubernetes config for your new cluster, copy and run the aws eks update-kubeconfig command (the second command in the list above) in your terminal.
Validate access to your EKS cluster using the kubectl command below, which lists the secret created to access the Amazon Managed Grafana workspace:
Let’s also retrieve the endpoint URL of the Amazon Managed Service for Prometheus workspace you created, using the commands below:
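A minimal sketch of this lookup, assuming the workspace is found by its alias: the helper name amp_endpoint_url is hypothetical, while aws amp list-workspaces (with its --alias filter) and aws amp describe-workspace are standard AWS CLI commands.

```shell
# Hypothetical helper: print the Prometheus endpoint of the first Amazon
# Managed Service for Prometheus workspace whose alias starts with $1.
amp_endpoint_url() {
  ws_alias="$1"

  # Resolve the alias to a workspace ID.
  workspace_id="$(aws amp list-workspaces \
    --alias "$ws_alias" \
    --query "workspaces[0].workspaceId" \
    --output text)" || return 1

  # Read the endpoint URL from the workspace description.
  aws amp describe-workspace \
    --workspace-id "$workspace_id" \
    --query "workspace.prometheusEndpoint" \
    --output text
}
```

For the workspace created by this blueprint, the call would look like export AMP_ENDPOINT_URL="$(amp_endpoint_url demo-amp-wokspace)", where AMP_ENDPOINT_URL is an assumed variable name.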
Installing Grafana Operator
Next, let’s install Grafana Operator on Amazon EKS to manage external Grafana instances such as Amazon Managed Grafana. Grafana Operator will be used to create an Amazon Managed Service for Prometheus data source and dashboards on Amazon Managed Grafana using Kubernetes Custom Resource Definitions in a Kubernetes-native way. Use the command below to perform a Helm installation of Grafana Operator:
Next, run the command below to check the status of the Grafana Operator Helm installation:
Creating Amazon Managed Grafana Datasources and Dashboards using Grafana Operator
Now, let’s get to the fun part: creating an identity for Amazon Managed Grafana from your Amazon EKS cluster using the Grafana API key. We will be using the grafanas.grafana.integreatly.org Custom Resource Definition (CRD) for this purpose as shown below:
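As a minimal sketch (not the exact manifest from the sample repository), an external Grafana instance can be declared with the spec.external block of the Grafana resource. The resource name, namespace, label, secret name, secret key, and the AMG_ENDPOINT_URL variable with its example value are all assumptions; replace them with the values from your own environment.

```shell
# Assumed placeholder: the URL of your Amazon Managed Grafana workspace.
AMG_ENDPOINT_URL="${AMG_ENDPOINT_URL:-https://g-example.grafana-workspace.us-west-2.amazonaws.com}"

# Write a Grafana resource that points the operator at an external instance.
# The apiKey fields reference a Kubernetes Secret (assumed name and key)
# that holds the Grafana API key.
cat > external-grafana.yaml <<EOF
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: external-grafana
  namespace: grafana-operator
  labels:
    dashboards: "external-grafana"
spec:
  external:
    url: ${AMG_ENDPOINT_URL}
    apiKey:
      name: grafana-admin-credentials
      key: GF_SECURITY_ADMIN_APIKEY
EOF
```

The file can then be applied with kubectl apply -f external-grafana.yaml. The dashboards label matters because data sources and dashboards select their target Grafana instance by label.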
Let’s check that the identity for Amazon Managed Grafana was created successfully using the command below:
Next, let’s add Amazon Managed Service for Prometheus as a data source in Amazon Managed Grafana from your Amazon EKS cluster. We will be using the grafanadatasources.grafana.integreatly.org CRD for this purpose as shown below:
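A sketch of such a data source, assuming a Grafana resource labeled dashboards: external-grafana already exists in the cluster. The resource name, namespace, default endpoint, Region, and the choice of ec2_iam_role as the SigV4 auth type are assumptions to verify against your workspace configuration; only the data source name grafana-operator-amp-datasource comes from this post.

```shell
# Assumed placeholders: your AMP workspace endpoint and AWS Region.
AMP_ENDPOINT_URL="${AMP_ENDPOINT_URL:-https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-example}"
AWS_REGION="${AWS_REGION:-us-west-2}"

# Write a GrafanaDatasource that targets Grafana instances by label and
# signs requests to Amazon Managed Service for Prometheus with SigV4.
cat > amp-datasource.yaml <<EOF
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: grafanadatasource-amp
  namespace: grafana-operator
spec:
  instanceSelector:
    matchLabels:
      dashboards: "external-grafana"
  datasource:
    name: grafana-operator-amp-datasource
    type: prometheus
    access: proxy
    url: ${AMP_ENDPOINT_URL}
    isDefault: true
    editable: true
    jsonData:
      sigV4Auth: true
      sigV4AuthType: ec2_iam_role
      sigV4Region: ${AWS_REGION}
EOF
```

Apply it with kubectl apply -f amp-datasource.yaml; the operator then reconciles the data source into every Grafana instance matched by instanceSelector.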
Let’s check that Amazon Managed Service for Prometheus was added as a data source in Amazon Managed Grafana from your Amazon EKS cluster using the commands below:
See grafana-operator-manifests if you are looking for samples that add Amazon CloudWatch and AWS X-Ray as data sources in Amazon Managed Grafana from your Amazon EKS cluster.
Next, navigate to the Amazon Managed Grafana console, click Configuration → Data Sources, and click the data source grafana-operator-amp-datasource as shown below:
Next, click Save and Test as shown to make sure the data source is working.
Finally, let’s create a Grafana dashboard on Amazon Managed Grafana from your Amazon EKS cluster. We will be using the grafanadashboards.grafana.integreatly.org CRD for this purpose as shown below:
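A sketch of a dashboard resource, again assuming a Grafana resource labeled dashboards: external-grafana exists. The resource name, namespace, and the DASHBOARD_JSON_URL placeholder are hypothetical; point the URL at the dashboard JSON you actually want to load.

```shell
# Assumed placeholder: a URL that serves the dashboard JSON definition.
DASHBOARD_JSON_URL="${DASHBOARD_JSON_URL:-https://example.com/dashboards/nodeexporter-nodes.json}"

# Write a GrafanaDashboard that loads its JSON model from a URL and targets
# Grafana instances by label.
cat > node-exporter-dashboard.yaml <<EOF
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: node-exporter-dashboard
  namespace: grafana-operator
spec:
  instanceSelector:
    matchLabels:
      dashboards: "external-grafana"
  url: ${DASHBOARD_JSON_URL}
EOF
```

Apply it with kubectl apply -f node-exporter-dashboard.yaml. Keeping the dashboard JSON at a URL keeps the manifest small, at the cost of depending on that URL being reachable from the cluster.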
Let’s now check that the Grafana dashboard was created on Amazon Managed Grafana from your Amazon EKS cluster using the command below:
Finally, navigate to the Amazon Managed Grafana console and click Search Dashboards. You will see a dashboard named Grafana Operator – Node Exporter/Nodes; clicking it opens the dashboard, created out of the box with all the metrics from the Prometheus Node Exporter installed on your Amazon EKS cluster.
GitOps Approach with Grafana Operator
GitOps is a way of managing application and infrastructure deployment so that the whole system is described declaratively in a Git repository. It is an operational model that lets you manage the state of multiple Kubernetes clusters using the best practices of version control, immutable artifacts, and automation. Flux is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. It gives users the flexibility of choosing their Git provider (GitHub, GitLab, BitBucket). Now that grafana-operator supports the management of external Grafana instances such as Amazon Managed Grafana, operations personas can use GitOps mechanisms, with CNCF projects such as Flux, to create and manage the lifecycle of resources in Amazon Managed Grafana.
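To make the GitOps flow concrete, here is a minimal sketch of the Flux resources that could sync Grafana Operator manifests from Git, assuming Flux is already installed in the flux-system namespace. The repository URL, branch, path, and resource names are hypothetical placeholders.

```shell
# Assumed placeholder: a Git repository that holds your Grafana, 
# GrafanaDatasource, and GrafanaDashboard manifests under ./grafana.
GIT_REPO_URL="${GIT_REPO_URL:-https://github.com/example-org/grafana-resources}"

# Write a GitRepository source and a Kustomization that applies the
# manifests found at the given path, pruning resources removed from Git.
cat > flux-grafana-resources.yaml <<EOF
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: grafana-resources
  namespace: flux-system
spec:
  interval: 5m
  url: ${GIT_REPO_URL}
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: grafana-resources
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: grafana-resources
  path: ./grafana
  prune: true
EOF
```

With these applied, a commit that changes a dashboard or data source manifest is picked up by Flux, applied to the cluster, and then reconciled into Amazon Managed Grafana by Grafana Operator.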
Cleanup
You will continue to incur costs until you delete the infrastructure that you created for this post. Use the commands below to delete the resources created during this post:
CDK will prompt you with Are you sure you want to delete: grafana-operator-cluster (y/n)? Enter y to delete.
Conclusion
In this post, you learned how organizations are using Kubernetes as a control plane to create and manage Grafana resources in managed services such as Amazon Managed Grafana. We also demonstrated how to use Grafana Operator from your Kubernetes cluster to add data sources such as Amazon Managed Service for Prometheus and create Grafana dashboards in external Grafana instances such as Amazon Managed Grafana in a Kubernetes-native way. We highly recommend that you try this solution and also use GitOps mechanisms with Grafana Operator, through CNCF projects such as Flux, to create and manage the lifecycle of resources in Amazon Managed Grafana.
For more information, see the following references:
- Grafana Operator
- Kubernetes as a platform vs. Kubernetes as an API
- Amazon EKS Blueprints
- Amazon EKS Blueprints Patterns
- GitOps model for provisioning and bootstrapping Amazon EKS clusters using Crossplane and Flux