AWS Cloud Operations & Migrations Blog

Migrating to Amazon Managed Service for Prometheus with the Prometheus Operator

The Prometheus Operator allows cluster administrators to manage Prometheus clusters running in Kubernetes, making it easy to deploy and manage Prometheus via native Kubernetes components. In this blog post, I will demonstrate how you can deploy Prometheus via the Prometheus Operator, and how you can easily migrate your monitoring workloads to take advantage of Amazon Managed Service for Prometheus. You can continue to use the toolset you’re familiar with to manage your workload while offloading the burden of managing your observability stack.

Amazon Managed Service for Prometheus is a serverless, Prometheus-compatible monitoring service for container metrics that makes it easier to securely monitor container environments at scale. With Amazon Managed Service for Prometheus, you can use the same open-source Prometheus data model and query language that you use today to monitor the performance of your containerized workloads, and also enjoy improved scalability, availability, and security without having to manage the underlying infrastructure.

Prerequisites

For this blog post, you will need the following components: an Amazon EKS cluster, kubectl, Helm, the AWS CLI, jq, and awscurl, all of which are used in the steps that follow.

For this example, I’ve set up an Amazon EKS cluster and updated kubeconfig so that I can call kubectl against the cluster. This cluster will connect to AWS resources (like Amazon Managed Service for Prometheus), so I’ve created an IAM OIDC provider for the cluster so that the cluster can use AWS IAM roles for service accounts.
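
If you haven’t already associated an IAM OIDC provider with your cluster, eksctl can do this in a single step. A sketch, assuming a cluster named my-cluster (substitute your own cluster name):

eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve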

1. Installing the Prometheus Operator

The Prometheus Operator works by way of Custom Resource Definitions (CRDs). These CRDs extend the Kubernetes API to create and manage applications running in Kubernetes. When you create or update one of these custom resources, the Prometheus Operator examines the request and adjusts the Kubernetes cluster to match the desired state. The Prometheus Operator includes CRDs for Prometheus, Alertmanager, and a number of other Prometheus-related resources.

For this example, I’m using the Getting started guide to install the Prometheus Operator.
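
At the time of writing, the guide installs the operator and its CRDs from a single bundle manifest; a sketch of the install command (check the guide itself for the current instructions):

kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml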

After following the guide, I have a basic workload that includes Prometheus scraping a basic instrumented application. The following configuration shows the CRD I am using for Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false

With Prometheus running, I can view the Prometheus UI by running the following command:

kubectl port-forward svc/prometheus-operated 9090:9090

The Prometheus web UI is visible in a browser at localhost:9090. After a few minutes, I can see that metrics are being gathered. Additionally, I can create Alertmanager instances and set up monitoring rules to begin monitoring the workload. This getting started guide for alerting walks through how to configure the CRDs for Alertmanager.

2. Updating the workload to use Amazon Managed Service for Prometheus

First, set up an Amazon Managed Service for Prometheus workspace. In this post, I use the AWS Controllers for Kubernetes (ACK) for Amazon Managed Service for Prometheus. The ACK controller lets you create native AWS objects using custom resource definitions (CRDs) within the Kubernetes environment. You can also set up a workspace manually using the AWS Command Line Interface (CLI) or the console.

I use the following commands to install the ACK controller for Amazon Managed Service for Prometheus on the Amazon EKS cluster, where REGION is the region of the workload.

export SERVICE=prometheusservice
export RELEASE_VERSION=$(curl -sL https://api.github.com/repos/aws-controllers-k8s/${SERVICE}-controller/releases/latest | jq -r '.tag_name | ltrimstr("v")')

export ACK_SYSTEM_NAMESPACE=ack-system
export AWS_REGION=REGION

aws ecr-public get-login-password --region us-east-1 | helm registry login --username AWS --password-stdin public.ecr.aws
helm install --create-namespace -n $ACK_SYSTEM_NAMESPACE ack-$SERVICE-controller \
  oci://public.ecr.aws/aws-controllers-k8s/$SERVICE-chart --version=$RELEASE_VERSION --set=aws.region=$AWS_REGION
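
Before moving on, I can confirm that the controller is running:

kubectl get pods -n $ACK_SYSTEM_NAMESPACE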

After configuring my environment to use the ACK controller, I create a new Amazon Managed Service for Prometheus workspace using the following configuration.

apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: Workspace
metadata:
  name: prometheus-workspace
spec:
  alias: prometheus-workspace
  tags:
    ClusterName: prom-operator-demo
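
I save this manifest as workspace.yaml (a file name chosen for this example) and create the workspace via kubectl:

kubectl apply -f workspace.yaml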

I then run the following kubectl command to retrieve the Workspace ID, which will be used later:

kubectl describe workspace prometheus-workspace
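
The Workspace ID appears in the resource status. Assuming the controller populates status.workspaceID (the field name used by current releases of the controller), it can also be retrieved directly:

kubectl get workspace prometheus-workspace -o jsonpath='{.status.workspaceID}'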

I set up a service role to ingest the metrics from my Amazon EKS cluster into the workspace. The IAM role has aps:RemoteWrite, aps:GetSeries, aps:GetLabels, and aps:GetMetricMetadata permissions on the workspace. The role must have an appropriate trust relationship so that the EKS cluster can assume the role. In my case, the role is named amp-iamproxy-ingest-role.
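
For reference, an IAM roles for service accounts trust relationship looks roughly like the following, where ACCOUNT-ID and OIDC-PROVIDER (the cluster’s OIDC provider URL without the https:// prefix) are placeholders, and assuming the prometheus service account lives in the default namespace. Treat this as a sketch rather than the exact policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT-ID:oidc-provider/OIDC-PROVIDER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "OIDC-PROVIDER:sub": "system:serviceaccount:default:prometheus"
        }
      }
    }
  ]
}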

3. Configuring the workspace remote write endpoint

To use Amazon Managed Service for Prometheus via the Prometheus Operator, I update the Prometheus CRD by adding a remoteWrite configuration as follows:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT-ID:role/amp-iamproxy-ingest-role"
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
  remoteWrite:
    - url: "https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE-ID/api/v1/remote_write"
      sigv4:
        region: "REGION"
      queueConfig:
        capacity: 2500
        maxShards: 200
        maxSamplesPerSend: 1000

Where ACCOUNT-ID is the AWS account ID, WORKSPACE-ID is the Workspace ID of the workspace you created, and REGION is the AWS Region where the Amazon EKS cluster was created.
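
Note that IAM roles for service accounts resolves credentials from the annotation on the service account a pod runs under, so in my cluster the prometheus service account carries the same annotation. A minimal sketch of that service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT-ID:role/amp-iamproxy-ingest-role"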

To apply the changes, I run the following command:

kubectl apply -f prometheus.yml

Again, I access the Prometheus web UI by running the following command:

kubectl port-forward svc/prometheus-operated 9090:9090

4. Testing the configuration

The Prometheus web UI is visible in a browser at localhost:9090. I can see that a remote_write URL has been added to the server configuration. See Figure 1.

Figure 1: The Prometheus configuration has been updated with a remote_write URL, which allows Prometheus to remote write metric data to Amazon Managed Service for Prometheus.

I use awscurl to query the Prometheus workspace and verify that data is being ingested:

awscurl --service "aps" --region "REGION" "WORKSPACE_QUERY_URL?query=http_requests_total"

Where REGION is the region of the workspace and WORKSPACE_QUERY_URL is the query endpoint URL of the workspace. The WORKSPACE_QUERY_URL can be viewed on the Amazon Managed Service for Prometheus console page for the workspace, or the URL can be constructed as follows:

https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE-ID/api/v1/query

Where REGION is the region of the workspace and WORKSPACE-ID is the Workspace ID of the workspace.
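
If data is being ingested, the query returns the standard Prometheus response envelope; a truncated sketch of a successful response:

{"status":"success","data":{"resultType":"vector","result":[ ... ]}}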

5. Configuring Alertmanager

Because Alertmanager and Prometheus rule functionality is built in to Amazon Managed Service for Prometheus, any existing Alertmanager CRDs are no longer needed and can be removed from your cluster.

The ACK controller has a concept of a RuleGroupsNamespace (which is equivalent to a PrometheusRule in the Prometheus Operator) and an AlertManagerDefinition (which is equivalent to an AlertmanagerConfig in the Prometheus Operator).

I can use a RuleGroupsNamespace to create a new alerting rule. Replace WORKSPACE-ID with the Workspace ID of the workspace in the following configuration.

apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: RuleGroupsNamespace
metadata:
  name: default-rule
spec:
  workspaceID: WORKSPACE-ID
  name: default-rule
  configuration: |
    groups:
    - name: example
      rules:
      - alert: HighRequestLatency
        expr: (rate(http_request_duration_microseconds{handler="api"}[2m])/1000000) > 2        
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host latency detected      

I apply this to my cluster via kubectl. After a few minutes, the rules will appear under the Rules management tab of the workspace. See Figure 2.

Figure 2: Prometheus has been updated with a monitoring rule; default-rule appears in an Active state under the Rules management tab.

I can use an AlertManagerDefinition to send alerts to an Amazon Simple Notification Service (Amazon SNS) topic. Replace WORKSPACE-ID with the Workspace ID, SNS-TOPIC-ARN with the ARN of the Amazon SNS topic where you want to send the alerts, and REGION with the current region of the workload. Make sure that your workspace has permissions to send messages to Amazon SNS; a sample topic access policy follows the configuration below.

apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: AlertManagerDefinition
metadata:
  name: alert-manager
spec:
  workspaceID: WORKSPACE-ID
  configuration: |
    alertmanager_config: |
      route:
         receiver: default_receiver
      receivers:
        - name: default_receiver
          sns_configs:
          - topic_arn: SNS-TOPIC-ARN
            sigv4:
              region: REGION
            message: |
              alert_type: {{ .CommonLabels.alertname }}
              event_type: {{ .CommonLabels.event_type }}
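
For the workspace to publish to the topic, the topic’s access policy must allow the Amazon Managed Service for Prometheus service principal to publish to it. A sketch of such a policy, using SNS-TOPIC-ARN, WORKSPACE-ARN, and ACCOUNT-ID as placeholders (check the service documentation for the exact statement):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "aps.amazonaws.com" },
      "Action": ["sns:Publish", "sns:GetTopicAttributes"],
      "Resource": "SNS-TOPIC-ARN",
      "Condition": {
        "ArnEquals": { "aws:SourceArn": "WORKSPACE-ARN" },
        "StringEquals": { "aws:SourceAccount": "ACCOUNT-ID" }
      }
    }
  ]
}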

After a few minutes, the Alert manager configuration will appear under the Alert manager tab of your workspace. See Figure 3.

Figure 3: The workspace has been updated with an Alert manager configuration that sends alerts to an Amazon SNS topic.

Once you have configured rules and alerts within your workspace, you can delete the Alertmanager, AlertmanagerConfig, and PrometheusRule Prometheus Operator CRDs from your cluster, as they are no longer needed.

Next Steps

In this blog post, I demonstrated the basics of the Prometheus Operator and showed how you can use the operator to take advantage of Amazon Managed Service for Prometheus, including Alertmanager. Using these steps, you can continue to use the management tools you’re familiar with while offloading the burden of managing your observability stack by migrating to Amazon Managed Service for Prometheus.

Alertmanager and rule management are built in to Amazon Managed Service for Prometheus. If you’re using the Prometheus Operator to manage Alertmanager or rules, you can simply delete those resources from your configuration and begin using the ACK controller for Amazon Managed Service for Prometheus. Like the Prometheus Operator, the ACK controller lets you manage a Prometheus workspace by using CRDs, but it is optimized for creating resources within an AWS account.

As a next step, take advantage of an Amazon Managed Service for Prometheus workspace by configuring a remoteWrite URL as part of your Prometheus Operator configuration. You can also manage Alertmanager and rules by installing the ACK controller on your Amazon EKS cluster.

About the author:

Mike George

Mike George is a Principal Solutions Architect based out of Salt Lake City, Utah. He enjoys helping customers solve their technology problems. His interests include software engineering, security, artificial intelligence (AI), and machine learning (ML).