AWS Open Source Blog

Set up cross-region metrics collection for Amazon Managed Service for Prometheus workspaces

Amazon Managed Service for Prometheus (AMP) is a Prometheus-compatible monitoring service for container infrastructure and application metrics that makes it easy for customers to securely monitor container environments at scale.

In a previous getting started blog post, we showed how to set up an AMP workspace and ingest metrics from an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Many AWS customers use more than one AWS Region in their architecture, and it is common to collect metrics from different AWS Regions and ingest them into one AMP workspace. In this article, we show how to set up this architecture.

Architecture design

Architecture diagram for use cases where customers use more than one AWS Region.

Setup instructions

We are using many of the steps mentioned in the Getting Started with Amazon Managed Service for Prometheus article; refer to it when necessary.

We use three different AWS Regions in our example setup: us-east-1 as Region X (where we create an Amazon EKS cluster), us-west-2 as Region Y (where we create an AMP workspace), and eu-west-1 as Region Z (where we create an Amazon Managed Grafana workspace).

Steps involved in the setup:

  • Create an AMP workspace in Region Y.
  • Set up an Amazon Virtual Private Cloud (Amazon VPC) endpoint in Region Y.
  • Create an Amazon EKS cluster in Region X.
  • Set up an Amazon VPC peering connection between the VPCs in Region X and Region Y.
  • Configure Amazon Route 53 so that requests to the AMP workspace are routed through the VPC endpoint.
  • Deploy the Prometheus server on the Amazon EKS cluster and configure remote write to the AMP ingestion endpoint.
  • Create an Amazon Managed Grafana workspace in Region Z and query metrics from the AMP workspace in Region Y.

Create an AMP workspace in Region Y

We can use the following command to create an AMP workspace:

aws amp create-workspace --alias my-xregion-prom --region us-west-2

Then, wait for a few seconds and execute the following command to check the status of the workspace created:

aws amp list-workspaces --region us-west-2

You should get output similar to the one below. Ensure that the status is ACTIVE, indicating that the workspace was created successfully.

{
    "workspaces": [
        {
            "alias": "my-xregion-prom",
            "arn": "arn:aws:aps:us-west-2:1234567890:workspace/ws-9876ww00-xx87-4f26-94c7-94237e12a4e9",
            "createdAt": "2021-01-12T13:01:18.309000-06:00",
            "status": {
                "statusCode": "ACTIVE"
            },
            "workspaceId": "ws-9876ww00-xx87-4f26-94c7-94237e12a4e9"
        }
    ]
}
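
Instead of re-running the list command by hand, the status check can be scripted. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the function names are hypothetical helpers of our own, and only the aws amp describe-workspace call comes from the CLI.

```shell
# Hypothetical helpers; only `aws amp describe-workspace` is a real CLI call.

# Print the workspace ID, which is the part of the ARN after the last "/".
workspace_id_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Print the status code (for example ACTIVE). Arguments: workspace ID, Region.
workspace_status() {
  aws amp describe-workspace \
    --workspace-id "$1" \
    --region "$2" \
    --query 'workspace.status.statusCode' \
    --output text
}

# Poll every 5 seconds, up to 12 times, until the workspace is ACTIVE.
wait_for_workspace() {
  local tries=0
  while [ "$tries" -lt 12 ]; do
    [ "$(workspace_status "$1" "$2")" = "ACTIVE" ] && return 0
    tries=$((tries + 1))
    sleep 5
  done
  echo "workspace $1 did not become ACTIVE in time" >&2
  return 1
}

# Usage (the workspace ID is a placeholder):
#   wait_for_workspace ws-xxxxxxxx us-west-2
```

The workspace ID extracted this way is the value needed later for the remote write URL.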

Alternatively, you can create a workspace using the AWS console by simply providing the workspace name and selecting Create as shown in the following image.

Amazon Managed Service for Prometheus homepage.

Set up a VPC endpoint on Region Y

  • Go to Region Y and navigate to the VPC endpoint page. Then choose Create Endpoint.
  • Select AWS services in the Create Endpoint screen.
  • Fill in the Service Name text box with com.amazonaws.<Region Y>.aps-workspaces, and select the resulting service as shown in the following screenshot.

Screenshot of page where you create endpoints.

  • Select a VPC that you want to use for this purpose, select the subnets and the default security group, and choose Create endpoint.
  • Now, we have a VPC endpoint created that we can use to make calls to the AMP service from the VPC.
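
The console steps above can also be approximated with the AWS CLI. This is a sketch under assumptions: the function names are hypothetical, the VPC, subnet, and security group IDs are placeholders you supply, and a single subnet is passed for brevity.

```shell
# Hypothetical helpers; `aws ec2 create-vpc-endpoint` is the real CLI call.

# Build the AMP workspaces endpoint service name for a Region.
amp_endpoint_service_name() {
  printf 'com.amazonaws.%s.aps-workspaces\n' "$1"
}

# Create an interface endpoint and print its ID.
# Arguments: Region, VPC ID, subnet ID, security group ID.
create_amp_endpoint() {
  aws ec2 create-vpc-endpoint \
    --region "$1" \
    --vpc-id "$2" \
    --vpc-endpoint-type Interface \
    --service-name "$(amp_endpoint_service_name "$1")" \
    --subnet-ids "$3" \
    --security-group-ids "$4" \
    --query 'VpcEndpoint.VpcEndpointId' \
    --output text
}

# Usage (all IDs are placeholders):
#   create_amp_endpoint us-west-2 vpc-xxxx subnet-xxxx sg-xxxx
```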

Create an EKS cluster in Region X

Now we create an Amazon EKS cluster in Region X. The easiest way to create a cluster on EKS is to use eksctl. Once you have eksctl installed on your local machine, you can execute the following command to create the cluster:

eksctl create cluster my-xregion-eks --region us-east-1
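
The later networking steps need the ID and CIDR range of the VPC that eksctl created for the cluster. One way to look them up is sketched below; the function names are hypothetical, and the sketch assumes the AWS CLI is configured for the account that owns the cluster.

```shell
# Hypothetical helpers around real AWS CLI calls.

# Print the ID of the VPC the cluster runs in. Arguments: cluster name, Region.
cluster_vpc_id() {
  aws eks describe-cluster --name "$1" --region "$2" \
    --query 'cluster.resourcesVpcConfig.vpcId' --output text
}

# Print the primary CIDR block of a VPC. Arguments: VPC ID, Region.
vpc_cidr() {
  aws ec2 describe-vpcs --vpc-ids "$1" --region "$2" \
    --query 'Vpcs[0].CidrBlock' --output text
}

# Usage:
#   vpc_id=$(cluster_vpc_id my-xregion-eks us-east-1)
#   vpc_cidr "$vpc_id" us-east-1
```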

Once the cluster is ready, we deploy the Prometheus server on the cluster. Before that, however, we need to set up the required permissions so that the Prometheus server can write into an AMP workspace.

The following shell script can be used to execute these actions on the my-xregion-eks Amazon EKS cluster:

  1. Create an AWS Identity and Access Management (IAM) role with an IAM policy that has permissions to remote-write into an AMP workspace.
  2. Create a Kubernetes service account that is annotated with the IAM role.
  3. Create a trust relationship between the IAM role and the OpenID Connect (OIDC) provider hosted in your Amazon EKS cluster.

#!/bin/bash
CLUSTER_NAME=my-xregion-eks
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
OIDC_PROVIDER=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
PROM_SERVICE_ACCOUNT_NAMESPACE=prometheus
GRAFANA_SERVICE_ACCOUNT_NAMESPACE=grafana
SERVICE_ACCOUNT_NAME=iamproxy-service-account
SERVICE_ACCOUNT_IAM_ROLE=EKS-AMP-ServiceAccount-Role
SERVICE_ACCOUNT_IAM_ROLE_DESCRIPTION="IAM role to be used by a K8s service account with write access to AMP"
SERVICE_ACCOUNT_IAM_POLICY=AWSManagedPrometheusWriteAccessPolicy
SERVICE_ACCOUNT_IAM_POLICY_ARN=arn:aws:iam::$AWS_ACCOUNT_ID:policy/$SERVICE_ACCOUNT_IAM_POLICY
#
# Set up a trust policy designed for a specific combination of K8s service account and namespace to sign in from a Kubernetes cluster which hosts the OIDC IdP.
# If the IAM role already exists, then add this new trust policy to the existing trust policy
#
echo "Creating a new trust policy"
read -r -d '' NEW_TRUST_RELATIONSHIP <<EOF
 [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${GRAFANA_SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_NAME}"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${PROM_SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_NAME}"
        }
      }
    }
  ]
EOF
#
# Get the old trust policy, if one exists, and append it to the new trust policy
#
OLD_TRUST_RELATIONSHIP=$(aws iam get-role --role-name $SERVICE_ACCOUNT_IAM_ROLE --query 'Role.AssumeRolePolicyDocument.Statement[]' --output json)
COMBINED_TRUST_RELATIONSHIP=$(echo $OLD_TRUST_RELATIONSHIP $NEW_TRUST_RELATIONSHIP | jq -s add)
echo "Appending to the existing trust policy"
read -r -d '' TRUST_POLICY <<EOF
{
  "Version": "2012-10-17",
  "Statement": ${COMBINED_TRUST_RELATIONSHIP}
}
EOF
echo "${TRUST_POLICY}" > TrustPolicy.json
#
# Set up the permission policy that grants write (and query) permissions for all AMP workspaces
#
read -r -d '' PERMISSION_POLICY <<EOF
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "aps:RemoteWrite",
            "aps:QueryMetrics",
            "aps:GetSeries",
            "aps:GetLabels",
            "aps:GetMetricMetadata"
         ],
         "Resource":"*"
      }
   ]
}
EOF
echo "${PERMISSION_POLICY}" > PermissionPolicy.json
#
# Create an IAM permission policy to be associated with the role, if the policy does not already exist
#
SERVICE_ACCOUNT_IAM_POLICY_ID=$(aws iam get-policy --policy-arn $SERVICE_ACCOUNT_IAM_POLICY_ARN --query 'Policy.PolicyId' --output text)
if [ "$SERVICE_ACCOUNT_IAM_POLICY_ID" = "" ]; 
then
  echo "Creating a new permission policy $SERVICE_ACCOUNT_IAM_POLICY"
  aws iam create-policy --policy-name $SERVICE_ACCOUNT_IAM_POLICY --policy-document file://PermissionPolicy.json 
else
  echo "Permission policy $SERVICE_ACCOUNT_IAM_POLICY already exists"
fi
#
# If the IAM role already exists, then just update the trust policy.
# Otherwise create one using the trust policy and permission policy
#
SERVICE_ACCOUNT_IAM_ROLE_ARN=$(aws iam get-role --role-name $SERVICE_ACCOUNT_IAM_ROLE --query 'Role.Arn' --output text)
if [ "$SERVICE_ACCOUNT_IAM_ROLE_ARN" = "" ]; 
then
  echo "$SERVICE_ACCOUNT_IAM_ROLE role does not exist. Creating a new role with a trust and permission policy"
  #
  # Create an IAM role for Kubernetes service account 
  #
  SERVICE_ACCOUNT_IAM_ROLE_ARN=$(aws iam create-role \
  --role-name $SERVICE_ACCOUNT_IAM_ROLE \
  --assume-role-policy-document file://TrustPolicy.json \
  --description "$SERVICE_ACCOUNT_IAM_ROLE_DESCRIPTION" \
  --query "Role.Arn" --output text)
  #
  # Attach the permission policy to the role (the trust policy was set when the role was created)
  #
  aws iam attach-role-policy --role-name $SERVICE_ACCOUNT_IAM_ROLE --policy-arn $SERVICE_ACCOUNT_IAM_POLICY_ARN  
else
  echo "$SERVICE_ACCOUNT_IAM_ROLE_ARN role already exists. Updating the trust policy"
  #
  # Update the IAM role for the Kubernetes service account with the new trust policy
  #
  aws iam update-assume-role-policy --role-name $SERVICE_ACCOUNT_IAM_ROLE --policy-document file://TrustPolicy.json
fi
echo $SERVICE_ACCOUNT_IAM_ROLE_ARN
#
# The EKS cluster hosts an OIDC provider with a public discovery endpoint.
# Associate this IdP with AWS IAM so that the latter can validate and accept the OIDC tokens issued by Kubernetes to service accounts.
# Doing this with eksctl is the easiest approach.
#
eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve
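
After the script runs, it can help to confirm that the role trusts the service account you expect. The sketch below mirrors two string transformations from the script and adds a read-only check; the function names are hypothetical, and the role name is the one defined in the script above.

```shell
# Hypothetical helpers; `aws iam get-role` is the real CLI call.

# Strip the scheme from the OIDC issuer URL, as the script does with sed.
oidc_provider_from_issuer() {
  printf '%s\n' "${1#https://}"
}

# Build the "sub" claim that the trust policy matches.
# Arguments: namespace, service account name.
service_account_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Print the StringEquals conditions of every trust statement on a role.
trusted_subjects() {
  aws iam get-role --role-name "$1" \
    --query 'Role.AssumeRolePolicyDocument.Statement[].Condition.StringEquals' \
    --output json
}

# Usage:
#   service_account_subject prometheus iamproxy-service-account
#   trusted_subjects EKS-AMP-ServiceAccount-Role
```

The value printed by service_account_subject should appear in the output of trusted_subjects for the prometheus namespace.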

Set up a VPC Peering Connection between VPCs on Region X and Region Y

We need to set up a VPC peering connection between the two VPCs across regions so that calls to the VPC endpoint from Region X can reach Region Y.
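
One precondition is worth checking first: VPC peering does not support VPCs whose CIDR ranges overlap. The following sketch checks two IPv4 CIDR blocks for overlap using shell arithmetic only; the function names are our own, and the CIDR values in the usage comment are illustrative rather than taken from this setup.

```shell
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address (as integers) of a CIDR block.
cidr_bounds() {
  local base prefix size start
  base=$(ip_to_int "${1%/*}")
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  start=$(( base / size * size ))
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit status 0) when the two CIDR blocks overlap.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Usage (illustrative values):
#   if cidrs_overlap 192.168.0.0/16 172.31.0.0/16; then
#     echo "CIDR ranges overlap; peering will not work" >&2
#   fi
```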

  • Navigate to the Create Peering Connection screen on the VPC console in Region X (the requester).
  • In the VPC requester drop-down, select the VPC of the EKS cluster created earlier.
  • Under the Select another VPC to peer with section, select My Account, select Another Region, and then select Region Y in the drop-down menu.
  • In the VPC ID (Accepter) text box, enter the VPC ID of the VPC in Region Y.
  • Your resulting screen should look similar to the following screenshot:

Screenshot of the page titled "Create Peering Connection".

  • Now choose Create Peering Connection.
  • Your peering connection will now go to Pending Acceptance status. Although the requester VPC has asked to connect to another VPC, the connection is only established once the VPC on the other end accepts the request.
  • Now, navigate to the VPC Peering Connection screen on Region Y and select the Peering request that is in Pending Acceptance status and accept using the Actions drop-down. This will change the status to Active.
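
The request-and-accept flow above maps onto two AWS CLI calls. The following is a sketch, assuming both VPCs are in the same account; the function names are hypothetical and the IDs in the usage comment are placeholders.

```shell
# Hypothetical helpers around real AWS CLI calls.

# Request a peering connection and print its ID.
# Arguments: requester Region, requester VPC ID, accepter Region, accepter VPC ID.
request_peering() {
  aws ec2 create-vpc-peering-connection \
    --region "$1" --vpc-id "$2" \
    --peer-region "$3" --peer-vpc-id "$4" \
    --query 'VpcPeeringConnection.VpcPeeringConnectionId' --output text
}

# Accept the request from the accepter side and print the resulting status code.
# Arguments: accepter Region, peering connection ID.
accept_peering() {
  aws ec2 accept-vpc-peering-connection \
    --region "$1" --vpc-peering-connection-id "$2" \
    --query 'VpcPeeringConnection.Status.Code' --output text
}

# Usage (IDs are placeholders):
#   pcx=$(request_peering us-east-1 vpc-xxxx us-west-2 vpc-yyyy)
#   accept_peering us-west-2 "$pcx"
```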

Configure route table on the VPC to connect to the peering connection

  • Go to the VPC console in Region X (where your EKS cluster is) and select the Public Route Table that is associated with the VPC.
  • Under the Routes tab, choose Edit routes.
  • Enter the Region Y VPC CIDR range in the Destination text box and select the newly created peering connection as the Target.
  • Choose Save routes. The configuration should look similar to the screenshot:

Screenshot of results from selecting "Save routes".

Configure route table on the receiving VPC (on Region Y) to connect to the peering connection

  • Go to the VPC console in Region Y and select the Public Route Table that is associated with the VPC.
  • Under the Routes tab, choose Edit routes.
  • Enter the Region X VPC CIDR range in the Destination text box and select the newly created peering connection as the Target.
  • Choose Save routes. The configuration should look similar to the following screenshot:

Screenshot of the configuration once the routes are edited.
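
Both route table changes follow the same pattern, so they can be sketched as one helper called once per Region. The function name is hypothetical, and the route table IDs, CIDR ranges, and peering connection ID are placeholders to fill in from your own VPCs.

```shell
# Hypothetical helper; `aws ec2 create-route` is the real CLI call.

# Add a route that sends traffic for the peer VPC's CIDR over the peering connection.
# Arguments: Region, route table ID, destination CIDR, peering connection ID.
add_peering_route() {
  aws ec2 create-route \
    --region "$1" --route-table-id "$2" \
    --destination-cidr-block "$3" \
    --vpc-peering-connection-id "$4"
}

# Usage (placeholders throughout):
#   add_peering_route us-east-1 rtb-xxxx "<Region Y VPC CIDR>" pcx-xxxx
#   add_peering_route us-west-2 rtb-yyyy "<Region X VPC CIDR>" pcx-xxxx
```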

Set up the security group in Region Y to allow requests from resources in the VPC in Region X

To allow traffic from Region X into Region Y, add an inbound rule to the security group attached to the VPC endpoint in Region Y, using the VPC CIDR range of the EKS cluster in Region X as the source. Once added, your security group Inbound rules should look like the following screenshot:

Screenshot of the security group Inbound rules.
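
A CLI equivalent of this rule might look like the sketch below. It assumes that only HTTPS (TCP port 443) needs to be opened, since the AMP endpoint is called over HTTPS; the rule shown in the screenshot may be broader. The function name is hypothetical and the IDs are placeholders.

```shell
# Hypothetical helper; `aws ec2 authorize-security-group-ingress` is the real CLI call.

# Allow inbound HTTPS from a CIDR range (assumption: port 443 is sufficient).
# Arguments: Region, security group ID, source CIDR.
allow_https_from_cidr() {
  aws ec2 authorize-security-group-ingress \
    --region "$1" --group-id "$2" \
    --protocol tcp --port 443 --cidr "$3"
}

# Usage (placeholders):
#   allow_https_from_cidr us-west-2 sg-xxxx "<Region X VPC CIDR>"
```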

Configure Route 53 to resolve requests to AMP workspace to be routed through the VPC endpoint

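  • Go to the Route 53 console and choose Create hosted zone.
  • In the Domain name field, enter the domain name that you want to route traffic for. In this setup, that is the hostname the Prometheus signing proxy calls: aps-workspaces.<Region Y>.amazonaws.com.
  • Select Private hosted zone, and associate the zone with the VPC of the EKS cluster in Region X so that the cluster resolves the name through this zone.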

Screenshot of the page to create a Private hosted zone.

  • Choose Create hosted zone.
  • Now we need to create an A record to route the traffic to the VPC endpoint created earlier.
  • Inside the newly created hosted zone, choose Create record.
  • In the Quick create record screen, choose Switch to wizard.
  • In the Choose routing policy screen, select Simple routing and choose Next.
  • In the Configure records screen, select Define simple record.
  • In the new screen, leave the Record name field as it is.
  • Select Alias to VPC endpoint in the Value/Route traffic to drop-down.
  • Select Region Y where you created the VPC endpoint earlier.
  • Now, select the first VPC Endpoint alias from the lookup that appears.
  • Leave the Record type drop-down as it is and select Define simple record.
  • Once created, your Hosted zone should look like the following screenshot:

Screenshot of the example Hosted zone.
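
The alias record can also be created from the command line. The sketch below builds the change batch as JSON and applies it; the function names are hypothetical, and the hosted zone ID and endpoint ID are placeholders. It assumes the first DNS entry of the endpoint is the one to alias, matching the console step above.

```shell
# Hypothetical helpers around real AWS CLI calls.

# Print a change batch that aliases a record to a VPC endpoint.
# Arguments: record name, endpoint DNS name, endpoint hosted zone ID.
alias_change_batch() {
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$3",
          "DNSName": "$2",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
}

# Print "<DNS name> <hosted zone ID>" for the endpoint's first DNS entry.
# Arguments: Region, VPC endpoint ID.
endpoint_dns_entry() {
  aws ec2 describe-vpc-endpoints --region "$1" --vpc-endpoint-ids "$2" \
    --query 'VpcEndpoints[0].DnsEntries[0].[DnsName,HostedZoneId]' --output text
}

# Apply the change to the private hosted zone.
# Arguments: private hosted zone ID, record name, endpoint DNS name, endpoint zone ID.
upsert_alias() {
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$1" \
    --change-batch "$(alias_change_batch "$2" "$3" "$4")"
}

# Usage (IDs are placeholders):
#   set -- $(endpoint_dns_entry us-west-2 vpce-xxxx)
#   upsert_alias ZXXXXXXXX aps-workspaces.us-west-2.amazonaws.com "$1" "$2"
```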

Deploy Prometheus server

We will be using Helm to install the Prometheus server on the cluster. The following commands will add the helm repo, create a new namespace called prometheus, and deploy Prometheus using the Helm chart prometheus-community/prometheus.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 
kubectl create ns prometheus 
helm install prometheus-for-amp prometheus-community/prometheus -n prometheus

Next, we create a file called amp_ingest_override_values.yaml with the following content in it. Replace the placeholder ${AWS_REGION_Y} with the actual AWS region name where your AMP workspace is. Also replace the ${WORKSPACE_ID} with the AMP workspace ID.

server:
  sidecarContainers:
    aws-sigv4-proxy-sidecar:
      image: public.ecr.aws/aws-observability/aws-sigv4-proxy:1.0
      args:
        - --name
        - aps
        - --region
        - ${AWS_REGION_Y}
        - --host
        - aps-workspaces.${AWS_REGION_Y}.amazonaws.com
        - --port
        - :8005
      ports:
        - name: aws-sigv4-proxy
          containerPort: 8005
  statefulSet:
    enabled: "true"
  remoteWrite:
    - url: http://localhost:8005/workspaces/${WORKSPACE_ID}/api/v1/remote_write
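
The placeholder substitution can be scripted rather than done by hand. The following is a minimal sketch using sed; the function name and the template file name in the usage comment are hypothetical.

```shell
# Replace the two placeholders in a values template read from standard input.
# Arguments: Region of the AMP workspace, workspace ID.
render_values() {
  sed -e 's|[$]{AWS_REGION_Y}|'"$1"'|g' \
      -e 's|[$]{WORKSPACE_ID}|'"$2"'|g'
}

# Usage (the template file name and workspace ID are placeholders):
#   render_values us-west-2 ws-xxxxxxxx < values.template.yaml > amp_ingest_override_values.yaml
```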

Execute the following command to modify the Prometheus server configuration to deploy the signing proxy and configure the remoteWrite endpoint:

helm upgrade --install prometheus-for-amp prometheus-community/prometheus -n prometheus -f ./amp_ingest_override_values.yaml
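
Before moving on, it can be useful to confirm that the cluster resolves the AMP hostname through the private hosted zone and that the signing proxy is running. The sketch below is one way to check; the function names, the throwaway pod name, and the busybox image are our own choices, and the Prometheus server pod name must be looked up with kubectl get pods -n prometheus.

```shell
# Hypothetical helpers around real kubectl calls.

# Resolve the AMP hostname from a throwaway pod. The answer should be private
# IP addresses from the Region Y VPC rather than public ones.
# Argument: Region of the AMP workspace.
check_amp_dns() {
  kubectl run amp-dns-check -n prometheus --rm -i --restart=Never \
    --image=busybox -- nslookup "aps-workspaces.$1.amazonaws.com"
}

# Show recent log lines from the signing proxy sidecar.
# Argument: name of the Prometheus server pod.
sidecar_logs() {
  kubectl logs -n prometheus "$1" -c aws-sigv4-proxy-sidecar --tail=20
}

# Usage (the pod name is a placeholder):
#   check_amp_dns us-west-2
#   sidecar_logs <prometheus-server-pod>
```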

Create Amazon Managed Grafana workspace in Region Z and query metrics from AMP workspace in Region Y

  • Set up an Amazon Managed Grafana workspace by following the instructions from the blog post Amazon Managed Grafana – Getting Started from the AWS Management & Governance Blog.
  • Once you’re logged into the Amazon Managed Grafana console, add the AMP datasource by selecting AWS services under the AWS section on the left navigation bar.
  • Select Prometheus under the AWS services tab.
  • In the Data sources tab, select your AWS Region (Region Y) where the AMP workspace is.
  • The AMP workspace will automatically appear under the drop-down. Select the check box and choose Add 1 data source to add the AMP data source.

Screenshot of the data sources.

  • Now choose Explore from the left navigation bar and enter the following query into the text box: apiserver_current_inflight_requests
  • You will see a screen similar to the one in the following screenshot, which shows that we are able to successfully query metrics from the EKS cluster through the AMP workspace:

Screenshot of the output when you run the query.
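
The same query can be issued outside Grafana against the Prometheus-compatible query API of the workspace. The sketch below assumes the third-party awscurl tool is installed to sign the request and that your credentials allow aps:QueryMetrics; the function names are hypothetical, and the expression is passed as-is, so it must not contain characters that need URL encoding.

```shell
# Build the instant-query URL for a workspace.
# Arguments: Region, workspace ID, PromQL expression (URL-safe characters only).
amp_query_url() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/query?query=%s\n' \
    "$1" "$2" "$3"
}

# Send a signed query with awscurl (assumed to be installed separately).
query_amp() {
  awscurl --service aps --region "$1" "$(amp_query_url "$1" "$2" "$3")"
}

# Usage (the workspace ID is a placeholder):
#   query_amp us-west-2 ws-xxxxxxxx apiserver_current_inflight_requests
```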

Conclusion

In this article, we walked through the steps to securely ingest Prometheus metrics from an Amazon EKS cluster into an Amazon Managed Service for Prometheus workspace and to query those metrics from an Amazon Managed Grafana workspace, with each of the three deployed in a different AWS Region.

Although we used the Prometheus server to ingest metrics into AMP, we can alternatively use the newly launched lightweight Grafana Cloud Agent for this purpose. Check out the GitHub repo for further details. We can also use the AWS Distro for OpenTelemetry Remote Write Exporter to send application metrics to AMP. Learn more about the topic in the documentation.

Imaya Kumar Jagannathan

Imaya is a Senior Solution Architect focused on AWS Observability tools including Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Service for Grafana and AWS Distro for OpenTelemetry. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C# and working with containers and serverless technologies. Find him on Twitter @imaya.