AWS Open Source Blog
Setting up cross-account ingestion into Amazon Managed Service for Prometheus
April 21, 2021: This article has been updated to reflect changes introduced by SigV4 support in the Prometheus server.
The recently launched Amazon Managed Service for Prometheus (AMP) provides a highly available and secure environment to ingest, query, and store Prometheus metrics. We can query the metrics from AMP using Amazon Managed Grafana, a fully managed service that is developed together with Grafana Labs and based on open source Grafana. We can also query Prometheus metrics from AMP with a self-hosted Grafana server, or using the HTTP APIs.
In previous posts, we have demonstrated how to set up cross-region metrics collection using AMP. However, organizations may have their workloads running globally and spread across multiple AWS accounts. In this article, we show how to set up central monitoring visibility for cross-account applications with AMP.
Scenario
For the purposes of this article, we’ll consider a scenario where we have two workloads running in separate accounts (workload accounts A and B), and we want to enable central visibility into their Prometheus metrics from a central monitoring account. Proper AWS Identity and Access Management (IAM) policies will be configured to allow cross-account access from the workloads to the AMP workspace.
Setup
To achieve this, we will set up an AMP workspace in the central monitoring account. We will then create a role inside the monitoring account that trusts our workload accounts with write permissions on our AMP workspace. On each workload account, we will deploy a Prometheus server into an Amazon Elastic Kubernetes Service (Amazon EKS) cluster to collect metrics.
Leveraging the IAM roles for service accounts (IRSA) feature of Amazon EKS, we will grant IAM permissions that allow assuming a cross-account role in the central account. For one of the workload accounts (account B), we will keep the traffic to AMP completely private by using an Amazon Virtual Private Cloud (VPC) endpoint, VPC peering, and an Amazon Route 53 private hosted zone.
For the central monitoring account, we’ll do the following:
- Create an AMP workspace in the monitoring account.
- Create an Amazon Managed Grafana workspace in the monitoring account to visualize metrics.
- Create an IAM role with AMP write-only permissions (`AmazonPrometheusRemoteWriteAccess`) that can be assumed by the workload accounts.
For the private networking section, we will:
- Set up Amazon VPC, Internet Gateway (IGW), and Subnets.
- Set up Amazon VPC endpoint.
- Set up DNS private hosted zone and DNS A Record (account B only).
- Create a VPC peering between workload accounts and the central monitoring account (account B only).
For workload accounts, we’ll:
- Create an Amazon VPC, Internet Gateway (IGW), and Subnets.
- Create an Amazon EKS cluster.
- Create an IAM role that allows assuming the cross-account role in the central monitoring account.
- Deploy Prometheus server with `remoteWrite` to AMP.
- Set up an Amazon VPC endpoint for the AMP service (account A only).
- Create a VPC peering between the workload account and the central monitoring account (account B only).
The entire setup can be visualized as follows.
This example will use the Ireland (eu-west-1) region. Please visit the AWS Regional Service List to see AWS regions supported by the service.
Workload accounts (account A and account B)
In this section, on both workload accounts, we will:
- Create an Amazon VPC and an Amazon EKS cluster.
- Create an IAM role with a policy that allows it to assume a role in the central monitoring account.
Many of the scripts provided in this blog post rely on dependencies such as `jq`, `kubectl`, `eksctl`, `helm`, and `awscli`. To get these tools installed on AWS CloudShell, we’ll use the following commands.
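A minimal sketch follows; the latest upstream releases are fetched for brevity, so pin tool versions for real use (`awscli` is preinstalled on CloudShell).

```bash
# jq for JSON parsing in the scripts below.
sudo yum install -y jq

# kubectl from the upstream stable channel.
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -m 0755 kubectl /usr/local/bin/kubectl

# eksctl.
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

# Helm 3.
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```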
Note that CloudShell sessions are ephemeral and that deployment of the Amazon EKS cluster can take up to 20 minutes. If the session expires, you’ll need to install the tools again by re-running this script and sourcing the `delete.env` file to restore the environment variables.
The following script creates an Amazon VPC and an Amazon EKS cluster; the service account role is created in the next step.
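A condensed sketch of such a script using `eksctl`, which provisions the VPC, Internet Gateway, and subnets together with the cluster (the cluster name is illustrative):

```bash
export AWS_REGION=eu-west-1
export CLUSTER_NAME=amp-eks-workload   # illustrative cluster name

# Create the cluster; --with-oidc enables IAM roles for service accounts (IRSA).
eksctl create cluster \
  --name $CLUSTER_NAME \
  --region $AWS_REGION \
  --nodes 2 \
  --with-oidc
```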
Let’s create a minimum set of permissions to assume the role `EKS-AMP-Central-Role`, which we will create in the central monitoring account later. You can attach additional permissions according to your use case.
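A sketch of those permissions: the policy allows only `sts:AssumeRole` on the central role, and is attached to an IRSA role named `EKS-AMP-ServiceAccount-Role` here for illustration.

```bash
export CENTRAL_ACCOUNT_ID=<central monitoring account ID>

# Minimal policy: only allow assuming the central role.
cat > assume_central_role.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::${CENTRAL_ACCOUNT_ID}:role/EKS-AMP-Central-Role"
    }
  ]
}
EOF

POLICY_ARN=$(aws iam create-policy \
  --policy-name AMPCentralAssumeRolePolicy \
  --policy-document file://assume_central_role.json \
  --query 'Policy.Arn' --output text)

# Create only the IAM role (the service account itself is created later by Helm).
eksctl create iamserviceaccount \
  --cluster $CLUSTER_NAME --region $AWS_REGION \
  --namespace prometheus \
  --name amp-iamproxy-ingest-service-account \
  --role-name EKS-AMP-ServiceAccount-Role \
  --attach-policy-arn $POLICY_ARN \
  --role-only \
  --approve
```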
Central monitoring account
Logged into the central account, we will now create an AMP workspace with `aws-cli`.
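For example, with an illustrative workspace alias:

```bash
aws amp create-workspace --alias central-amp-workspace --region eu-west-1
# Note the workspaceId in the response; it is used below as WORKSPACE_ID.
```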
Alternatively, we can use the AWS console to create the workspace.
To set up an Amazon Managed Grafana workspace, follow the instructions found in the Amazon Managed Grafana – Getting Started article from the AWS Management & Governance blog.
IAM role (central account)
In this essential step, using IAM trust policies, we are going to define which IAM roles will be able to have write permissions on our central AMP workspace. Here, we specify the roles we created in the steps above.
In a file called `policy.json`, add the following content, replacing `WORKLOAD_ACCOUNT_A` and `WORKLOAD_ACCOUNT_B` with the AWS account IDs of the workload accounts.
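A sketch of the trust policy, assuming the workload IRSA roles were named `EKS-AMP-ServiceAccount-Role` as in the earlier step:

```bash
cat > policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::WORKLOAD_ACCOUNT_A:role/EKS-AMP-ServiceAccount-Role",
          "arn:aws:iam::WORKLOAD_ACCOUNT_B:role/EKS-AMP-ServiceAccount-Role"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```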
We can now proceed to IAM role creation. We are also giving write access to AMP via a role policy.
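A sketch of both steps:

```bash
# Create the cross-account role with the trust policy above.
aws iam create-role \
  --role-name EKS-AMP-Central-Role \
  --assume-role-policy-document file://policy.json

# Grant write access to AMP via the managed remote-write policy.
aws iam attach-role-policy \
  --role-name EKS-AMP-Central-Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
```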
Deploying Prometheus server — account A only
We will now run Prometheus server inside the Amazon EKS cluster.
Edit the file named `amp_ingest_override_values.yaml` and replace the `ACCOUNT_ID_A`, `CENTRAL_ACCOUNT_ID`, and `WORKSPACE_ID` variables, respectively, with the AWS account ID of the current account, the AWS account ID of the central monitoring account, and the AMP workspace ID.
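The original override file isn’t reproduced here; a plausible sketch, based on the `prometheus-community` chart and the Prometheus server’s native SigV4 support, follows together with the Helm install. The `queue_config` values are illustrative tuning defaults.

```bash
cat > amp_ingest_override_values.yaml << 'EOF'
serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID_A:role/EKS-AMP-ServiceAccount-Role
server:
  remoteWrite:
    - url: https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/remote_write
      sigv4:
        region: eu-west-1
        # Cross-account ingestion: sign requests with the central role.
        role_arn: arn:aws:iam::CENTRAL_ACCOUNT_ID:role/EKS-AMP-Central-Role
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
EOF

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-for-amp prometheus-community/prometheus \
  --namespace prometheus --create-namespace \
  -f amp_ingest_override_values.yaml
```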
The Prometheus metrics should start to be visible from the central monitoring account. Here we can see the two Amazon EKS worker nodes reporting process metrics.
Up to this point, traffic to the AMP endpoint is routed over the public internet (HTTPS + SigV4). We can verify this by checking the IP addresses behind the DNS name of the AMP endpoint (`aps-workspaces.eu-west-1.amazonaws.com`).
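One way to check, from inside the cluster where Prometheus resolves the name, is with a short-lived pod:

```bash
kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
  nslookup aps-workspaces.eu-west-1.amazonaws.com
```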
This should return a set of public IP addresses for the AMP service.
We can make this traffic private by adding a VPC endpoint (VPCe).
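A sketch of the commands involved, assuming the VPC, subnet, and security group IDs from the cluster setup are already exported:

```bash
# Interface endpoint for the AMP workspaces API. Private DNS makes the
# public AMP hostname resolve to the endpoint's local IPs inside the VPC.
aws ec2 create-vpc-endpoint \
  --vpc-id $VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.eu-west-1.aps-workspaces \
  --subnet-ids $SUBNET_ID_1 $SUBNET_ID_2 \
  --security-group-ids $SG_ID \
  --private-dns-enabled
```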
After a few minutes, the VPCe will be ready to route traffic, and the VPC private DNS will start returning the endpoint’s local IP addresses. Note that traffic will be disrupted while the VPCe is being created and DNS resolution propagates.
Private networking setup
To keep things completely private and secure in the second account (account B), we will set up a VPC peering with the monitoring account before running Prometheus. Here’s a summary of the steps involved.
Central account:
- Create an Amazon VPC, Internet Gateway (IGW), and Subnets.
- Set up a VPC endpoint.
- Create an Amazon Route 53 private hosted zone.
Workload account B:
- Request a VPC peering connection with the central account’s Amazon VPC.
Central account:
- Accept VPC peering.
- Authorize association of the hosted zone with account B’s VPC.
Workload account B:
- Associate the central account’s private hosted zone with account B’s Amazon VPC.
- Configure Amazon EKS security groups.
- Create VPC endpoints.
Central account
The following script will create, in the central monitoring account, an Amazon VPC with two Subnets and a VPC endpoint to enable private connectivity between the Amazon VPC and AMP. We will host a DNS private zone with Amazon Route 53 to resolve DNS queries to AMP inside the Amazon VPC and provide permissions for cross-account DNS resolution into this hosted zone for the workload account Amazon VPC.
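A condensed sketch of that script (CIDR ranges and names are illustrative):

```bash
export AWS_REGION=eu-west-1

# VPC with DNS support and two subnets.
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.1.0.0/16 \
  --query 'Vpc.VpcId' --output text)
aws ec2 modify-vpc-attribute --vpc-id $VPC_ID --enable-dns-support
aws ec2 modify-vpc-attribute --vpc-id $VPC_ID --enable-dns-hostnames
SUBNET_1=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.1.1.0/24 \
  --availability-zone ${AWS_REGION}a --query 'Subnet.SubnetId' --output text)
SUBNET_2=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.1.2.0/24 \
  --availability-zone ${AWS_REGION}b --query 'Subnet.SubnetId' --output text)

# Security group allowing HTTPS from the (to be peered) workload VPCs.
SG_ID=$(aws ec2 create-security-group --vpc-id $VPC_ID \
  --group-name amp-vpce-sg --description "HTTPS to AMP endpoint" \
  --query 'GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 443 --cidr 10.0.0.0/8

# Interface endpoint for AMP. Private DNS stays disabled because we manage
# resolution ourselves through a shared Route 53 private hosted zone.
VPCE_ID=$(aws ec2 create-vpc-endpoint --vpc-id $VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.${AWS_REGION}.aps-workspaces \
  --subnet-ids $SUBNET_1 $SUBNET_2 \
  --security-group-ids $SG_ID \
  --no-private-dns-enabled \
  --query 'VpcEndpoint.VpcEndpointId' --output text)

# Private hosted zone mirroring the AMP endpoint name, associated with this VPC.
HOSTED_ZONE=$(aws route53 create-hosted-zone \
  --name aps-workspaces.${AWS_REGION}.amazonaws.com \
  --caller-reference central-amp-$(date +%s) \
  --vpc VPCRegion=${AWS_REGION},VPCId=$VPC_ID \
  --query 'HostedZone.Id' --output text)

# Alias A record pointing the zone apex at the endpoint's DNS entry.
VPCE_DNS=$(aws ec2 describe-vpc-endpoints --vpc-endpoint-ids $VPCE_ID \
  --query 'VpcEndpoints[0].DnsEntries[0].DnsName' --output text)
VPCE_HZ=$(aws ec2 describe-vpc-endpoints --vpc-endpoint-ids $VPCE_ID \
  --query 'VpcEndpoints[0].DnsEntries[0].HostedZoneId' --output text)
aws route53 change-resource-record-sets --hosted-zone-id $HOSTED_ZONE \
  --change-batch "{\"Changes\":[{\"Action\":\"CREATE\",\"ResourceRecordSet\":{\"Name\":\"aps-workspaces.${AWS_REGION}.amazonaws.com\",\"Type\":\"A\",\"AliasTarget\":{\"HostedZoneId\":\"$VPCE_HZ\",\"DNSName\":\"$VPCE_DNS\",\"EvaluateTargetHealth\":false}}}]}"
```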
Workload account B
In account B, set the `CENTRAL_ACCOUNT_ID` and `CENTRAL_VPCID` environment variables with the corresponding values and run the following script to request a VPC peering connection with the central monitoring account.
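A sketch, assuming `WORKLOAD_VPCID` holds the EKS cluster’s VPC ID in this account:

```bash
PEERING_ID=$(aws ec2 create-vpc-peering-connection \
  --vpc-id $WORKLOAD_VPCID \
  --peer-vpc-id $CENTRAL_VPCID \
  --peer-owner-id $CENTRAL_ACCOUNT_ID \
  --query 'VpcPeeringConnection.VpcPeeringConnectionId' --output text)
echo "PEERING_ID=$PEERING_ID"
```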
Central monitoring account
In the central account, we will accept the peering connection and associate the private DNS hosted zone with account B’s VPC. Set the appropriate value for `WORKLOAD_VPCID` and run the following script.
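A sketch of those steps; `PEERING_ID` comes from account B’s output, while `CENTRAL_RTB_ID` and `WORKLOAD_CIDR` (the central route table and the workload VPC’s CIDR range) are assumptions:

```bash
# Accept the pending peering request from account B.
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id $PEERING_ID

# Route traffic destined for the workload VPC through the peering connection.
aws ec2 create-route --route-table-id $CENTRAL_RTB_ID \
  --destination-cidr-block $WORKLOAD_CIDR \
  --vpc-peering-connection-id $PEERING_ID

# Authorize account B's VPC to associate with the private hosted zone.
# The association itself is completed from account B in the next step.
aws route53 create-vpc-association-authorization \
  --hosted-zone-id $HOSTED_ZONE \
  --vpc VPCRegion=eu-west-1,VPCId=$WORKLOAD_VPCID

echo "HOSTED_ZONE=$HOSTED_ZONE"
```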
Workload account B
On workload account B, the VPC peering connection should now appear as active. We will next associate the private hosted zone for AMP with the Amazon EKS VPC. From the last script’s output, set the corresponding value for `HOSTED_ZONE` and run the following script.
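A sketch of the association and the return route (`WORKLOAD_RTB_ID`, the route table used by the EKS subnets, is an assumption; `10.1.0.0/16` matches the central VPC sketch above):

```bash
# Complete the cross-account hosted zone association from account B.
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id $HOSTED_ZONE \
  --vpc VPCRegion=eu-west-1,VPCId=$WORKLOAD_VPCID

# Return route: send traffic for the central VPC through the peering connection.
aws ec2 create-route --route-table-id $WORKLOAD_RTB_ID \
  --destination-cidr-block 10.1.0.0/16 \
  --vpc-peering-connection-id $PEERING_ID
```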
Deploying Prometheus server — account B only
With the full private connectivity between VPCs established, we can now run Prometheus server on Amazon EKS. This is similar to the deployment made with account A.
Edit the file named `amp_ingest_override_values.yaml` and replace the `WORKLOAD_ACCOUNT_ID`, `CENTRAL_ACCOUNT_ID`, and `WORKSPACE_ID` variables, respectively, with the AWS account ID of the current account, the AWS account ID of the central monitoring account, and the AMP workspace ID.
At this stage, we’re able to visualize metrics from all four worker nodes across both Amazon EKS clusters (two nodes per cluster, one cluster in each workload account).
To test the route used by Prometheus server to publish its metrics, we can query the DNS record for the AMP host from inside the cluster.
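For example, with a short-lived pod:

```bash
kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
  nslookup aps-workspaces.eu-west-1.amazonaws.com
```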
The returned addresses should be private IPs within the central monitoring account’s VPC CIDR block, confirming that traffic flows through the VPC peering connection and the VPC endpoint rather than over the public internet.
Troubleshooting
To troubleshoot issues in the deployment, check the logs of the `prometheus-for-amp-server` pod.
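For example, assuming the deployment and container names that follow from the Helm release naming used earlier:

```bash
kubectl logs -n prometheus deploy/prometheus-for-amp-server \
  -c prometheus-server --tail=100
```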
Conclusion
In this article, we have shown how to centralize Prometheus metrics collection using AMP for a workload segmented into multiple accounts. To visualize the metrics, we’ve set up a Grafana workspace with Amazon Managed Grafana, which provides a native integration with AMP. You can also run your own Grafana server and query your metrics.
With the help of IAM roles and cross-account trust policies, you can control precisely who has access to the workspace. Our example makes use of Amazon Elastic Kubernetes Service (Amazon EKS); however, you can also use this setup for other workload types, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Compute Cloud (Amazon EC2). Additionally, we provide options for complete private connectivity using VPC peering and VPC endpoints.
Cleanup
To remove all the resources used in this article, run the following commands in each account. All the relevant resource identifiers are saved in the `delete.env` file for reference by these scripts.
Workloads:
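A sketch of the workload-side cleanup, assuming the variables were restored from `delete.env`:

```bash
source delete.env

# Remove the Prometheus release, then the cluster (eksctl also removes the
# VPC and IAM resources it created).
helm uninstall prometheus-for-amp --namespace prometheus
eksctl delete cluster --name $CLUSTER_NAME --region $AWS_REGION

# Account A only: remove the AMP VPC endpoint.
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids $VPCE_ID

# Account B only: remove the peering connection.
aws ec2 delete-vpc-peering-connection --vpc-peering-connection-id $PEERING_ID

# Remove the assume-role policy created for ingestion.
aws iam delete-policy --policy-arn $POLICY_ARN
```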
Central account:
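And a sketch for the central account (the alias A record must be removed before the hosted zone, and the endpoint, subnets, and security group before the VPC):

```bash
source delete.env

# Delete the AMP workspace.
aws amp delete-workspace --workspace-id $WORKSPACE_ID

# Remove the cross-account role.
aws iam detach-role-policy --role-name EKS-AMP-Central-Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
aws iam delete-role --role-name EKS-AMP-Central-Role

# Tear down the private networking pieces.
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids $VPCE_ID
aws route53 delete-hosted-zone --id $HOSTED_ZONE   # after deleting the A record
aws ec2 delete-subnet --subnet-id $SUBNET_1
aws ec2 delete-subnet --subnet-id $SUBNET_2
aws ec2 delete-security-group --group-id $SG_ID
aws ec2 delete-vpc --vpc-id $VPC_ID
```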