AWS Management & Governance Blog

Collect on-premises metrics using Amazon Managed Service for Prometheus

Prometheus is a popular open-source metrics monitoring solution that is widely used in a variety of workloads. Although it’s common for customers to use Prometheus to monitor container workloads, it’s also used to monitor Amazon Elastic Compute Cloud (Amazon EC2) instances, as well as virtual machines (VMs) and servers in on-premises environments.

Amazon Managed Service for Prometheus (AMP) is a Prometheus-compatible monitoring service for infrastructure and application metrics that makes it easy for customers to securely monitor their workloads at scale. Customers using Prometheus in self-hosted environments face challenges in managing a highly available, scalable, and secure Prometheus server environment, infrastructure for long-term storage, and access control. AMP solves these problems by providing a fully managed environment that is tightly integrated with AWS Identity and Access Management (IAM) to control authentication and authorization.

To start using AMP, complete these two simple steps:

  • Create an AMP workspace.
  • Configure your Prometheus server to remote write into the AMP workspace.

To remote write into your workspace, you need an IAM role with IAM permissions and policies. This poses a challenge for on-premises environments where IAM roles aren’t available to the instance. A common solution to this problem is to use programmatic access keys that are essentially long-term credentials stored in a secure location and retrieved by the application during startup. This approach makes it difficult to comply with best practices like the rotation of the credentials.

A better approach is the use of temporary credentials using AWS Security Token Service (AWS STS), but this requires the use of identity federation (SAML, OIDC, and so on) and changes in the remote write part of Prometheus.

AWS Systems Manager (formerly known as Amazon Simple Systems Manager, or SSM) uses AWS STS in a secure way. You can use it to allow remote_write access to your AMP workspace without rewriting any code.

You can use Systems Manager to manage your infrastructure on AWS and your on-premises resources. You can use the Systems Manager console to view operational data from AWS services and automate operational tasks across your AWS resources. Systems Manager helps you maintain security and compliance by scanning your managed instances and reporting on (or taking corrective action on) any policy violations it detects.

A managed instance is a machine configured for use with Systems Manager. Supported machine types include EC2 instances, on-premises servers, and VMs, including VMs in other cloud environments. Supported operating system types include Windows Server, macOS, Raspbian, and multiple distributions of Linux.

When Systems Manager is configured to manage hybrid environments, the SSM Agent is deployed to those instances and an IAM role must be created for them. During activation, the SSM Agent communicates over TLS, using Amazon certificates or private certificates managed with AWS Certificate Manager (ACM). Most modern operating systems (Windows and Linux) already include the Amazon certificates. (Only one certificate is required.) For information about installing a certificate manually, see Install a TLS certificate on on-premises servers and VMs.

What about Prometheus?

During the registration process of the SSM Agent, a credentials file is created in the home path of the user running the SSM Agent (by default, root). The SSM Agent keeps this file updated by requesting temporary credentials through AWS STS for the IAM role you assigned to the instance during the activation process. The same credentials can be used for remote write operations to your AMP workspace by configuring the required permissions.

Figure 1: Solution architecture, showing the interaction between the Prometheus server, AMP, Systems Manager, the SSM Agent, and AWS STS in an on-premises environment

Configure the SSM Agent

The SSM Agent is an open-source project; you can access the public repository on GitHub. In this blog post, I’ll follow the steps in Setting up AWS Systems Manager for hybrid environments. I assume that Systems Manager is already configured in your environment, as described in Step 1: Complete general Systems Manager setup steps.

Create an AMP workspace

The following script will create an AMP workspace in the us-east-1 Region. If you prefer, you can change the WORKLOAD_REGION variable to use another AWS Region where AMP is supported.

WORKLOAD_REGION='us-east-1'

WORKSPACE_ID=$(aws amp create-workspace --alias onpremises-demo-workspace \
  --region $WORKLOAD_REGION \
  --output text \
  --query 'workspaceId')
  
WORKSPACE_URL=$(aws amp describe-workspace --region $WORKLOAD_REGION --workspace-id $WORKSPACE_ID --query workspace.prometheusEndpoint --output text)


echo "This is the URL for remote_write configuration you must copy to your VM:\n $WORKSPACE_URL" 
echo "export WORKSPACE_ID=$WORKSPACE_ID" >> delete.env

Create an IAM service role for the hybrid environment

Use the following commands to create an IAM role with a policy that allows Systems Manager to assume the role on behalf of your VM.

cat > SSMService-Trust.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [    
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ssm.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role \
    --role-name SSMServiceRoleRemoteWrite \
    --assume-role-policy-document file://SSMService-Trust.json 
    

Assign some policies to this empty role. The first policy, AmazonSSMManagedInstanceCore, is needed for basic operations performed by the SSM Agent. The second policy, AmazonPrometheusRemoteWriteAccess, allows the role to perform remote write operations into the AMP workspace you created earlier.

aws iam attach-role-policy \
    --role-name SSMServiceRoleRemoteWrite \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore  
    
aws iam attach-role-policy \
    --role-name SSMServiceRoleRemoteWrite \
    --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess  
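
You can verify that both policies are attached before moving on:

aws iam list-attached-role-policies --role-name SSMServiceRoleRemoteWrite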

Create a managed instance activation for a hybrid environment

To set up servers and VMs in your hybrid environment as managed instances, you need to create a managed instance activation. After you successfully complete the activation, you immediately receive an activation code and activation ID. You specify this code and ID combination when you install the SSM Agent on servers and VMs in your hybrid environment. The code and ID provide secure access to Systems Manager from your managed instances. For more information, see Setting up AWS Systems Manager for hybrid environments.

This credential pair is used to register the VM in Systems Manager. It will not be preserved or used to communicate with the service. After the instance is registered, the SSM Agent will generate an asymmetric key pair and use it to obtain the temporary credentials required to function properly. This pair is uniquely tied to this machine. You can remove the registration from Systems Manager at any time, which makes it a better option than long-term credentials.

I won’t dive deep into the options for creating this activation, but you should set restrictive values in this command, such as the number of instances that can be registered with this combination of code and ID (in this case, one), the expiration date of the activation (the window during which this pair can be used to activate new servers), and proper tagging. Note that the date command below uses BSD/macOS syntax; on GNU/Linux, use date -u -d '+1 hour' +%Y-%m-%dT%H:%M:%S instead.

EXPIRATION=$(date -u -v +1H +%Y-%m-%dT%H:%M:%S)
aws ssm create-activation \
    --default-instance-name OnPremisesServer \
    --iam-role SSMServiceRoleRemoteWrite \
    --registration-limit 1 \
    --region $WORKLOAD_REGION \
    --expiration-date $EXPIRATION \
    --tags "Key=Department,Value=Accounting" "Key=DataClassification,Value=Restricted"

Make a note of the activation ID and code. You’ll need them in the next step.
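
If you prefer to capture these values in shell variables instead of copying them from the command output, a variant like the following should work (note that it creates a new activation, so run it in place of the command above):

read -r ACTIVATION_ID ACTIVATION_CODE < <(aws ssm create-activation \
    --default-instance-name OnPremisesServer \
    --iam-role SSMServiceRoleRemoteWrite \
    --registration-limit 1 \
    --region $WORKLOAD_REGION \
    --expiration-date $EXPIRATION \
    --tags "Key=Department,Value=Accounting" "Key=DataClassification,Value=Restricted" \
    --output text \
    --query '[ActivationId,ActivationCode]')
echo "Activation ID: $ACTIVATION_ID"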

Install the SSM Agent in a hybrid environment

Execute the rest of the commands in this post on the on-premises VM.

I’m using an Ubuntu 20.04 instance running on VirtualBox. The steps to install and configure this instance are beyond the scope of this post. I installed the instance with the minimum requirements and updated it before starting. For instructions for Linux, see Install SSM Agent for a hybrid environment (Linux). For instructions for Windows, see Install SSM Agent for a hybrid environment (Windows).

On the VM, install the SSM Agent using the prebuilt Debian package:

mkdir /tmp/ssm
curl https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/debian_amd64/amazon-ssm-agent.deb -o /tmp/ssm/amazon-ssm-agent.deb
sudo dpkg -i /tmp/ssm/amazon-ssm-agent.deb
sudo service amazon-ssm-agent stop

Next, register the SSM Agent with your account using the activation ID and code:

sudo -E amazon-ssm-agent -register -code "activation-code" -id "activation-id" -region "region"
sudo service amazon-ssm-agent start

If the process is successful, you’ll see a message like the following that includes the managed instance ID:

2021-05-17 15:24:49 WARN Could not read InstanceFingerprint file: InstanceFingerprint does not exist.
2021-05-17 15:24:49 INFO No initial fingerprint detected, generating fingerprint file…
2021-05-17 15:24:50 INFO Successfully registered the instance with AWS SSM using Managed instance-id: mi-12345678901234567

To confirm that the instance is reporting properly, in the Systems Manager console, choose Fleet Manager. The instance should be displayed and the SSM Agent status should be Online. After a few seconds, the information about the instance should be populated along with the tags passed to the activation request.

Figure 2: Instance overview in Fleet Manager, showing the instance ID, OS name (in this example, Ubuntu), Availability Zone, platform type (Linux), SSM Agent version, SSM Agent ping status (Online), and more
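
If you prefer the command line, you can also confirm the registration from your workstation with describe-instance-information:

aws ssm describe-instance-information \
    --region $WORKLOAD_REGION \
    --query 'InstanceInformationList[*].[InstanceId,PingStatus,PlatformName]' \
    --output table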

The SSM Agent will maintain the credentials in the home folder of the user that executed the agent (by default, root).

To check if the file exists and is not empty:

rpereyra@onpremises:~$ sudo ls -al /root/.aws/
total 12
drw------- 2 root root 4096 May 17 15:34 .
drwx------ 5 root root 4096 May 17 15:25 ..
-rw-r--r-- 1 root root 1158 May 17 15:34 credentials
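
The file uses the standard AWS shared credentials format. For illustration only, its contents look similar to the following (these are placeholder values, not real keys, and the profile name may differ in your environment):

[default]
aws_access_key_id     = <temporary-access-key-id>
aws_secret_access_key = <temporary-secret-access-key>
aws_session_token     = <temporary-session-token>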

Install a Prometheus server

Now that the SSM Agent is providing a set of credentials to the root user on the instance, you can install the Prometheus server to start exporting metrics to your AMP workspace. Use the following commands to download Prometheus into a new folder:

mkdir /tmp/prometheus
curl -L https://github.com/prometheus/prometheus/releases/download/v2.27.0/prometheus-2.27.0.linux-amd64.tar.gz -o /tmp/prometheus/prometheus-2.27.0.linux-amd64.tar.gz
cd /tmp/prometheus
tar -xvzf prometheus-2.27.0.linux-amd64.tar.gz
cd prometheus-2.27.0.linux-amd64

Now, configure Prometheus to send metrics (remote_write) to your AMP workspace and then start Prometheus.

cp prometheus.yml prometheus.yml.bak
WORKSPACE_URL=<paste value here>
WORKLOAD_REGION=<AMP workspace region>
cat >> prometheus.yml << EOF

remote_write:
  - url: ${WORKSPACE_URL}api/v1/remote_write
    sigv4:
      region: $WORKLOAD_REGION
EOF
chmod +x prometheus
sudo ./prometheus

Note: In this example, Prometheus runs in the foreground from a temporary folder, which won’t be practical for most scenarios. You will likely run Prometheus as a system service. In that case, be aware that Prometheus must be able to read the same credentials file; the AWS SDK looks for it in the home folder of the user ($HOME/.aws/credentials). For simplicity, I’m running both processes as the root user. Depending on your OS, you might have to take precautions to avoid sharing the same user and apply least-privilege permissions.
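
If you do want to run Prometheus as a system service, a minimal systemd unit along these lines can serve as a starting point. It assumes you’ve copied the Prometheus binary and prometheus.yml to /opt/prometheus (a path chosen here only for illustration) and that the service runs as root so the AWS SDK can find /root/.aws/credentials:

sudo tee /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus server (remote write to AMP)
After=network-online.target

[Service]
User=root
Environment=HOME=/root
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus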

After the Prometheus server is up and running, the metrics will be sent to the AMP remote_write destination. You can visualize the metrics by installing Grafana in your local environment or by creating an Amazon Managed Grafana (AMG) workspace.
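
Before setting up Grafana, you can also do a quick check that samples are reaching the workspace by querying the AMP query API directly. For example, assuming you have the open-source awscurl tool installed and AWS credentials available locally:

awscurl --service aps --region $WORKLOAD_REGION "${WORKSPACE_URL}api/v1/query?query=up"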

The following figure shows how to visualize metrics by querying AMP through an AMG workspace.

Figure 3: Grafana dashboard showing the go_gc_duration_seconds_count metric from the AMP workspace

Use Grafana Agent instead of Prometheus server

The Grafana Cloud Agent is an open-source, lightweight alternative to running a full Prometheus server. It keeps the parts required for discovering and scraping Prometheus exporters and sending metrics to the backend (in this case, AMP), removing subsystems such as the storage, query, and alerting engines.

In this section, I’ll show you how you can deploy the Grafana Cloud Agent to collect metrics as an alternative to the Prometheus server. If the Prometheus server is still running, press Control - C to close the session in the console, and then execute the following commands to install the Grafana Cloud Agent:

sudo apt install unzip
mkdir /tmp/grafana-agent
cd /tmp/grafana-agent
curl -O -L "https://github.com/grafana/agent/releases/download/v0.14.0-rc.4/agent-linux-amd64.zip" 
unzip "agent-linux-amd64.zip"
chmod a+x "agent-linux-amd64"

cat >> agent.yml << EOF
server:
  log_level: info
  http_listen_port: 9090
prometheus:
  wal_directory: /var/lib/grafana-agent
  global:
    scrape_interval: 15s
integrations:
  agent:
    enabled: true
  node_exporter:
    enabled: true

  prometheus_remote_write:
    - url: ${WORKSPACE_URL}api/v1/remote_write
      sigv4:
        region: $WORKLOAD_REGION
EOF

sudo ./agent-linux-amd64 -config.file ./agent.yml
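
While the agent is running, you can check from another terminal that it’s up and exposing its own metrics on the configured listen port (9090 in the configuration above):

curl -s http://localhost:9090/metrics | head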

This Grafana Cloud Agent configuration enables the node_exporter integration. If you check the metrics available in AMG now, you’ll find additional information, sent by the Grafana Cloud Agent.

Figure 4: Grafana dashboard showing the node_netstat_Tcp_InSegs metric from the AMP workspace

Use the OpenTelemetry Collector

AWS Distro for OpenTelemetry Collector is an AWS-supported version of the upstream OpenTelemetry Collector. It’s distributed by Amazon and supports selected components from the OpenTelemetry community. It is fully compatible with AWS computing platforms, including Amazon EC2, Amazon Elastic Container Service, and Amazon Elastic Kubernetes Service, and it enables users to send metrics, traces, and logs to Amazon CloudWatch and other supported backends.

In this section, I’ll show you how you can deploy the AWS Distro for OpenTelemetry Collector to collect metrics as an alternative to using the Prometheus server and Grafana Cloud Agent. If the Grafana Cloud Agent is still running, press Control - C to close the session in the console, and then execute the following commands to install the AWS Distro for OpenTelemetry Collector:

mkdir /tmp/otel
cd /tmp/otel
wget https://aws-otel-collector.s3.amazonaws.com/ubuntu/amd64/latest/aws-otel-collector.deb
sudo dpkg -i -E ./aws-otel-collector.deb

cat > config.yaml << EOF
extensions:
  health_check:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:55681
  prometheus:
    config:
      scrape_configs:
        - job_name: "otel-collector"
          scrape_interval: 5s
          static_configs:
            - targets: ["localhost:8888"]
processors:
  batch/traces:
    timeout: 1s
    send_batch_size: 50
  batch/metrics:
    timeout: 60s
    
exporters:
  awsprometheusremotewrite:
    endpoint: ${WORKSPACE_URL}api/v1/remote_write
    aws_auth:
      service: "aps"
      region: $WORKLOAD_REGION
          
service:
  pipelines:
    metrics:
      receivers: [otlp,prometheus]
      processors: [batch/metrics]
      exporters: [awsprometheusremotewrite]
  extensions: [health_check]
EOF

export AOT_RUN_USER=root
sudo --preserve-env=AOT_RUN_USER /opt/aws/aws-otel-collector/bin/aws-otel-collector --config /tmp/otel/config.yaml
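
Because the configuration enables the health_check extension, you can verify from another terminal that the collector is healthy. The extension listens on port 13133 by default:

curl -s http://localhost:13133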

I am running the AWS Distro for OpenTelemetry (ADOT) Collector as the root user in order to reuse the shared credentials file. The default user for the OTEL Collector (aot) will not have access to the shared credentials file in the root user’s home folder. You can see that the OTEL Collector is sending metrics in AMG:

Figure 5: Grafana dashboard showing the otelcol_process_runtime_total_alloc_bytes metric from the AMP workspace

Cleanup

To avoid ongoing charges in your AWS account, run the following commands to delete the resources you created. You will also need to clean up or terminate your VM.

rm -f SSMService-Trust.json
aws iam detach-role-policy --role-name SSMServiceRoleRemoteWrite --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
aws iam detach-role-policy --role-name SSMServiceRoleRemoteWrite --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
aws iam delete-role --role-name SSMServiceRoleRemoteWrite
aws amp delete-workspace --workspace-id $WORKSPACE_ID --region $WORKLOAD_REGION
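
The commands above don’t remove the Systems Manager resources. If you also want to delete the activation and deregister the VM, commands along these lines should work (substitute your own activation ID and managed instance ID):

aws ssm delete-activation --activation-id <your-activation-id> --region $WORKLOAD_REGION
aws ssm deregister-managed-instance --instance-id <your-managed-instance-id> --region $WORKLOAD_REGION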

Conclusion

In this blog post, I showed you how you can set up a secure environment to collect Prometheus metrics from an on-premises VM and remote write metrics to AMP. The SSM Agent plays a key role here by providing temporary credentials to the Prometheus server and rotating the authentication keys regularly. For more information, see About SSM Agent.

You can also easily collect Prometheus metrics from Amazon EKS, Amazon ECS, and EC2 instances; see the AMP documentation for more information.

About the authors

Imaya Kumar Jagannathan

Imaya is a Senior Solutions Architect focused on Amazon CloudWatch and AWS X-Ray. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C# and working with containers and serverless technologies.

Rafael Pereyra

Rafael Pereyra is a Sr. Security Transformation Consultant at AWS Professional Services, where he helps customers securely deploy, monitor, and operate solutions in the cloud. Rafael’s interests include containerized applications, improving observability, monitoring and logging of solutions, IaC, and automation in general. In Rafael’s spare time, he enjoys cooking with family and friends.