AWS Cloud Operations & Migrations Blog

Viewing collectd statistics with Amazon Managed Service for Prometheus and Amazon Managed Service for Grafana

Monitoring systems are essential for a resilient solution. A popular tool for monitoring Linux-based physical or virtual machines is collectd, a daemon that periodically collects system and application performance metrics. However, collectd doesn’t provide long-term storage for metrics, rich querying, visualization, or an alerting solution. Amazon Managed Service for Prometheus is a serverless monitoring service for metrics that is compatible with open-source Prometheus. Amazon Managed Service for Grafana is a fully managed service with rich, interactive data visualizations that helps customers analyze, monitor, and alarm on metrics, logs, and traces across multiple data sources.

This post describes how partners and customers can view metrics from a running instance of collectd on a Linux-based Amazon Elastic Compute Cloud (Amazon EC2) instance. The metrics can be stored and queried with Amazon Managed Service for Prometheus and visualized with Amazon Managed Service for Grafana. Further, customers and partners can build alerting mechanisms using either Amazon Managed Service for Prometheus or Amazon Managed Service for Grafana. This approach can help partners and customers migrate existing monitoring workloads, or create new ones, in a cloud-native fashion.

 

Time to read: 6 minutes
Time to complete: 15 minutes
Cost to complete (estimated): 10 USD (at publication time)
Learning level: Intermediate (200)
Services used

Amazon EC2

Amazon Managed Service for Prometheus

Amazon Managed Service for Grafana

Solution overview

The following figure shows how this post uses the official Prometheus exporter for collectd – collectd_exporter – to collect metrics from the collectd daemon. The Prometheus server then scrapes metrics from collectd_exporter and remotely writes them to the Amazon Managed Service for Prometheus workspace. The Amazon Managed Service for Grafana workspace connects to Amazon Managed Service for Prometheus and hosts the dashboards for visualization.

Figure 1: Flow of collectd metrics into Amazon Managed Service for Prometheus and Amazon Managed Service for Grafana

Install the collectd daemon, collectd_exporter, and the Prometheus server on the same machine for this post. Configure the Prometheus server to remotely write to an Amazon Managed Service for Prometheus workspace. For this to happen, configure the Amazon EC2 instance with an AWS Identity and Access Management (IAM) role that has the AmazonPrometheusRemoteWriteAccess policy. A YAML file with Prometheus recording rules is uploaded to the Amazon Managed Service for Prometheus workspace to generate average values of the metrics over a five-minute window.

Configure the Amazon Managed Service for Grafana workspace to use the Amazon Managed Service for Prometheus workspace as the data source. The Amazon Managed Service for Grafana workspace will have dashboards to display the metrics.

Walkthrough

Implement this solution using the following overview. It uses an Ubuntu Linux-based Amazon EC2 instance for testing purposes, though any Linux-based device will work.

  1. Install collectd.
  2. Configure collectd to write metrics via HTTP to collectd_exporter.
  3. Install collectd_exporter.
  4. Launch the Amazon Managed Service for Prometheus workspace.
  5. Install the Prometheus server.
  6. Configure the Prometheus server to write to the Amazon Managed Service for Prometheus workspace.
  7. Add Prometheus recording rules in Amazon Managed Service for Prometheus.
  8. Launch the Amazon Managed Service for Grafana workspace.
  9. Import JSON to view metrics.

Prerequisites

Complete the following prerequisites before starting this walkthrough:

  • Create an AWS account
  • Create a Linux-based Amazon EC2 instance using Ubuntu, or use any other Linux-based machine running on third-party software or hardware.

Install collectd

The preferred way to install collectd is with a package manager. Installing from source is also an option; however, that is out of scope for this post.

  1. Connect to your Linux instance using Session Manager.
  2. Run the following commands:
    sudo apt-get update
    sudo apt-get install -y collectd
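
To confirm that the daemon installed correctly before changing its configuration, you can check the package version and service status. This is a quick, optional sanity check that assumes a systemd-based Ubuntu image:

dpkg -s collectd | grep Version
systemctl status collectd --no-pager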

Configure collectd

  1. Back-up the configuration file with the following command:
    sudo cp /etc/collectd/collectd.conf  /etc/collectd/collectd.conf.old
  2. Copy the following configuration into /etc/collectd/collectd.conf:
FQDNLookup true
Interval 10
LoadPlugin syslog
<Plugin syslog>
    LogLevel info
</Plugin>
LoadPlugin df
LoadPlugin cpu
LoadPlugin memory
LoadPlugin csv
LoadPlugin write_http
<Plugin "cpu">
    ValuesPercentage true
</Plugin>
<Plugin "df">
    IgnoreSelected false
    MountPoint "/"
</Plugin>
<Plugin "csv">
  DataDir "/var/lib/collectd/csv"
  StoreRates true
</Plugin>
<Plugin write_http>
  <Node "collectd_exporter">
    URL "http://localhost:9103/collectd-post"
    Format "JSON"
    StoreRates false
  </Node>
</Plugin>

The above configuration file does the following:

  1. Loads the input plugins df, cpu, and memory. These plugins do the following:
    a. df – Collects file system usage information.
    b. cpu – Collects CPU usage.
    c. memory – Collects memory usage.
  2. Loads the logging plugin syslog, which writes debug and status information to syslog. This plugin helps monitor collectd.
  3. Loads the output plugins csv and write_http. The csv plugin is optional, as it helps confirm the metrics gathered on the host system. The write_http plugin writes the metrics to collectd_exporter.
  4. Configures the write_http plugin to write to collectd_exporter at http://localhost:9103/collectd-post.

The collectd wiki maintains a list of available collectd plugins.

Restart collectd

  1. Restart collectd with the following command:
    sudo /etc/init.d/collectd restart
  2. Verify by looking at the logs at /var/log/syslog. Here is a sample output:
Mar  3 05:31:27 ip-172-xxx-xxx-xxx systemd[1]: Stopped Statistics collection and monitoring daemon.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx systemd[1]: Starting Statistics collection and monitoring daemon...
Mar  3 05:31:27 ip-172-xxx-xxx-xxx collectd[23257]: plugin_load: plugin "df" successfully loaded.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx collectd[23257]: plugin_load: plugin "cpu" successfully loaded.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx collectd[23257]: plugin_load: plugin "memory" successfully loaded.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx collectd[23257]: plugin_load: plugin "csv" successfully loaded.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx collectd[23257]: plugin_load: plugin "write_http" successfully loaded.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx collectd[23257]: Systemd detected, trying to signal readiness.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx collectd[23257]: Initialization complete, entering read-loop.
Mar  3 05:31:27 ip-172-xxx-xxx-xxx systemd[1]: Started Statistics collection and monitoring daemon.
  3. Collected metrics can be viewed in the /var/lib/collectd/csv folder.
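
To spot-check the collected data, you can list the CSV directory and look at one of the files. The csv plugin typically writes one subdirectory per host and plugin, so the exact paths below are illustrative rather than exact:

ls /var/lib/collectd/csv/
ls /var/lib/collectd/csv/$(hostname -f)/
sudo tail /var/lib/collectd/csv/$(hostname -f)/memory/memory-*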

Install collectd_exporter

On the GitHub page for collectd_exporter, the installation instructions refer to running a Docker image. However, this isn’t mandatory. This section describes building and running the collectd_exporter binary on the Linux instance.

  1. Open a new terminal window.
  2. Connect to your Linux instance using Session Manager.
  3. Install Go with the following commands:
export GO_VERSION=17.7
curl -OL https://go.dev/dl/go1.${GO_VERSION}.linux-amd64.tar.gz
sudo tar -C /usr/local -xvf go1.${GO_VERSION}.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
  4. Build collectd_exporter with the following commands:
git clone https://github.com/prometheus/collectd_exporter
cd collectd_exporter
export DOCKER_ARCHS=amd64
make
export PATH=$PATH:$HOME/collectd_exporter
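
As a quick check that the build succeeded, print the exporter's version. Like most Prometheus exporters, collectd_exporter should support a --version flag, although the exact output format may differ from the logs shown later:

collectd_exporter --version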

Launch collectd_exporter

  1. Launch collectd_exporter with the following command:
    collectd_exporter
  2. The logs should look similar to the following:
level=info ts=2022-03-04T01:34:03.304Z caller=main.go:323 msg="Starting collectd_exporter" version="(version=0.5.0, branch=master, revision=f81b12fc801d98bb3dcfa11aec4e03f9bbb31e29)"
level=info ts=2022-03-04T01:34:03.304Z caller=main.go:324 msg="Build context" context="(go=go1.17.7, user=ubuntu@ip-172-31-22-27, date=20220303-05:21:09)"
level=info ts=2022-03-04T01:34:03.304Z caller=main.go:346 msg="Listening on address" address=:9103
level=info ts=2022-03-04T01:34:03.304Z caller=tls_config.go:191 msg="TLS is disabled." http2=false

Note the port number that the exporter is listening on – 9103. This matches the URL configured for the collectd write_http plugin earlier.
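
You can also confirm that collectd metrics are reaching the exporter by querying its metrics endpoint from the instance. The grep filter is just an illustrative way to narrow the output to collectd-prefixed series:

curl -s http://localhost:9103/metrics | grep collectd_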

Launch Amazon Managed Service for Prometheus workspace

  1. Create the Amazon Managed Service for Prometheus workspace.
  2. Copy the workspace ID from Step 1. The section for configuring the Prometheus server will use the workspace ID.
  3. Create an IAM role with the AmazonPrometheusRemoteWriteAccess policy and attach the role to the Amazon EC2 instance.
  4. Add the following to a YAML file named collectd-prometheus.yaml, and upload it to the workspace created in Step 1 (see the CLI sketch after the rules file).
groups:
  - name: collectd
    rules:
    - record: metric:collectd_df_GB
      expr: collectd_df_df_complex/1000000000
    - record: metric:collectd_avg_df_GB
      expr: sum(rate(metric:collectd_df_GB[5m])) by (type)
    - record: metric:collectd_memory_GB
      expr: collectd_memory/1000000000
    - record: metric:collectd_avg_memory_GB
      expr: sum(rate(metric:collectd_memory_GB[5m])) by (memory)
    - record: metric:collectd_cpu_percent
      expr: sum(rate(collectd_cpu_percent[5m])) by (cpu, type)
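
If you prefer the AWS CLI to the console for Steps 1 and 4, the following sketch creates the workspace and uploads the rules file. The alias and namespace name are placeholders chosen for this example; replace <workspaceId> with the ID returned by create-workspace:

aws amp create-workspace --alias collectd-demo
aws amp create-rule-groups-namespace \
    --workspace-id <workspaceId> \
    --name collectd-rules \
    --data fileb://collectd-prometheus.yaml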

Install Prometheus server

  1. Open a new terminal window.
  2. Connect to your Linux instance using Session Manager.
  3. Run the following commands to install the Prometheus server:
export PROMETHEUS_VERSION=2.34.0
curl -OL https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}-rc.0/prometheus-${PROMETHEUS_VERSION}-rc.0.linux-amd64.tar.gz
tar xvzf prometheus-${PROMETHEUS_VERSION}-rc.0.linux-amd64.tar.gz
export PATH=$PATH:$HOME/prometheus-${PROMETHEUS_VERSION}-rc.0.linux-amd64
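
As a quick check that the download and PATH update worked, print the version of the binary:

prometheus --version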

Configure Prometheus server

  1. On the Amazon EC2 instance, create a configuration file named prometheus.yaml with the following content. Replace regionId as applicable, and replace workspaceId with the workspace ID from the section on launching the Amazon Managed Service for Prometheus workspace.
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'collectd_exporter'

scrape_configs:
  - job_name: 'collectd_exporter'
    static_configs:
      - targets: ['localhost:9103']

remote_write:
  -
    url: https://aps-workspaces.<regionId>.amazonaws.com/workspaces/<workspaceId>/api/v1/remote_write
    queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
    sigv4:
        region: <regionId>

This configuration file sets the scrape target to localhost:9103 because collectd_exporter exposes its metrics endpoint there.
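
Before launching the server, you can optionally validate the file with promtool, which ships in the same Prometheus tarball:

export PROMETHEUS_VERSION=2.34.0
export PATH=$PATH:$HOME/prometheus-${PROMETHEUS_VERSION}-rc.0.linux-amd64
promtool check config prometheus.yaml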

Launch Prometheus server

  1. Return to the terminal where you installed the Prometheus server.
  2. Launch the Prometheus server with the prometheus.yaml configuration file created previously using the following commands:
export PROMETHEUS_VERSION=2.34.0
export PATH=$PATH:$HOME/prometheus-${PROMETHEUS_VERSION}-rc.0.linux-amd64
prometheus --config.file=prometheus.yaml 

The console should show output similar to the following, indicating that the Prometheus server successfully connects to Amazon Managed Service for Prometheus:

ts=2022-03-04T01:12:25.394Z caller=dedupe.go:112 component=remote level=info remote_name=4b1ae2 url=https://aps-workspaces.regionId.amazonaws.com/workspaces/workspaceId/api/v1/remote_write msg="Starting WAL watcher" queue=4b1ae2
ts=2022-03-04T01:12:25.394Z caller=dedupe.go:112 component=remote level=info remote_name=4b1ae2 url=https://aps-workspaces.regionId.amazonaws.com/workspaces/workspaceId/api/v1/remote_write msg="Starting scraped metadata watcher"
ts=2022-03-04T01:12:25.394Z caller=dedupe.go:112 component=remote level=info remote_name=4b1ae2 url=https://aps-workspaces.regionId.amazonaws.com/workspaces/workspaceId/api/v1/remote_write msg="Replaying WAL" queue=4b1ae2
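
To verify that samples are landing in the workspace, you can send a SigV4-signed request to the Amazon Managed Service for Prometheus query API. The example below uses the third-party awscurl tool (installable with pip install awscurl) as one convenient way to sign the request; replace regionId and workspaceId as before:

awscurl --service aps --region <regionId> \
    "https://aps-workspaces.<regionId>.amazonaws.com/workspaces/<workspaceId>/api/v1/query?query=collectd_memory"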

Launch Grafana workspace

  1. Create a workspace.
  2. Define user access.

Add Prometheus data source

  1. Add the Amazon Managed Service for Prometheus data source.

Create dashboard

  1. Create a new dashboard with the Import option.
  2. Upload the following JSON document:
{"__inputs":[],"__requires":[{"type":"grafana","id":"grafana","name":"Grafana","version":"8.2.5"},{"type":"panel","id":"timeseries","name":"Time series","version":""}],"annotations":{"list":[{"builtIn":1,"datasource":"-- Grafana --","enable":true,"hide":true,"iconColor":"rgba(0, 211, 255, 1)","name":"Annotations & Alerts","target":{"limit":100,"matchAny":false,"tags":[],"type":"dashboard"},"type":"dashboard"}]},"editable":true,"fiscalYearStartMonth":0,"gnetId":null,"graphTooltip":0,"id":null,"links":[],"liveNow":true,"panels":[{"datasource":null,"fieldConfig":{"defaults":{"color":{"mode":"palette-classic"},"custom":{"axisLabel":"","axisPlacement":"auto","barAlignment":0,"drawStyle":"line","fillOpacity":0,"gradientMode":"none","hideFrom":{"legend":false,"tooltip":false,"viz":false},"lineInterpolation":"linear","lineWidth":1,"pointSize":5,"scaleDistribution":{"type":"linear"},"showPoints":"auto","spanNulls":false,"stacking":{"group":"A","mode":"none"},"thresholdsStyle":{"mode":"off"}},"mappings":[],"thresholds":{"mode":"absolute","steps":[{"color":"green","value":null},{"color":"red","value":80}]}},"overrides":[]},"gridPos":{"h":7,"w":24,"x":0,"y":0},"id":6,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom"},"tooltip":{"mode":"single"}},"targets":[{"exemplar":true,"expr":"metric:collectd_cpu_percent","interval":"","legendFormat":"CPU-{{cpu}} {{type}}","refId":"A"}],"title":"CPU (%)","type":"timeseries"},{"datasource":null,"fieldConfig":{"defaults":{"color":{"mode":"palette-classic"},"custom":{"axisLabel":"","axisPlacement":"auto","barAlignment":0,"drawStyle":"line","fillOpacity":0,"gradientMode":"none","hideFrom":{"legend":false,"tooltip":false,"viz":false},"lineInterpolation":"linear","lineWidth":1,"pointSize":5,"scaleDistribution":{"type":"linear"},"showPoints":"auto","spanNulls":false,"stacking":{"group":"A","mode":"none"},"thresholdsStyle":{"mode":"off"}},"mappings":[],"thresholds":{"mode":"absolute","steps":[{"color":"green","value":null},{"color":"red","value":80}]}},"overrides":[]},"gridPos":{"h":7,"w":24,"x":0,"y":7},"id":4,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom"},"tooltip":{"mode":"single"}},"targets":[{"exemplar":true,"expr":"metric:collectd_memory_GB","interval":"","legendFormat":"{{memory}}","refId":"A"}],"title":"RAM (GB)","type":"timeseries"},{"datasource":null,"fieldConfig":{"defaults":{"color":{"mode":"palette-

The dashboard visualizes the following:

  • CPU utilization by CPU number and utilization type.
  • RAM utilization by utilization type.
  • Disk space utilization by utilization type.

Figure 2: Dashboard visualization

Cleaning up

To avoid incurring future charges, delete the following resources (an optional CLI sketch follows this list):

  1. Delete the Amazon Managed Service for Grafana workspace.
  2. Delete the Amazon Managed Service for Prometheus workspace.
  3. Terminate the Amazon EC2 instance.
  4. Delete the IAM role created for the Amazon EC2 instance.
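
If you created the workspaces with the AWS CLI, the following sketch tears them down. The IDs are placeholders for the values from the earlier steps and for the Amazon EC2 instance launched for this walkthrough:

aws grafana delete-workspace --workspace-id <grafanaWorkspaceId>
aws amp delete-workspace --workspace-id <workspaceId>
aws ec2 terminate-instances --instance-ids <instanceId>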

Conclusion

This post describes how to run the collectd daemon on a Linux-based Amazon EC2 instance to gather system metrics, ingest those metrics into Amazon Managed Service for Prometheus for long-term storage and querying, and visualize them with Amazon Managed Service for Grafana. Depending on your requirements, different collectd plugins can be installed and configured. Furthermore, Amazon Managed Service for Prometheus and Amazon Managed Service for Grafana can be configured to raise alerts as required.

Author:

Nagesh Subrahmanyam

Nagesh Subrahmanyam is a Partner Management Solution Architect with over 20 years of experience. He specializes in Kubernetes, has extensive knowledge in IoT, has dabbled with Blockchain (Ethereum), and is exploring 5G as a stretch technology goal. He loves to watch Marvel Comics Universe movies with his son in his spare time.