How can I collect custom metrics from Amazon EMR cluster instances and then publish them in CloudWatch?

Last updated: 2020-04-13

I want to configure custom metrics for Amazon EMR cluster instances, such as memory, CPU, and disk space usage. Then, I want to publish the metrics to Amazon CloudWatch. How can I do that?

Resolution

Note: Custom CloudWatch metrics aren't free. For more information, see Amazon CloudWatch pricing and review the Metrics tab under the Paid tier section.

1.    When you launch an EMR cluster, supply a bootstrap action that downloads monitoring scripts to the Amazon Elastic Compute Cloud (Amazon EC2) instances.

The following example script uses a cron job to run the monitoring scripts every 5 minutes. Customize this script to fit your use case. For example, in the last line of the script, specify the custom metrics that you want to collect and publish to CloudWatch.

#!/bin/bash
# Bootstrap action: installs the CloudWatch monitoring scripts on each instance
# and schedules them with cron.

echo "install additional Perl modules"
sudo yum install -y perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https perl-Digest-SHA.x86_64

echo "download, install, and configure the monitoring scripts"
cd /home/hadoop
curl -O https://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.2.zip
# Remove the archive only if it was extracted successfully.
unzip CloudWatchMonitoringScripts-1.2.2.zip && \
rm CloudWatchMonitoringScripts-1.2.2.zip

echo "setting cron"
# Runs every 5 minutes. Edit the flags on this line to choose the metrics to publish.
# Note: "crontab -" replaces any existing crontab for the current user.
echo "*/5 * * * * /home/hadoop/aws-scripts-mon/mon-put-instance-data.pl --mem-used-incl-cache-buff --mem-util --disk-space-util --disk-path=/mnt --from-cron" | crontab -
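
As a rough sketch of how the bootstrap action might be supplied from the AWS CLI, the following function uploads the script to Amazon S3 and then launches a cluster that runs it. The function name, the bucket, the local file name, the cluster name, the release label, and the instance settings are all placeholders assumed for illustration, not values from this article. It also assumes that the instance profile allows cloudwatch:PutMetricData.

```shell
#!/bin/bash
# Sketch only: the function name, bucket, script path, cluster name, release
# label, and instance settings are hypothetical. Replace them with your own.

launch_cluster_with_monitoring() {
  local bucket="$1"   # S3 bucket that will hold the bootstrap script (assumed)
  local script="$2"   # local path of the bootstrap script (assumed)
  local key
  key="bootstrap/$(basename "$script")"

  # Upload the bootstrap script so that EMR can fetch it at launch time.
  aws s3 cp "$script" "s3://${bucket}/${key}" || return 1

  # Launch the cluster and register the script as a bootstrap action.
  aws emr create-cluster \
    --name "cluster-with-custom-metrics" \
    --release-label emr-5.29.0 \
    --applications Name=Hadoop \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --bootstrap-actions "Path=s3://${bucket}/${key},Name=InstallMonitoringScripts"
}
```

For example, after saving the script above as install-monitoring.sh, you might run: launch_cluster_with_monitoring my-bucket ./install-monitoring.sh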

2.    After the EMR cluster is launched, open the CloudWatch console.

3.    In the navigation pane, choose Metrics. Your custom metrics appear under the System/Linux namespace.
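
If you prefer the command line, the same check can be sketched with the AWS CLI. The function name and the instance ID you pass to it are assumptions for illustration. The sketch also assumes that the monitoring scripts publish with their default InstanceId dimension.

```shell
#!/bin/bash
# Sketch only: the function name and the instance ID argument are hypothetical.

list_custom_metrics() {
  local instance_id="$1"   # EC2 instance ID of a cluster node (assumed input)

  # List the names of the metrics published to the System/Linux namespace
  # for this instance.
  aws cloudwatch list-metrics \
    --namespace "System/Linux" \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --query "Metrics[].MetricName" \
    --output text
}
```

For example: list_custom_metrics i-0123456789abcdef0 (a placeholder ID). Because the cron job runs every 5 minutes, the first metrics might take several minutes to appear after the cluster starts.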

You can also use Ganglia to monitor metrics, such as CPU and disk space usage, for the EMR cluster or for individual cluster instances.

