How can I collect custom metrics from Amazon EMR cluster instances and then publish them in CloudWatch?
Last updated: 2020-04-13
I want to configure custom metrics for Amazon EMR cluster instances, such as memory, CPU, and disk space usage. Then, I want to publish the metrics to Amazon CloudWatch. How can I do that?
Note: Custom CloudWatch metrics aren't free. For more information, see Amazon CloudWatch pricing and review the Metrics tab under the Paid tier section.
The following example script uses a cron job to run the monitoring scripts every 5 minutes. Customize this script to fit your use case. For example, in the last line of the script, specify the custom metrics that you want to collect and publish to CloudWatch.
#!/bin/bash
echo "install additional Perl modules"
sudo yum install -y perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https perl-Digest-SHA.x86_64
echo "download, install, and configure the monitoring scripts"
cd /home/hadoop
curl https://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.2.zip -O
unzip CloudWatchMonitoringScripts-1.2.2.zip && \
rm CloudWatchMonitoringScripts-1.2.2.zip && \
cd aws-scripts-mon
echo "setting cron"
echo "*/5 * * * * /home/hadoop/aws-scripts-mon/mon-put-instance-data.pl --mem-used-incl-cache-buff --mem-util --disk-space-util --disk-path=/mnt --from-cron" | crontab
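To make the last line of the script easier to customize, the crontab entry can be built from a list of metric options. The following is a minimal sketch and is not part of the monitoring scripts: the function name build_cron_entry is hypothetical, and it assumes that the scripts are unpacked to /home/hadoop/aws-scripts-mon, as in the preceding script. The options in the example (--mem-util, --swap-util, --disk-space-util, --disk-path) are options of mon-put-instance-data.pl.

```shell
#!/bin/bash
# Hypothetical helper (not part of the monitoring scripts): prints a crontab
# entry that runs mon-put-instance-data.pl every N minutes with the given options.
# Assumes the scripts are installed in /home/hadoop/aws-scripts-mon.
build_cron_entry() {
  local interval_minutes="$1"
  shift
  # "$*" joins the remaining arguments (the metric options) with spaces.
  printf '*/%s * * * * /home/hadoop/aws-scripts-mon/mon-put-instance-data.pl %s --from-cron\n' \
    "$interval_minutes" "$*"
}

# Example: memory, swap, and /mnt disk utilization every 5 minutes.
# This only prints the entry. To install it, pipe the output to "crontab -"
# (note that this replaces the current user's existing crontab):
#   build_cron_entry 5 --mem-util --swap-util --disk-space-util --disk-path=/mnt | crontab -
build_cron_entry 5 --mem-util --swap-util --disk-space-util --disk-path=/mnt
```

Printing the entry before installing it lets you review the exact command that cron will run.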
2. After the EMR cluster is launched, open the CloudWatch console.
3. On the navigation pane, choose Metrics. Your custom metrics are displayed under the System/Linux namespace.
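You can make the same check from the command line. The following sketch assumes that the AWS CLI is installed and configured with credentials that are allowed to call cloudwatch:ListMetrics. The function names are hypothetical, and the instance ID is a placeholder that you supply. It lists the metrics in System/Linux, optionally filtered by the InstanceId dimension.

```shell
#!/bin/bash
# Hypothetical wrappers around the AWS CLI; assumes "aws" is installed and configured.

# List every metric published under System/Linux.
# Extra arguments are passed through to "aws cloudwatch list-metrics".
list_custom_metrics() {
  aws cloudwatch list-metrics --namespace "System/Linux" "$@"
}

# List only the metrics that carry the given InstanceId dimension.
list_instance_metrics() {
  local instance_id="$1"
  list_custom_metrics --dimensions "Name=InstanceId,Value=${instance_id}"
}

# Usage (allow at least one cron interval to pass after the cluster launches):
#   list_custom_metrics
#   list_instance_metrics <instance-id>
```

If the output is empty, the cron job might not have run yet, or the instance might lack permission to publish metrics to CloudWatch.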
You can also use Ganglia to monitor metrics, such as CPU and disk space usage, for the EMR cluster or for individual cluster instances.