AWS Official Blog

Tag Your Elastic MapReduce Clusters

by Jeff Barr | in Amazon Elastic MapReduce

Amazon Elastic MapReduce gives you the power to process vast amounts of data using Hadoop, an open source parallel processing framework. Behind the scenes, each Elastic MapReduce cluster runs on an array of Amazon EC2 instances.

These clusters can grow to hundreds or even thousands of instances, and you can run several clusters at the same time. At this scale, tracking the costs and the “moving parts” (clusters and instances) manually can become difficult. Perhaps your enterprise runs one cluster for processing log data, and a couple of others for research. Or, you might have a large production cluster with several smaller siblings for development and testing.

Tag Those Clusters
Today we are introducing a new tagging feature for Elastic MapReduce. You can now add up to 10 tags per cluster at launch time. You can also add, remove, and edit tags while the cluster is running. Any changes that you make to the tags on the cluster will be mirrored to the EC2 instances in the cluster.

For example, if you launch a cluster and specify that the department tag should be set to the value “development,” the cluster and each of its instances will have the tag. If you decide to change the department to “production,” the change will appear on the cluster and on all of the associated instances.

You can set and edit the tags from the Elastic MapReduce Console, the Elastic MapReduce Command Line, or from the Elastic MapReduce APIs.
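If you prefer to script tag changes through the APIs, the following is a minimal sketch rather than an official recipe. It assumes the boto3 SDK for Python and its EMR client methods `add_tags` and `remove_tags`, which are not mentioned in this post. The helper names and the cluster ID `j-EXAMPLE` are hypothetical, and the 10-tag check only covers the tags passed in one call, not tags already on the cluster.

```python
MAX_TAGS_PER_CLUSTER = 10  # limit stated in this post


def to_emr_tags(tags):
    """Convert a plain dict into the Key/Value list shape used by the EMR API."""
    if len(tags) > MAX_TAGS_PER_CLUSTER:
        raise ValueError(
            f"at most {MAX_TAGS_PER_CLUSTER} tags per cluster, got {len(tags)}"
        )
    # Sort by key so the request is deterministic and easy to compare in logs.
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def _emr_client(client=None):
    """Return the given client, or create one with boto3 (assumed installed)."""
    if client is None:
        import boto3  # imported lazily so the helpers above work without the SDK
        client = boto3.client("emr")
    return client


def set_cluster_tags(cluster_id, tags, client=None):
    """Add tags to a cluster, or overwrite the values of existing keys."""
    _emr_client(client).add_tags(ResourceId=cluster_id, Tags=to_emr_tags(tags))


def remove_cluster_tags(cluster_id, keys, client=None):
    """Remove the tags with the given keys from a cluster."""
    _emr_client(client).remove_tags(ResourceId=cluster_id, TagKeys=list(keys))
```

With credentials configured, `set_cluster_tags("j-EXAMPLE", {"department": "production"})` would change the department tag on the hypothetical cluster, and the change should then be mirrored to its EC2 instances as described above.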

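Here is how you set the tags from the console:

[Screenshot: setting tags in the Elastic MapReduce Console]

And here’s how you edit them:

[Screenshot: editing tags in the Elastic MapReduce Console]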

System Tags
Each of the EC2 instances in an Elastic MapReduce cluster also gets a pair of system tags:

The aws:elasticmapreduce:instance-group-role tag will be set to “CORE” or “TASK” to indicate the role that the instance plays in the cluster.

The aws:elasticmapreduce:job-flow-id tag will be set to the Job Flow ID of the cluster.

These system tags make it easy to identify the EC2 instances associated with Elastic MapReduce clusters when you are looking at an EC2 billing report.
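The same system tags can also be used to look up a cluster's instances programmatically. The sketch below is one possible approach, assuming the boto3 SDK for Python and the EC2 `describe_instances` call with `tag:<key>` filters, none of which are mentioned in this post. The function names and the Job Flow ID `j-EXAMPLE` are hypothetical, and for brevity the lookup reads only the first page of results.

```python
JOB_FLOW_TAG = "aws:elasticmapreduce:job-flow-id"
ROLE_TAG = "aws:elasticmapreduce:instance-group-role"


def emr_instance_filters(job_flow_id, role=None):
    """Build EC2 filters that match instances carrying the EMR system tags."""
    filters = [{"Name": f"tag:{JOB_FLOW_TAG}", "Values": [job_flow_id]}]
    if role is not None:
        # Optionally narrow the match to one role, e.g. "CORE" or "TASK".
        filters.append({"Name": f"tag:{ROLE_TAG}", "Values": [role]})
    return filters


def find_cluster_instances(job_flow_id, role=None, client=None):
    """Return the IDs of EC2 instances tagged with the given Job Flow ID."""
    if client is None:
        import boto3  # imported lazily; assumed to be installed and configured
        client = boto3.client("ec2")
    response = client.describe_instances(
        Filters=emr_instance_filters(job_flow_id, role)
    )
    # Only the first page of results is read here; a large cluster would
    # need pagination on top of this sketch.
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
```

For example, `find_cluster_instances("j-EXAMPLE", role="TASK")` would list the task instances of the hypothetical cluster.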

This feature is available now and you can start using it today.

– Jeff;