What are the use cases for running a bootstrap action or running a step on my Amazon EMR cluster?

Both bootstrap actions and Amazon EMR steps are used to complete work on Amazon EMR clusters. The distinctions between them are determined by when and where they run during the lifecycle of a cluster and the type of work that they do.

Bootstrap actions

As described in Understanding the Cluster Lifecycle, bootstrap actions are the first thing to run after an Amazon EMR cluster transitions from the STARTING state to the BOOTSTRAPPING state. Bootstrap actions are scripts that run on all cluster nodes. By default they run as the hadoop user, but they can also run with root privileges by using the sudo command. You can specify up to 16 bootstrap actions per cluster by providing multiple bootstrap-action parameters from the console, AWS Command Line Interface (AWS CLI), or API.
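As a rough illustration of supplying several bootstrap actions through the API, the following Python sketch builds a list in the shape that the AWS SDK for Python (boto3) expects for the BootstrapActions parameter of run_job_flow, and checks the 16-action limit mentioned above. The helper names, the bucket name, and the script paths are hypothetical and aren't part of the original article.

```python
MAX_BOOTSTRAP_ACTIONS = 16  # per-cluster limit stated in the article


def bootstrap_action(name, script_path, args=None):
    """Build one entry in the shape used by the BootstrapActions parameter."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_path,        # script location, typically in Amazon S3
            "Args": list(args or []),   # arguments passed to the script
        },
    }


def build_bootstrap_actions(specs):
    """Convert (name, path, args) tuples to a list, enforcing the limit."""
    if len(specs) > MAX_BOOTSTRAP_ACTIONS:
        raise ValueError(
            f"{len(specs)} bootstrap actions requested; "
            f"the limit is {MAX_BOOTSTRAP_ACTIONS} per cluster"
        )
    return [bootstrap_action(*spec) for spec in specs]


# Hypothetical scripts in a hypothetical bucket, for illustration only.
actions = build_bootstrap_actions([
    ("Install libraries", "s3://example-bucket/bootstrap/install-libs.sh", ["--verbose"]),
    ("Tune OS settings", "s3://example-bucket/bootstrap/tune-os.sh", []),
])
```

The resulting list could then be passed as BootstrapActions=actions when calling run_job_flow on a boto3 EMR client, alongside the other required cluster settings.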

Bootstrap actions can be used to install additional software on your cluster. They can also be configured to run commands conditionally, based on instance-specific values in the instance.json or job-flow.json file. Bootstrap actions execute before core services such as Hadoop or Spark are installed, and the cluster won't start if a bootstrap action fails.
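The conditional behavior can be sketched in Python as follows. This example assumes that instance.json is located at /mnt/var/lib/info/instance.json and contains a boolean isMaster field; neither detail is stated in this article, so verify both on your own cluster. The function names and the returned command strings are hypothetical placeholders.

```python
import json

# Assumed location of the instance metadata file; not stated in the article.
INSTANCE_INFO_PATH = "/mnt/var/lib/info/instance.json"


def is_master_node(path=INSTANCE_INFO_PATH):
    """Return True if the instance metadata marks this node as the master."""
    with open(path) as f:
        info = json.load(f)
    # "isMaster" is an assumed field name; a missing key is treated as False.
    return bool(info.get("isMaster", False))


def commands_for_node(is_master):
    """Pick placeholder setup commands based on the node's role."""
    if is_master:
        return ["echo 'master-only setup'"]
    return ["echo 'worker setup'"]
```

A bootstrap script could call commands_for_node(is_master_node()) and run the returned commands, so that the same script behaves differently on the master node and on the other nodes.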

Note: On AMI versions 2.x and 3.x of Amazon EMR, bootstrap actions execute after core services such as Hadoop or Spark are installed. Most predefined bootstrap actions for Amazon EMR AMI versions 2.x and 3.x aren't supported in Amazon EMR releases 4.x. For more information, see Create Bootstrap Actions to Install Additional Software.

Steps

A step is a distinct unit of work, comprising one or more Hadoop jobs that run only on the master node of an Amazon EMR cluster. Because a cluster doesn't start if a bootstrap action fails, steps always start after the bootstrap actions are complete. Steps are usually used to transfer or process data. One step might submit work to a cluster, and others might process the submitted data and then send the processed data to a particular location. Steps complete their work sequentially, as depicted in the diagram at Running Steps to Process Data. When you configure a step, you can choose what happens if that step fails, which provides a measure of fault tolerance. For more information about creating steps, see Work with Steps Using the AWS CLI and Console.
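To make the failure-handling choice concrete, the following Python sketch builds two step definitions in the shape that boto3 expects for the Steps parameter of add_job_flow_steps, each with its own ActionOnFailure value. It assumes a release in which steps are launched through command-runner.jar. The helper name, bucket name, and paths are hypothetical.

```python
# Values accepted by the ActionOnFailure field of a step definition.
VALID_ACTIONS_ON_FAILURE = {
    "TERMINATE_CLUSTER",
    "TERMINATE_JOB_FLOW",  # older spelling, kept by the API for backward compatibility
    "CANCEL_AND_WAIT",
    "CONTINUE",
}


def build_step(name, args, action_on_failure="CONTINUE"):
    """Build one step definition, validating the failure behavior."""
    if action_on_failure not in VALID_ACTIONS_ON_FAILURE:
        raise ValueError(f"Unsupported ActionOnFailure: {action_on_failure!r}")
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",  # assumed launcher; depends on the release
            "Args": list(args),
        },
    }


# Steps run in list order: first transfer the data, then process it.
steps = [
    build_step(
        "Copy input data",
        ["s3-dist-cp", "--src", "s3://example-bucket/input/", "--dest", "hdfs:///input/"],
        action_on_failure="CANCEL_AND_WAIT",  # stop later steps if the copy fails
    ),
    build_step(
        "Process data",
        ["spark-submit", "s3://example-bucket/jobs/process.py"],
        action_on_failure="CONTINUE",  # keep the cluster and later steps going
    ),
]
```

The list could then be submitted to a running cluster with add_job_flow_steps(JobFlowId=..., Steps=steps) on a boto3 EMR client, where the cluster ID is whatever your own cluster reports.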


Published: 2016-10-28

Updated: 2018-09-07