Posted On: Nov 26, 2019
Amazon EMR now supports running multiple EMR steps at the same time, canceling running steps, and integration with AWS Step Functions. Running steps in parallel lets you run more advanced workloads, increase cluster resource utilization, and reduce the time taken to complete your workload. The number of steps allowed to run at once is configurable: you can set it when you launch a cluster and change it at any time after the cluster has started. The ability to cancel running steps gives you more control over step execution, including the option to forcefully cancel a step. Running steps in parallel is also supported with AWS Step Functions, so you can create and scale clusters and orchestrate step execution using Step Functions workflows.
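Both controls are exposed through the EMR API. The following is a minimal Python sketch, assuming the AWS SDK for Python (boto3). `modify_cluster` with `StepConcurrencyLevel` changes the limit on a running cluster (`run_job_flow` accepts the same parameter at launch). `cancel_steps` with `StepCancellationOption` requests cancellation, where `TERMINATE_PROCESS` is the forceful option and `SEND_INTERRUPT` the default. The helper names, the cluster and step IDs, and the 1 to 256 bound check (taken from the current API reference) are assumptions for illustration.

```python
"""Sketch: tune step concurrency and cancel running steps on an EMR cluster.

Assumes `emr` is a boto3 EMR client (boto3.client("emr")). Helper names and the
cluster/step IDs passed to them are hypothetical.
"""


def set_step_concurrency(emr, cluster_id, level):
    """Change how many steps may run at once on a cluster that is already running."""
    # Assumed bound, taken from the current API reference for StepConcurrencyLevel.
    if not 1 <= level <= 256:
        raise ValueError("StepConcurrencyLevel must be between 1 and 256")
    response = emr.modify_cluster(ClusterId=cluster_id, StepConcurrencyLevel=level)
    # ModifyCluster returns the concurrency level now in effect.
    return response["StepConcurrencyLevel"]


def cancel_running_steps(emr, cluster_id, step_ids, force=False):
    """Request cancellation of steps; force=True kills the step process instead of interrupting it."""
    option = "TERMINATE_PROCESS" if force else "SEND_INTERRUPT"
    response = emr.cancel_steps(
        ClusterId=cluster_id,
        StepIds=list(step_ids),
        StepCancellationOption=option,
    )
    # Map each step ID to whether the cancel request was SUBMITTED or FAILED.
    return {info["StepId"]: info["Status"] for info in response["CancelStepsInfoList"]}
```

For example, `set_step_concurrency(boto3.client("emr"), "j-EXAMPLE", 10)` would allow up to ten steps to run at once on that (hypothetical) cluster.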
Steps allow you to submit workloads to EMR applications such as Apache Spark, Apache Hive, Apache Hadoop YARN, and Presto without connecting directly to an EMR cluster. You can add steps to a cluster using the EMR console or the API. Until now, steps ran sequentially, with each step needing to complete before the next could begin, and running steps could not be canceled.
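Adding steps through the API might look like the Python sketch below, assuming boto3's `add_job_flow_steps` call and the `command-runner.jar` launcher that EMR provides for running `spark-submit` and `hive-script`. The helper names and S3 URIs are hypothetical. `ActionOnFailure="CONTINUE"` is chosen here so that a failure in one step does not affect the others.

```python
"""Sketch: submit Spark and Hive steps to a running EMR cluster through the API.

Assumes `emr` is a boto3 EMR client (boto3.client("emr")). Helper names and
S3 URIs are hypothetical.
"""


def spark_step(name, script_uri, *script_args):
    """Describe a step that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        # CONTINUE: a failure in this step does not cancel or terminate anything else.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


def hive_step(name, query_uri):
    """Describe a step that runs a Hive script stored in S3."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script", "--args", "-f", query_uri],
        },
    }


def submit_steps(emr, cluster_id, steps):
    """Add steps to the cluster and return the step IDs EMR assigns."""
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=list(steps))
    return response["StepIds"]
```

On a cluster whose step concurrency level is above 1, the two steps submitted by `submit_steps(emr, "j-EXAMPLE", [spark_step(...), hive_step(...)])` can run at the same time instead of one after the other.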
With the ability to run steps in parallel, it’s now possible to create more advanced workflows involving conditional logic and branching. To simplify creating and managing these workflows, we’re happy to announce a new integration with AWS Step Functions. Step Functions now supports EMR steps, allowing you to orchestrate cluster creation, programmatically scale cluster resources, and manage step execution, dependencies, and exception handling with EMR.
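As an illustration of branching and exception handling with this integration, the Python sketch below builds an Amazon States Language definition. It uses the `arn:aws:states:::elasticmapreduce:addStep.sync` service integration, which adds an EMR step and waits for it to finish, inside a `Parallel` state, with a `Catch` that routes any failure to a `Fail` state. The state names, step names, and S3 paths are hypothetical. The sketch also assumes the execution input carries the cluster ID as `ClusterId`.

```python
"""Sketch: an ASL definition that runs EMR steps in parallel branches.

State names, step names, and S3 paths are hypothetical; the execution input is
assumed to look like {"ClusterId": "j-..."}.
"""
import json

ADD_STEP_SYNC = "arn:aws:states:::elasticmapreduce:addStep.sync"


def add_step_task(step_name, args):
    """A Task state that adds one EMR step and waits for it to complete (.sync)."""
    return {
        "Type": "Task",
        "Resource": ADD_STEP_SYNC,
        "Parameters": {
            # Read the cluster ID from the state input at run time.
            "ClusterId.$": "$.ClusterId",
            "Step": {
                "Name": step_name,
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
            },
        },
        "End": True,
    }


def parallel_steps_definition(branches):
    """Build a definition that runs each (state_name, step_name, args) branch in parallel."""
    return {
        "Comment": "Run EMR steps concurrently, then report success or failure",
        "StartAt": "RunStepsInParallel",
        "States": {
            "RunStepsInParallel": {
                "Type": "Parallel",
                # Each branch receives the Parallel state's input, including ClusterId.
                "Branches": [
                    {"StartAt": state, "States": {state: add_step_task(step, args)}}
                    for state, step, args in branches
                ],
                # Any failed branch sends the workflow to the failure state.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "StepsFailed"}],
                "Next": "StepsSucceeded",
            },
            "StepsSucceeded": {"Type": "Succeed"},
            "StepsFailed": {"Type": "Fail", "Error": "EmrStepFailed"},
        },
    }


if __name__ == "__main__":
    example = parallel_steps_definition([
        ("SparkJob", "spark-aggregate", ["spark-submit", "s3://example-bucket/aggregate.py"]),
        ("HiveJob", "hive-report",
         ["hive-script", "--run-hive-script", "--args", "-f", "s3://example-bucket/report.q"]),
    ])
    print(json.dumps(example, indent=2))
```

The JSON string could then be passed as `definition` to `create_state_machine` on a boto3 `stepfunctions` client, together with an IAM role ARN. The two EMR steps only overlap on the cluster if its step concurrency level is above 1. Otherwise EMR queues them.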
Running steps in parallel and canceling running steps are supported with EMR release 5.28.0 and are available in Asia Pacific (Hong Kong, Mumbai, Tokyo), EU (Frankfurt, Ireland, Stockholm), Middle East (Bahrain), South America (São Paulo), US East (N. Virginia), and US West (N. California, Oregon), with more regions coming in the following weeks.