Amazon EMR uses real-time capacity insights to provision Spot Instances with lower cost and fewer interruptions

Posted on: Jun 25, 2020

Amazon EMR now offers a “Capacity Optimized” allocation strategy for provisioning Spot Instances in an Amazon EMR cluster. The “Capacity Optimized” allocation strategy automatically makes the most efficient use of available spare capacity while still taking advantage of the steep discounts offered by Spot Instances. By offering the possibility of fewer interruptions, the capacity-optimized strategy can lower the overall cost of your workload. 

The Capacity Optimized allocation strategy uses real-time capacity data to allocate instances from the Spot Instance pools with the optimal capacity for the number of instances that are launching. This allocation strategy is appropriate for workloads that have a higher cost of interruption. Examples include long-running jobs and multi-tenant persistent clusters running Apache Spark, Apache Hive, and Presto.
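As a rough illustration of the idea only, the sketch below picks the Spot Instance pool (an instance type in an Availability Zone) with the most spare capacity that can still satisfy the request. AWS does not publish its selection logic, so the function `pick_pool`, the pool names, and the capacity numbers are all hypothetical.

```python
def pick_pool(spare_capacity, requested):
    """Return the name of the pool with the most spare capacity that can
    hold `requested` instances, or None if no pool is large enough.

    This is a toy model of "capacity optimized" selection, not the
    algorithm that Amazon EC2 or Amazon EMR actually runs.
    """
    # Keep only pools that can satisfy the whole request.
    candidates = {pool: cap for pool, cap in spare_capacity.items() if cap >= requested}
    if not candidates:
        return None
    # Prefer the deepest pool, on the assumption that it is the least
    # likely to be reclaimed soon.
    return max(candidates, key=candidates.get)


# Hypothetical spare-capacity snapshot, keyed by "instance type/Availability Zone".
pools = {
    "m5.xlarge/us-east-1a": 40,
    "m5a.xlarge/us-east-1a": 120,
    "r5.xlarge/us-east-1b": 15,
}

chosen = pick_pool(pools, requested=20)
print(chosen)  # m5a.xlarge/us-east-1a
```

A lowest-price strategy would instead rank the same pools by current Spot price, which can land a request in a shallow pool that is more likely to be interrupted.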

The allocation strategy option also lets you specify up to 15 EC2 instance types for task nodes when you create your cluster with an instance fleet configuration. Diversifying your Spot requests across more instance types gives Amazon EMR more capacity pools to draw from, while you still benefit from the steep Spot discounts.
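A task instance fleet that requests Spot capacity with this strategy might be described as in the following minimal sketch. The field names follow the Amazon EMR instance fleet API (for example, the `InstanceFleets` list passed under `Instances` to boto3's `run_job_flow`). The helper `build_task_fleet`, the fleet name, the instance types, the weights, and the timeout are assumptions chosen for illustration. The block only builds and prints the definition; it does not call AWS.

```python
import json


def build_task_fleet(instance_types, target_spot_capacity, timeout_minutes=60):
    """Build a task instance fleet definition that requests Spot capacity
    with the capacity-optimized allocation strategy."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "Name": "task-fleet",  # hypothetical fleet name
        "InstanceFleetType": "TASK",
        "TargetSpotCapacity": target_spot_capacity,
        # One entry per instance type; more types means more Spot pools to draw from.
        "InstanceTypeConfigs": [
            {"InstanceType": itype, "WeightedCapacity": 1} for itype in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # How long to wait for Spot capacity before applying TimeoutAction.
                "TimeoutDurationMinutes": timeout_minutes,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                "AllocationStrategy": "capacity-optimized",
            }
        },
    }


# Hypothetical mix of similarly sized instance types.
fleet = build_task_fleet(
    ["m5.xlarge", "m5a.xlarge", "m4.xlarge", "r5.xlarge"],
    target_spot_capacity=8,
)
print(json.dumps(fleet, indent=2))
```

The same structure can be written as JSON for the AWS CLI. Check the exact casing and the accepted values against the API specification for the EMR release you use.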

Amazon EMR has several enhancements that improve elasticity and resiliency for customers, including graceful decommissioning of Amazon EC2 Spot Instances running Apache Spark and Apache Hadoop applications on an Amazon EMR cluster. To prevent data loss, Amazon EMR scaling ensures that a node has no running Apache Hadoop tasks or unique data that could be lost before it removes that node. Amazon EMR also customizes open-source Spark to make it more resilient to node loss: it integrates with YARN's decommissioning mechanism and extends Spark's decommissioning mechanism and the actions it takes on decommissioned nodes.

See the documentation to learn how to configure instance fleets and how to create a service role for Amazon EMR (EMR role), and for the API specifications.

Amazon EMR support for allocation strategy is now generally available on EMR release versions 5.12.1 and later, in all commercial AWS Regions where Amazon EMR is available. For the full list, see the overall Regional availability of Amazon EMR.