Posted On: Mar 30, 2022

Amazon EMR Managed Scaling automatically resizes EMR clusters for best performance and resource utilization. Today, we are excited to announce a new capability in Managed Scaling that prevents it from scaling down instances that store intermediate shuffle data for Apache Spark. Scaling down clusters without removing the instances that hold this shuffle data prevents job re-attempts and re-computations, which leads to better performance and lower cost.

With EMR Managed Scaling, you specify the minimum and maximum compute limits for your clusters. EMR Managed Scaling can be used with Amazon EC2 Spot Instances, which let you take advantage of unused EC2 capacity at a discount of up to 90% from On-Demand prices. EMR Managed Scaling continuously samples key metrics associated with the workloads running on clusters and resizes clusters based on workload and utilization. These metrics now include tracking which instances hold intermediate shuffle data for Apache Spark.
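As a rough illustration of setting those limits, the following Python sketch builds a managed scaling policy and can optionally attach it to a cluster through the boto3 `put_managed_scaling_policy` call. The helper names, the cluster ID, the default Region, the example limits, and the sanity checks are all hypothetical choices made for this example.

```python
def build_managed_scaling_policy(min_units, max_units, unit_type="Instances"):
    """Return a ManagedScalingPolicy dict with minimum and maximum compute limits."""
    if unit_type not in ("Instances", "InstanceFleetUnits", "VCPU"):
        raise ValueError(f"unsupported unit type: {unit_type}")
    # Sanity checks chosen for this sketch, not taken from the announcement.
    if min_units < 1 or max_units < min_units:
        raise ValueError("need 1 <= min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": unit_type,
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


def apply_policy(cluster_id, policy, region_name="us-east-1"):
    """Attach the policy to a running cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    emr = boto3.client("emr", region_name=region_name)
    return emr.put_managed_scaling_policy(
        ClusterId=cluster_id, ManagedScalingPolicy=policy
    )


policy = build_managed_scaling_policy(min_units=2, max_units=10)
# apply_policy("j-XXXXXXXXXXXXX", policy)  # hypothetical cluster ID; calls AWS
```

Keeping the policy builder separate from the AWS call lets you review or unit-test the limits before anything is sent to the service.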

This capability is supported on Amazon EMR release versions 5.34 and later, and 6.4.0 and later. No further action is needed on your part. This feature is available in 20 AWS Regions globally: US East (N. Virginia and Ohio), US West (Oregon and N. California), South America (São Paulo), Europe (Frankfurt, Ireland, London, Milan, Paris, and Stockholm), Canada (Central), Asia Pacific (Hong Kong, Mumbai, Seoul, Singapore, Sydney, and Tokyo), Middle East (Bahrain), and Africa (Cape Town).
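If you want to check a cluster's release label against these versions, a minimal sketch might look like the following. It assumes labels of the form `emr-X.Y.Z`, reads the requirement as 5.34 or later within the 5.x line and 6.4.0 or later within the 6.x line, and presumes that major versions after 6 also qualify; the function name is hypothetical.

```python
import re

# Assumed minimums per major version, based on the versions named above.
MIN_SUPPORTED = {5: (5, 34, 0), 6: (6, 4, 0)}


def supports_shuffle_aware_scaling(release_label):
    """Return True if the release label meets the assumed minimum version."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", release_label)
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label}")
    # Compare as integer tuples so that 6.10.0 sorts after 6.4.0.
    version = tuple(int(part) for part in match.groups())
    minimum = MIN_SUPPORTED.get(version[0])
    if minimum is None:
        # Majors before 5 are treated as unsupported; majors after 6 are
        # presumed to qualify (an assumption of this sketch).
        return version[0] > 6
    return version >= minimum


for label in ("emr-5.33.1", "emr-5.34.0", "emr-6.3.0", "emr-6.4.0"):
    print(label, supports_shuffle_aware_scaling(label))
```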

To learn more, see the Managed Scaling documentation.