Posted On: Nov 18, 2016

You can now configure policies to automatically add (scale out) and terminate (scale in) nodes in your Amazon EMR cluster. Amazon EMR can programmatically scale out applications like Apache Spark and Apache Hive to utilize additional nodes for increased performance and scale in the number of nodes in your cluster to save costs when utilization is low. Your cluster can scale based on Amazon CloudWatch metrics provided by Amazon EMR, including YARN utilization metrics.

Amazon EMR’s scale down behavior is now configurable. Starting with release 5.1.0, Amazon EMR will now terminate nodes when scaling in your cluster as they approach the instance hour for Amazon EC2 billing, regardless of task completion. If you would like to use the previous default behavior, you can also configure your cluster to wait for all running tasks on a node to complete before termination, regardless of proximity to the instance hour boundary.

You can create or modify auto scaling policies from the Amazon EMR console, AWS Command Line Interface (CLI), or the AWS SDK with the Amazon EMR API. Auto Scaling can be enabled with Amazon EMR releases 4.x and 5.x, and scaling down at an hourly boundary is supported on release 5.1.0 and later. Please visit the Amazon EMR documentation for more information about Auto Scaling and scale down behavior.