Why is my EMR cluster not terminating or terminating earlier than expected when I'm using an auto-termination policy?

I configured an auto-termination policy for my Amazon EMR cluster. However, the cluster either stays active or terminates earlier than the idle timeout that's configured in the policy.

Short description

When you create an EMR cluster, you can turn on the auto-termination policy. The auto-termination policy terminates the cluster after a specific amount of idle time.
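
As a first check, it can help to confirm which idle timeout is actually attached to the cluster. The following is a minimal sketch that assumes the AWS CLI is installed and configured; the function names and the cluster ID are hypothetical placeholders. The policy's IdleTimeout value is expressed in seconds, so a value entered as if it were minutes ends the cluster sooner than intended.

```shell
# Print the idle timeout (in seconds) from the cluster's auto-termination policy.
# Assumes the AWS CLI is configured and the caller is allowed to call
# elasticmapreduce:GetAutoTerminationPolicy.
get_idle_timeout() {
  aws emr get-auto-termination-policy \
    --cluster-id "$1" \
    --query 'AutoTerminationPolicy.IdleTimeout' \
    --output text
}

# Convert a timeout in seconds to whole minutes for easier reading.
seconds_to_minutes() {
  echo $(( $1 / 60 ))
}

# Example usage (j-XXXXXXXXXXXXX is a placeholder cluster ID):
#   timeout="$(get_idle_timeout j-XXXXXXXXXXXXX)"
#   echo "Idle timeout: ${timeout}s (about $(seconds_to_minutes "$timeout") min)"
```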

Resolution

1.    Make sure that the Amazon Elastic Compute Cloud (Amazon EC2) instance profile role, EMR_EC2_DefaultRole, has the following permissions. If the EMR EC2 instance profile role doesn't have these permissions, then the cluster stays active even if it meets the idle timeout requirement.

{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "AllowAutoTerminationPolicyActions",
        "Effect": "Allow",
        "Action": [
            "elasticmapreduce:PutAutoTerminationPolicy",
            "elasticmapreduce:GetAutoTerminationPolicy",
            "elasticmapreduce:RemoveAutoTerminationPolicy"
        ],
        "Resource": "your-resources"
    }
}
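
One way to confirm that the role grants these actions is to ask IAM to simulate them. The sketch below assumes the AWS CLI is configured and that the caller is allowed to use iam:SimulatePrincipalPolicy; the function names are hypothetical and the role ARN is a placeholder. Without explicit resource ARNs, the simulation evaluates the actions against all resources, so treat the result as an indication rather than proof.

```shell
# Print "action<TAB>decision" for each auto-termination action on the given role.
# $1 is the ARN of the instance profile role (placeholder in the usage below).
simulate_auto_termination_actions() {
  aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names \
      elasticmapreduce:PutAutoTerminationPolicy \
      elasticmapreduce:GetAutoTerminationPolicy \
      elasticmapreduce:RemoveAutoTerminationPolicy \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}

# Read "action decision" lines on stdin and print the actions that are not allowed.
list_denied_actions() {
  awk '$2 != "allowed" { print $1 }'
}

# Example usage (the account ID is a placeholder):
#   simulate_auto_termination_actions \
#     arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole | list_denied_actions
```

If the last command prints nothing, then the simulation found no denied actions.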

In Amazon EMR versions 5.34 to 5.36 and 6.4.0 or later, a cluster is idle when the following are true:

  • There are no active YARN applications.
  • HDFS utilization is below 10%.
  • There are no active EMR notebook or EMR Studio connections.
  • There are no on-cluster application user interfaces in use.

In Amazon EMR versions 5.30.0 to 5.33.1 and 6.1.0 to 6.3.0, a cluster is idle when the following are true:

  • There are no active YARN applications.
  • HDFS utilization is below 10%.
  • The cluster has no active Spark jobs.
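
The YARN and HDFS conditions in both lists can be spot-checked from the primary node. The following sketch assumes that the yarn and hdfs command-line tools are on the PATH. The function names are hypothetical, and the output parsing is an assumption about the usual format of those tools, not something that Amazon EMR itself runs. It doesn't cover the notebook, EMR Studio, user interface, or Spark conditions.

```shell
# Count YARN applications in `yarn application -list` output read from stdin.
# Assumes that each application row starts with "application_".
count_active_apps() {
  grep -c '^application_' || true
}

# Extract the first "DFS Used%" value from `hdfs dfsadmin -report` output on stdin.
dfs_used_percent() {
  awk -F': ' '/^DFS Used%/ { sub(/%/, "", $2); print $2; exit }'
}

# Succeed when there are no active applications ($1) and HDFS use ($2) is below 10%.
meets_yarn_hdfs_idle_conditions() {
  [ "$1" -eq 0 ] && awk -v pct="$2" 'BEGIN { exit !(pct + 0 < 10) }'
}

# Example usage (the hdfs report might need a user with HDFS admin rights):
#   apps="$(yarn application -list -appStates RUNNING,ACCEPTED 2>/dev/null | count_active_apps)"
#   pct="$(hdfs dfsadmin -report 2>/dev/null | dfs_used_percent)"
#   meets_yarn_hdfs_idle_conditions "$apps" "$pct" && echo "Idle by YARN and HDFS checks"
```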

2.    Make sure that the metrics-collector process is running. The metrics-collector process collects the metrics that Amazon EMR uses to determine whether the cluster is idle. Run either of the following commands to check the metrics-collector process:

ps -ef | grep metrics-collector

-or-

systemctl status metricscollector.service

For more information, see How do I restart a service in Amazon EMR?
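
For a scripted version of this check, the following minimal sketch wraps systemctl in a hypothetical helper function. It assumes a systemd-based Amazon EMR release where the unit is named metricscollector.service, as in the preceding command.

```shell
# Report whether the metrics-collector service is active.
# Returns 0 when the unit is active and 1 otherwise.
check_metrics_collector() {
  # `systemctl is-active` exits non-zero for inactive units, so ignore its status
  # and rely on the printed state instead.
  state="$(systemctl is-active metricscollector.service 2>/dev/null || true)"
  if [ "$state" = "active" ]; then
    echo "metrics-collector is running"
  else
    echo "metrics-collector is not running (state: ${state:-unknown})"
    return 1
  fi
}

# Example usage:
#   check_metrics_collector || echo "Consider restarting the service"
```

If the check fails, then restarting the service (for example, with sudo systemctl restart metricscollector.service, assuming the same unit name) is a reasonable next step.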

3.    When you turn on auto-termination using an auto-termination policy, Amazon EMR emits the AutoTerminationIsClusterIdle Amazon CloudWatch metric at a one-minute granularity. This metric indicates whether the cluster meets the idle state requirement. If the metric shows "1", then the cluster is idle. If it shows "0", then the cluster is still active.

View the EMR cluster's CloudWatch metrics and verify that the AutoTerminationIsClusterIdle metric stays at "1". If the value is continuously "1" for the duration of the idle timeout, then the cluster qualifies for auto-termination.
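
The same check can be sketched with the AWS CLI. This example assumes that the metric is published as AutoTerminationIsClusterIdle in the AWS/ElasticMapReduce namespace with a JobFlowId dimension, that GNU date is available (as on Amazon Linux), and that the AWS CLI is configured. The function names and the cluster ID are hypothetical placeholders.

```shell
# Fetch the per-minute minimum of the idle metric for the last N minutes.
# $1 is the cluster ID, $2 is the look-back window in minutes (default 60).
get_idle_datapoints() {
  cluster_id="$1"
  minutes="${2:-60}"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ElasticMapReduce \
    --metric-name AutoTerminationIsClusterIdle \
    --dimensions Name=JobFlowId,Value="$cluster_id" \
    --start-time "$(date -u -d "-${minutes} minutes" +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 60 \
    --statistics Minimum \
    --query 'Datapoints[].Minimum' \
    --output text
}

# Read whitespace-separated values on stdin. Succeed only if there is at least
# one value and every value equals 1.
all_idle() {
  awk '{ for (i = 1; i <= NF; i++) { n++; if ($i + 0 != 1) bad = 1 } }
       END { exit (n == 0 || bad) }'
}

# Example usage (j-XXXXXXXXXXXXX is a placeholder cluster ID):
#   get_idle_datapoints j-XXXXXXXXXXXXX 60 | all_idle \
#     && echo "Cluster reported idle for the whole window"
```

Choosing a look-back window equal to the configured idle timeout makes it easier to see whether the cluster stayed idle long enough to qualify for termination, or whether a "0" datapoint reset the idle period.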


Related information

Using an auto-termination policy

Monitor metrics with CloudWatch

AWS OFFICIAL, updated a year ago