Why is my Amazon EMR cluster unreachable?

I can't connect to my Amazon EMR cluster.

Short description

The following are common reasons that your EMR cluster might be unreachable:

  • There's a permissions issue in the security group rules.
  • The network setup is incorrect for clusters that are provisioned in a private subnet.
  • There's an issue with cluster authentication setup.
  • There are resource constraints in the cluster nodes.
  • The Amazon EMR service daemon is stopped.

Resolution

Amazon EMR security group rules

1.    Verify that the security group rules are correct. For more information, see Working with Amazon EMR-managed security groups.

2.    Verify that inbound TCP traffic on port 8443 is allowed. Port 8443 allows the cluster manager to communicate with the cluster master node.

3.    Verify that SSH on port 22 is allowed, if you're trying to connect to the cluster through SSH.

4.    If outside users or applications can't reach the EMR cluster, then validate the related rules in the managed security groups. Also validate the rules in any additional security groups.
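To confirm from the client side that the SSH rule in step 3 is effective, you can probe the port directly. The following is a minimal sketch, assuming that bash (for its /dev/tcp redirection) and the coreutils timeout command are available. The function name check_tcp and the host name in the usage comment are placeholders. The probe only tests the path from the machine where you run it, so it's most useful for the SSH rule.

```shell
# check_tcp HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection opens, nonzero otherwise.
check_tcp() {
  local host="$1" port="$2" wait="${3:-5}"
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT
  if timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Example (placeholder host name):
# check_tcp master-public-dns-name 22
```

A timeout usually points to a security group or routing problem, while an immediate refusal suggests that the traffic arrives but nothing is listening on the port.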

EMR clusters in a private subnet

In addition to the items mentioned in the previous section, verify the following for EMR clusters that are in a private subnet:

1.    Verify that the additional managed security group for service access is added. Verify that the rules allow the cluster manager to communicate with the cluster nodes. For more information, see Amazon EMR-managed security group for service access (private subnets).

2.    If you're using a bastion host and you can't reach Amazon EMR through the bastion host, then do the following:

  • Verify that the bastion host security group allows inbound traffic from the client system.
  • Verify that the EMR cluster security groups allow inbound traffic from the bastion host.

Because network configurations vary, make sure that the end-to-end connection from the client to the cluster is set up correctly, with no black holes along the path.
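One way to exercise the full path through a bastion host is OpenSSH's ProxyJump option (-J, available in OpenSSH 7.3 and later). The sketch below only assembles the command so that you can review it before running it. The function name, the bastion user (ec2-user), and the host names are assumptions for illustration; hadoop is the default SSH user on EMR nodes. It also assumes that the key pair is already loaded in ssh-agent, because an identity file passed with -i is not necessarily applied to the jump host.

```shell
# bastion_ssh_cmd BASTION_HOST MASTER_HOST [BASTION_USER]
# Prints an ssh command that reaches the master node through a bastion host.
# Assumes the key pair is already loaded in ssh-agent (ssh-add /path/to/key.pem).
bastion_ssh_cmd() {
  local bastion="$1" master="$2" bastion_user="${3:-ec2-user}"
  printf 'ssh -o ConnectTimeout=10 -J %s@%s hadoop@%s\n' \
    "$bastion_user" "$bastion" "$master"
}

# Example with placeholder hosts:
# bastion_ssh_cmd bastion.example.com master-private-dns-name
```

Adding -v to the printed command shows which hop fails, which helps separate a bastion security group problem from an EMR security group problem.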

Authentication methods

To make sure that authentication is set up correctly, do the following:

1.    If authentication uses an Amazon Elastic Compute Cloud (Amazon EC2) key pair, then verify that it's created and configured correctly. For more information, see Use an Amazon EC2 key pair for SSH credentials.

2.    If authentication uses Kerberos, then verify that it's configured correctly. For more information, see Use Kerberos authentication.
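A common key pair problem on the client side is a private key file that other users can read, which ssh rejects. The following sketch checks the file mode before you connect. It assumes GNU stat (as found on most Linux distributions), and the function name key_perms_ok is hypothetical.

```shell
# key_perms_ok FILE
# ssh refuses private keys that group or other users can access.
key_perms_ok() {
  local key="$1" mode
  [ -f "$key" ] || { echo "missing: $key"; return 2; }
  mode="$(stat -c '%a' "$key")"   # GNU stat; on macOS use: stat -f '%Lp'
  case "$mode" in
    [0-7]00) echo "ok: $key ($mode)" ;;
    *) echo "too open: $key ($mode); run: chmod 400 $key"; return 1 ;;
  esac
}

# Example (placeholder path):
# key_perms_ok ~/mykeypair.pem
```

For clusters that use Kerberos, running klist on the client shows whether a valid ticket is cached.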

Resource constraints in the cluster nodes

1.    Verify that the underlying master node instance is in the running state and isn't terminated.

2.    Check the Instance-state log of the master node to determine how resources are being used.

Run the following command to list the processes that use the most CPU:

ps auxwww --sort -%cpu | head -10

Run the following command to check recent kernel messages for errors, such as out-of-memory events:

dmesg | tail -n 25

Run the following command to check memory usage:

free -m

Run the following command to check disk usage:

df -h
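The disk check above can be turned into a quick filter that prints only the file systems that are close to full. This is a minimal sketch, assuming awk is available and that mount points contain no spaces. The function name flag_full_disks and the 90 percent default threshold are illustrative choices, not values from Amazon EMR.

```shell
# flag_full_disks [THRESHOLD]
# Reads `df -P` output on stdin and prints the mount point and usage of
# every file system at or above THRESHOLD percent used (default 90).
flag_full_disks() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)              # strip the percent sign from the Capacity column
      if (use + 0 >= limit + 0) print $6 " " $5
    }'
}

# Usage on the master node:
# df -P | flag_full_disks 90
```

The -P flag keeps each file system on one line in the POSIX column layout, which makes the output safer to parse than df -h.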

EMR cluster daemons

The instance controller (I/C) is the daemon that runs on the cluster nodes, including the master node. It communicates with the Amazon EMR control plane and the rest of the cluster. If the instance controller on the master node is stopped, then the cluster can become unreachable, so make sure that it's in the running state.

Run the following command to check the status of the instance controller:

sudo systemctl status instance-controller

If the instance controller is stopped, then run the following command to start it:

sudo systemctl start instance-controller
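The two commands above can be combined so that the daemon is started only when it isn't already active. This is a sketch that assumes a systemd-based EMR release, where the systemctl commands shown above apply. The function name ensure_instance_controller is hypothetical.

```shell
# ensure_instance_controller
# Starts the instance controller only if systemd reports it as not active.
ensure_instance_controller() {
  if systemctl is-active --quiet instance-controller; then
    echo "instance-controller is already running"
  else
    echo "instance-controller is not running; starting it"
    sudo systemctl start instance-controller
  fi
}

# Usage on the master node:
# ensure_instance_controller
```

If the daemon stops again shortly after it starts, check the resource usage on the node before you restart it repeatedly.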

AWS OFFICIAL
Updated a year ago
1 Comment

Command to start the instance controller (IC) on older EMR releases (verified on EMR 5.20.0): /etc/rc.d/init.d/instance-controller start

replied 4 months ago