I am trying to manually scale up or scale down my Amazon EMR cluster, but the resize request is stuck or is timing out.

If a manual resize request takes longer than the value defined in yarn.resourcemanager.decommissioning.timeout (the default is one hour), then manually restart the instance-controller process.

Note: This resolution doesn't work for Amazon EMR release versions 5.14, 5.15, or 5.16.

Important: Back up your data to a persistent storage option, such as Amazon Simple Storage Service (Amazon S3), and then save your Amazon EMR configuration objects before you complete the following steps.
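
Before you start, it can help to confirm which timeout value is configured on the cluster. The following is a minimal sketch, not an official tool: hadoop_property_value is a hypothetical helper name, and it assumes a Hadoop-style XML file in which the value element appears within two lines of the name element.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML configuration file.
# $1 = property name, $2 = path to the XML file.
# Assumption: <value> appears within two lines after the matching <name>.
hadoop_property_value() {
  grep -F -A2 "<name>$1</name>" "$2" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p' |
    head -n 1
}
```

For example, hadoop_property_value yarn.resourcemanager.decommissioning.timeout /etc/hadoop/conf/yarn-site.xml prints the configured value, or nothing if the property is not set in that file. The file path is an assumption about where the cluster keeps its YARN configuration.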

1.    Connect to the master node of the Amazon EMR cluster using SSH.

2.    Run the following command to stop the instance-controller process. The process is expected to restart on its own after it stops. For more information, see Viewing and Restarting Amazon EMR and Application Processes (Daemons).

sudo service instance-controller stop

3.    Wait a few seconds, and then run the following command to check the status of the instance-controller process.

sudo service instance-controller status

The status should be Running. If the process is not running, execute the following command to manually start it:

sudo service instance-controller start

Note: Be sure that only one instance-controller process is running at a time.

4.    After instance-controller is running, run the following YARN command so that ResourceManager reprocesses the resize request:

yarn rmadmin -refreshNodes -graceful

Example output:

18/09/02 05:48:34 INFO client.RMProxy: Connecting to ResourceManager at /172.31.xx.xx:8033

The status of the core and task nodes should change to Resizing and the resize request should complete before timing out.
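
Steps 2 through 4 can be sketched as one script, together with a check that the resize has finished. This is an illustration under stated assumptions rather than a supported tool: SERVICE_CMD, YARN_CMD, WAIT_SECONDS, and all three function names are hypothetical. The status check assumes that a healthy status line contains the word "running". The completion check assumes a cluster that uses instance groups and an AWS CLI that is configured with permission to list them.

```shell
# Hypothetical settings; override them before calling the functions if needed.
SERVICE_CMD="${SERVICE_CMD:-sudo service}"
YARN_CMD="${YARN_CMD:-yarn}"
WAIT_SECONDS="${WAIT_SECONDS:-5}"

restart_and_refresh() {
  # Step 2: stop the process.
  $SERVICE_CMD instance-controller stop

  # Step 3: wait a few seconds, then capture the status text.
  sleep "$WAIT_SECONDS"
  local status_output
  status_output="$($SERVICE_CMD instance-controller status 2>&1 || true)"

  # Assumption: a healthy status contains "running"; an unhealthy one either
  # omits the word or says "not running".
  if printf '%s\n' "$status_output" | grep -qi 'running' &&
     ! printf '%s\n' "$status_output" | grep -qi 'not running'; then
    echo "instance-controller is running"
  else
    echo "instance-controller is not running; starting it"
    $SERVICE_CMD instance-controller start
  fi

  # Step 4: ask ResourceManager to reprocess the resize request.
  $YARN_CMD rmadmin -refreshNodes -graceful
}

# Print one "TYPE STATE REQUESTED RUNNING" row per instance group.
# $1 = cluster ID, supplied by the caller.
list_group_states() {
  aws emr list-instance-groups --cluster-id "$1" \
    --query 'InstanceGroups[].[InstanceGroupType,Status.State,RequestedInstanceCount,RunningInstanceCount]' \
    --output text
}

# Read rows from list_group_states on stdin. Succeed only if there is at least
# one row, no group is RESIZING, and every requested count equals the running count.
resize_complete() {
  awk '$2 == "RESIZING" || $3 != $4 { pending = 1 }
       END { exit ((pending || NR == 0) ? 1 : 0) }'
}
```

After you source the script on the master node, call restart_and_refresh once. You can then pipe list_group_states with your cluster ID into resize_complete periodically to see whether the resize has finished. The script only starts instance-controller when the status check fails, which follows the note about running a single instance-controller process at a time.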



Published: 2018-09-27