How do I cancel an Amazon EMR step?

Last updated: 2020-10-12

I'm trying to cancel an Amazon EMR step. When I run the cancel-steps command, I get the following error: "Cannot cancel the step. It is already RUNNING."

Short description

This error affects Amazon EMR versions 5.27.x and earlier. In these release versions, the cancel-steps command cancels pending steps only. To cancel a running step, kill either the YARN application by its application ID (for YARN steps) or the process by its process ID (for non-YARN steps).

In Amazon EMR versions 5.28.0 and later, you can use cancel-steps to cancel both pending and running steps. For more information, see Work with steps using the AWS CLI and console.
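
On Amazon EMR 5.28.0 and later, the call might look like the following sketch. The function name cancel_step is hypothetical, and the cluster ID and step ID are values that you supply. The SEND_INTERRUPT cancellation option is one possible choice and is shown here as an assumption, not a requirement.

```shell
# Hypothetical helper: cancel one pending or running step (EMR 5.28.0 and later).
# $1 = cluster ID (for example, j-XXXXXXXXXXXXX), $2 = step ID (for example, s-XXXXXXXXXXXXX)
cancel_step() {
  aws emr cancel-steps \
    --cluster-id "$1" \
    --step-ids "$2" \
    --step-cancellation-option SEND_INTERRUPT
}
```

To use it, run cancel_step followed by your cluster ID and step ID, for example: cancel_step j-XXXXXXXXXXXXX s-XXXXXXXXXXXXX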

Resolution

Use one of the following methods to cancel running steps in Amazon EMR versions 5.27.x and earlier.

Cancel YARN applications

1.    Connect to the master node using SSH.

2.    To find the step's application ID, run the following command to list all running applications.

yarn application -list

3.    Run the following command to kill the application. Replace application_id with your application ID, such as "application_1505786029486_002".

Note: This command kills all pending steps in the queue.

yarn application -kill application_id
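
When many applications are running, it can help to reduce the list to application IDs and names before choosing which one to kill. The following is a minimal sketch: the function name running_app_ids is hypothetical, and it assumes that the list output is tab-separated with the application ID in the first column and the application name in the second. Check the output against the unfiltered list before you kill anything.

```shell
# Hypothetical helper: read `yarn application -list` output on stdin and
# print "<application ID><TAB><application name>" for each application row.
# Assumes tab-separated columns (ID first, name second), padded with spaces.
running_app_ids() {
  awk -F'\t' '{
    id = $1; name = $2
    gsub(/^ +| +$/, "", id)      # trim padding around the ID
    gsub(/^ +| +$/, "", name)    # trim padding around the name
    if (id ~ /^application_/) print id "\t" name
  }'
}
```

On the master node, this could be used as: yarn application -list | running_app_ids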

Cancel non-YARN applications

1.    Connect to the master node using SSH.

2.    Run the following command to get the process ID (pid). In the following example, replace step_id with your step identifier, such as s-Y9XXXXXXAPMD.

ps -ef | grep -i step_id

In the following example output, the process ID is 2366:

hadoop    2366  4664  0 16:20 ?        00:00:01 /etc/alternatives/jre/bin/java -Xmx1000m -server -XX:OnOutOfMemoryError=kill -9 %p -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/s-2RNURIK9Z2JUH -Dhadoop.log.file=syslog -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-2RNURIK9Z2JUH/tmp -Dhadoop.security.logger=INFO,NullAppender -Dsun.net.inetaddr.ttl=30 org.apache.hadoop.util.RunJar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar bash -c envsubst < /home/hadoop/truffle_suffle.json.template

3.    Run the following command to kill the process. Replace 2366 with the process identifier for your step.

Note: This command kills all pending steps in the queue.

kill -9 2366

The status of the step changes from Running to Failed.
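
To confirm the status change from the command line, you could query the step with the AWS CLI. The following is a sketch only: the function name step_state is hypothetical, and the cluster ID and step ID are values that you supply.

```shell
# Hypothetical helper: print the current state of a step (for example, RUNNING or FAILED).
# $1 = cluster ID, $2 = step ID
step_state() {
  aws emr describe-step \
    --cluster-id "$1" \
    --step-id "$2" \
    --query 'Step.Status.State' \
    --output text
}
```

For example, run step_state j-XXXXXXXXXXXXX s-XXXXXXXXXXXXX after you kill the process. The state might take a short time to update.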

