YARN is still using resources even though the Spark job that I executed on Amazon EMR from Jupyter or Zeppelin has finished executing

Last updated: 2019-08-23

I'm running a Jupyter or Zeppelin notebook on my Amazon EMR cluster. Why does a YARN application keep running even after the Apache Spark job that I submitted from Jupyter or Zeppelin has finished executing?

Short Description

When you run a Spark notebook in Zeppelin or Jupyter, Spark starts an interpreter. The interpreter creates a YARN application, which is the Spark driver that shows up when you list applications. The driver doesn't terminate when you finish executing a job from the notebook. This is by design: the Spark driver stays active so that it can request application containers for on-the-fly code execution. The downside is that the YARN application might be using resources that other jobs need. To resolve this issue, you can manually kill the YARN application. Alternatively, you can set a timeout value that automatically kills the application.

Resolution

In Zeppelin

Option 1: restart the Spark interpreter

Before you begin, be sure that you have permissions to restart the interpreter in Zeppelin.

1.    Open Zeppelin.

2.    From the drop-down menu next to the user name in the top-right corner, choose Interpreter.

3.    Find the Spark interpreter, and then choose restart. Zeppelin terminates the YARN job when the interpreter restarts.

Option 2: manually kill the YARN job

Before you begin, be sure that you have SSH access to the Amazon EMR cluster and that you have permission to run YARN commands.

Use the yarn application command with the -kill option to terminate the application. In the following example, replace application_id with your application ID.

yarn application -kill application_id
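
The kill command needs the application ID, which you can read from the output of yarn application -list. The following Python sketch shows one way to pull the IDs out of that output. It assumes that IDs follow the usual application_<timestamp>_<sequence> form and that the yarn CLI is on the PATH of the node where you run it. The helper names extract_application_ids and list_running_application_ids are hypothetical and are not part of YARN or Amazon EMR.

```python
import re
import subprocess
from typing import List

# Assumption: YARN application IDs look like application_<timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def extract_application_ids(listing: str) -> List[str]:
    """Return the unique application IDs found in the listing text, in order."""
    seen: List[str] = []
    for app_id in APP_ID_PATTERN.findall(listing):
        # The same ID can appear twice on a row (for example, in a tracking URL).
        if app_id not in seen:
            seen.append(app_id)
    return seen


def list_running_application_ids() -> List[str]:
    """Hypothetical helper: run the YARN CLI and return the running application IDs."""
    result = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "RUNNING"],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_application_ids(result.stdout)


if __name__ == "__main__":
    # Made-up sample row for illustration only; real output has more columns.
    sample = "application_1566550000000_0007\tZeppelin\tSPARK\tRUNNING\n"
    print(extract_application_ids(sample))
```

Review the returned IDs before passing any of them to yarn application -kill, because the list can include applications that other users still need.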

Option 3: set an interpreter timeout value

Zeppelin versions 0.8.0 and later (available in Amazon EMR versions 5.18.0 and later) include a lifecycle manager for interpreters. Use the TimeoutLifecycleManager setting to terminate interpreters after a specified idle timeout period:

1.    Create an /etc/zeppelin/conf/zeppelin-site.xml file with the following content. In this example, the timeout is set to 120,000 milliseconds (2 minutes). Choose a timeout value that's appropriate for your environment.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.class</name>
  <value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
  <description>This is the LifecycleManager class for managing the lifecycle of interpreters. The interpreter terminates after the idle timeout period.</description>
</property>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.checkinterval</name>
  <value>60000</value>
  <description>The interval for checking whether the interpreter has timed out, in milliseconds.</description>
</property>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
  <value>120000</value>
  <description>The idle timeout limit, in milliseconds.</description>
</property>
</configuration>

2.    Run the following commands to restart Zeppelin:

$ sudo stop zeppelin
$ sudo start zeppelin
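
Because the timeout values are in milliseconds, it can help to read the file back and confirm the values in seconds. The following Python sketch parses the content of a zeppelin-site.xml file and reports the lifecycle manager settings. The function name read_lifecycle_settings and the keys of the returned dictionary are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Common prefix of the three lifecycle manager properties.
PREFIX = "zeppelin.interpreter.lifecyclemanager."


def read_lifecycle_settings(xml_text: str) -> dict:
    """Collect the lifecycle manager properties and convert the timeouts to seconds."""
    root = ET.fromstring(xml_text)
    raw = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith(PREFIX):
            raw[name[len(PREFIX):]] = value
    return {
        "manager_class": raw.get("class"),
        # Values in the file are milliseconds; report seconds for readability.
        "check_interval_seconds": int(raw["timeout.checkinterval"]) / 1000,
        "idle_timeout_seconds": int(raw["timeout.threshold"]) / 1000,
    }


if __name__ == "__main__":
    # Assumption: the file is at the path used in step 1.
    with open("/etc/zeppelin/conf/zeppelin-site.xml", encoding="utf-8") as handle:
        print(read_lifecycle_settings(handle.read()))
```

With the example configuration from step 1, the idle timeout reads back as 120.0 seconds and the check interval as 60.0 seconds.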

In Jupyter

Option 1: manually shut down the notebook

When the code executions are complete, use one of the following methods to kill the kernel in the Jupyter user interface:

  • In the Jupyter notebook interface, open the File menu, and then choose Close and Halt.
  • On the Jupyter dashboard, open the Running tab. Choose Shutdown for the notebook that you want to stop.

Option 2: manually shut down the kernel

From the Jupyter notebook interface, open the Kernel menu, and then choose Shutdown.

Option 3: configure the timeout attribute

If you close the notebook tab or browser window before shutting down the kernel, the YARN job keeps running. To prevent this from happening, configure the NotebookApp.shutdown_no_activity_timeout attribute. This attribute terminates the YARN job after a specified idle timeout period, even if you close the tab or browser window.

To configure the NotebookApp.shutdown_no_activity_timeout attribute:

1.    Open the /etc/jupyter/jupyter_notebook_config.py file on the master node, and then add an entry similar to the following. In this example, the timeout attribute is set to 120 seconds. Choose a timeout value that's appropriate for your environment.

c.NotebookApp.shutdown_no_activity_timeout = 120

2.    Run the following commands to restart jupyterhub:

sudo docker stop jupyterhub
sudo docker start jupyterhub
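
If you apply the setting from a script rather than by hand, the edit in step 1 should be idempotent so that repeated runs don't add duplicate entries. The following Python sketch updates an existing entry or appends a new one. The function name set_idle_timeout is hypothetical, and the sketch works on the file's text only. Reading and writing the file, and restarting jupyterhub, are left to the caller.

```python
import re

SETTING = "c.NotebookApp.shutdown_no_activity_timeout"

# Match an active (uncommented) assignment of the setting on its own line.
LINE_PATTERN = re.compile(
    r"^[ \t]*" + re.escape(SETTING) + r"[ \t]*=.*$", re.MULTILINE
)


def set_idle_timeout(config_text: str, seconds: int) -> str:
    """Return config_text with the idle timeout set to the given number of seconds."""
    new_line = f"{SETTING} = {int(seconds)}"
    if LINE_PATTERN.search(config_text):
        # Replace every existing assignment so that only the new value remains.
        return LINE_PATTERN.sub(new_line, config_text)
    # Otherwise append, adding a newline first if the file doesn't end with one.
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + separator + new_line + "\n"


if __name__ == "__main__":
    existing = "# Jupyter configuration\n" + SETTING + " = 30\n"
    print(set_idle_timeout(existing, 120))
```

Commented-out lines are left alone, so a default such as "# c.NotebookApp.shutdown_no_activity_timeout = 0" in a generated config file is not modified.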