I am running Spark jobs on my Amazon EMR cluster, and a core node is almost out of disk space. Why is that happening?

Check for these common causes of disk space use on the core node:

Local and temp files from the Spark application

When you run Spark jobs, Spark applications create local files that can consume the rest of the disk space on the core node. Check the size of the following directories on the core node:

  • <local-dir>/filecache
  • <local-dir>/usercache/<user>/filecache
  • <local-dir>/usercache/<user>/appcache/<app-id>/

Note: <local-dir> is specified by the yarn.nodemanager.local-dirs property in the /etc/hadoop/conf/yarn-site.xml file.
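To see how much space these caches are using, you can run something like the following on the core node. This is a sketch: /mnt/yarn is a common default for yarn.nodemanager.local-dirs on Amazon EMR, so substitute the value from your own yarn-site.xml.

```shell
# Report the size of the YARN local-dir caches on a core node.
# /mnt/yarn is an assumed default; confirm the real path in
# yarn.nodemanager.local-dirs (/etc/hadoop/conf/yarn-site.xml).
LOCAL_DIR="${LOCAL_DIR:-/mnt/yarn}"

for d in "$LOCAL_DIR/filecache" "$LOCAL_DIR/usercache"; do
  if [ -d "$d" ]; then
    du -sh "$d"    # may require sudo to read other users' appcache dirs
  else
    echo "not found: $d"
  fi
done
```

If the usercache directories dominate the output, the space is being consumed by running or recently finished applications, which points to scaling rather than log retention as the fix.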

If local files are consuming the rest of the disk space, scale your cluster. For more information, see Scaling Cluster Resources.

Note: If the number of Spark executors does not scale up as expected, increase the storage capacity of the Amazon Elastic Block Store (Amazon EBS) volumes that are attached to the core node. Optionally, add more Amazon EBS volumes to the core node.

Spark application logs and job history files

When you run Spark jobs, Spark creates application logs and job history files in HDFS. These logs can consume the rest of the disk space on the core node. To resolve this problem, check the directories where the logs are stored and change the retention parameters, if necessary.
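To see how much HDFS space these logs are consuming, you can summarize both log directories from the master node. This sketch assumes the EMR default paths /var/log/hadoop-yarn/apps and /var/log/spark/apps in HDFS:

```shell
# Summarize HDFS space used by aggregated YARN application logs and
# Spark job history files (run on an EMR node with the hdfs CLI).
LOG_DIRS="/var/log/hadoop-yarn/apps /var/log/spark/apps"

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -du -s -h $LOG_DIRS
else
  echo "hdfs CLI not found; run this on an EMR cluster node"
fi
```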

Spark application logs, which are the YARN container logs for your Spark jobs, are located in /var/log/hadoop-yarn/apps on the core node. These logs are moved to HDFS when the application is finished running. By default, YARN keeps application logs on HDFS for 48 hours. Perform the following steps to reduce the retention period.

  1. Connect to the master node using SSH.
  2. Open the /etc/hadoop/conf/yarn-site.xml file on each node in your Amazon EMR cluster (master, core, and task nodes).
  3. Reduce the value of the yarn.log-aggregation.retain-seconds property on all nodes.
  4. Restart the ResourceManager daemon. For more information, see Viewing and Restarting Amazon EMR and Application Processes (Daemons).
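For example, to keep aggregated logs for 12 hours (43200 seconds) instead of the 48-hour default, the property in yarn-site.xml would look like this (the 12-hour value is illustrative; choose a retention period that fits your needs):

```xml
<!-- /etc/hadoop/conf/yarn-site.xml (all nodes) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- 12 hours; the default of 48 hours is 172800 seconds -->
  <value>43200</value>
</property>
```

On recent EMR releases, the ResourceManager can typically be restarted with sudo systemctl restart hadoop-yarn-resourcemanager; older releases use upstart-style stop/start commands instead.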

Note: After the application logs are copied to HDFS, they remain on the local disk so that Log Pusher can push the logs to Amazon Simple Storage Service (Amazon S3). The default retention period is four hours. To reduce the retention period, modify the /etc/logpusher/hadoop.config file.

Spark job history files are located in /var/log/spark/apps on the core node. When the filesystem cleaner runs, Spark deletes job history files that are older than seven days.

To reduce the default retention period, perform the following steps:

  1. Connect to the master node using SSH.
  2. Open the /etc/spark/conf/spark-defaults.conf file on the master node.
  3. Reduce the value of the spark.history.fs.cleaner.maxAge property.

By default, the filesystem history cleaner runs once a day. The frequency is specified in the spark.history.fs.cleaner.interval property. For more information, see Monitoring and Instrumentation.
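For example, to have the cleaner delete history files older than three days and run twice a day instead of once, spark-defaults.conf might contain the following (the values shown are illustrative, not recommendations):

```properties
# /etc/spark/conf/spark-defaults.conf (master node)
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.maxAge    3d
spark.history.fs.cleaner.interval  12h
```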



Published: 2018-08-08