How can I modify the Spark configuration in an Amazon EMR notebook?

Last updated: 2020-06-24

How can I customize the configuration for an Apache Spark job in an Amazon EMR notebook?

Short description

An Amazon EMR notebook is a serverless Jupyter notebook. The notebook uses the Sparkmagic kernel as a client for interactively working with Spark on a remote EMR cluster through an Apache Livy server. You can use Sparkmagic commands to customize the Spark configuration. A custom configuration is useful when you want to:

  • Change executor memory and executor cores for a Spark job
  • Change resource allocation for Spark

Resolution

Modify the current session

1.    In a Jupyter notebook cell, run the %%configure command to modify the job configuration. The following example changes the executor memory for the Spark job.

%%configure -f
{"executorMemory":"4G"}

2.    For Spark properties that you would normally pass with the --conf option, use a nested JSON object, as shown in the following example. Use this method instead of explicitly passing a conf object to a SparkContext or SparkSession, because Livy creates the session on the cluster and client-side configuration objects in the notebook don't control it.

%%configure -f
{"conf":{"spark.dynamicAllocation.enabled":"false"}}

Confirm that the configuration change was successful

1.    On the client side, run the %%info command in a Jupyter notebook cell to see the current session configuration. Example output:

Current session configs: {'executorMemory': '4G', 'conf': {'spark.dynamicAllocation.enabled': 'false'}, 'kind': 'pyspark'}
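You can also confirm the settings from inside the session itself. In a notebook cell with the PySpark kernel, where the spark session object is predefined, read back the effective configuration (a sketch; this requires a running session on the cluster):

```
print(spark.sparkContext.getConf().get("spark.executor.memory"))
print(spark.conf.get("spark.dynamicAllocation.enabled"))
```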

2.    On the server side, check the /var/log/livy/livy-livy-server.out log on the EMR cluster's primary node. When the Spark session starts, you see a log entry similar to the following:

20/06/24 10:11:22 INFO InteractiveSession$: Creating Interactive session 2: [owner: null, request: [kind: pyspark, proxyUser: None, executorMemory: 4G, conf: spark.dynamicAllocation.enabled -> false, heartbeatTimeoutInSecond: 0]]
