When I use custom JAR files in my Apache Spark jobs on Amazon EMR, I get a java.lang.ClassNotFoundException error. 

This error occurs when one of the following is true:

  • The spark-submit job can't find the relevant files in the class path.
  • A bootstrap action or custom configuration is overriding the class paths. When this happens, the class loader only picks up the JAR files that exist in the location that you specified in your configuration.

Check the stack trace to find the name of the missing class. Then, add the path of your custom JAR (containing the missing class) to the Spark class path. You can do this while the cluster is running, when you launch a new cluster, or when you submit a job.
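
Before you change the class path, it can help to confirm that your custom JAR actually contains the missing class. The following is a minimal check; my-custom.jar and MyMissingClass are placeholder names for your JAR file and for the class from the stack trace.

# List the contents of the JAR and search for the missing class.
# Replace my-custom.jar and MyMissingClass with your own names.
jar tf /home/hadoop/extrajars/my-custom.jar | grep MyMissingClass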

On a running cluster:

In /etc/spark/conf/spark-defaults.conf, append the path of your custom JAR to the existing values of spark.driver.extraClassPath and spark.executor.extraClassPath, separated by a colon. In the following example, /home/hadoop/extrajars/* is the custom JAR path and <existing class path> is a placeholder for the entries that are already configured.

sudo vim /etc/spark/conf/spark-defaults.conf

spark.driver.extraClassPath      <existing class path>:/home/hadoop/extrajars/*
spark.executor.extraClassPath    <existing class path>:/home/hadoop/extrajars/*
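
The custom JAR files must exist at this path on every node in the cluster. One way to stage them, sketched here with a hypothetical Amazon S3 location (s3://DOC-EXAMPLE-BUCKET/jars/), is to copy them from S3 on each node:

# Run on each cluster node as the hadoop user; the S3 path is a
# placeholder for your own bucket.
mkdir -p /home/hadoop/extrajars
aws s3 cp s3://DOC-EXAMPLE-BUCKET/jars/ /home/hadoop/extrajars/ --recursive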

On a new cluster:

Append the custom JAR path to the existing class paths in /etc/spark/conf/spark-defaults.conf by supplying a configuration object when you create a cluster.

Note: To use this option, you must create a cluster using Amazon EMR release version 5.14.0 or later.

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath":"/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/emr/emrfs/auxlib/:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*",
      "spark.executor.extraClassPath":"/usr/lib/hadoop-lzo/lib/:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/emr/emrfs/auxlib/:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*"
    }
  }
]
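
For example, you can save this configuration object to a file and reference it when you create the cluster with the AWS CLI. The cluster parameters below (name, release label, instance type, key pair) are illustrative only; classpath-config.json contains the configuration object shown above.

aws emr create-cluster \
  --name "spark-extrajars" \
  --release-label emr-5.36.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --configurations file://classpath-config.json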

For a single job:

Use the --jars option to pass the custom JAR path when you run spark-submit. Note that spark-submit options such as --jars must come before the application JAR; anything that follows the application JAR is treated as an application argument. Example:

spark-submit --deploy-mode client --class org.apache.spark.examples.SparkPi --master yarn --jars /home/hadoop/extrajars/* spark-examples.jar 100

Note: To prevent class conflicts, do not include standard JARs when using the --jars option. For example, don't include spark-core.jar because it already exists in the cluster.
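
If shell wildcard expansion causes problems, or if you want to include only specific JARs, pass an explicit comma-separated list to --jars instead. The JAR file names in this sketch are placeholders for your own files.

# Pass only the custom JARs that contain the missing classes,
# as a comma-separated list.
spark-submit --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --jars /home/hadoop/extrajars/my-serde.jar,/home/hadoop/extrajars/my-udfs.jar \
  spark-examples.jar 100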

For more information about configuring classifications, see Configure Spark.



Published: 2019-02-26