Why can’t I view the Apache Spark history events or logs from the Spark web UI in Amazon EMR?

Last updated: 2022-12-08

I'm not able to view the Apache Spark history events or logs from the Spark web UI in Amazon EMR.

Short description

You can view the Spark History Server on Amazon EMR using the following:

  • Off-cluster access to persistent application user interfaces (starting from EMR 5.25.0).
  • On-cluster application user interfaces by setting up a web proxy through an SSH connection.

Resolution

I can't view my Spark history events using the off-cluster persistent Spark History Server or UI

Persistent Spark History Server events aren't accessible on EMR clusters with any of the following:

  • Multiple master nodes.
  • Integration with AWS Lake Formation.
  • A default file system that was changed from HDFS to a different file system, such as Amazon Simple Storage Service (Amazon S3).

For more information, see View persistent application user interfaces - Considerations and limitations.

Note: The persistent Spark History Server isn't suitable for load testing or for viewing thousands of Spark applications running in parallel. To load test or to view many applications, use the on-cluster Spark History Server.

I can't view my Spark history events using the on-cluster Spark History Server or UI

On-cluster Spark History Server events aren't accessible if you save Spark events in an S3 bucket on Amazon EMR releases before 6.3 and 5.30. The Spark History Server in these Amazon EMR versions doesn't include the emrfs-hadoop-assembly JAR file that's required to access S3 buckets. Without this JAR file, you receive the following error when you try to access Spark History Server events:

INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found

To avoid this error, use the following cluster configuration. This configuration adds the required emrfs-hadoop-assembly JAR file to the Spark History Server's classpath.

[
  {
    "classification": "spark-defaults",
    "configurations": [],
    "properties": {
      "spark.eventLog.dir": "s3://<yourbucket>/",
      "spark.history.fs.logDirectory": "s3://<yourbucket>/"
    }
  },
  {
    "classification": "spark-env",
    "configurations": [
      {
        "classification": "export",
        "properties": {
          "SPARK_DAEMON_CLASSPATH": "$(ls /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-*)"
        }
      }
    ],
    "properties": {}
  }
]
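
The same configuration can also be assembled programmatically, for example to produce a file for `aws emr create-cluster --configurations file://...`. The following is a sketch under that assumption; the bucket name is a placeholder that you must replace with your own:

```python
import json

# Sketch: builds the cluster configuration shown above and writes it to a file
# that can be passed to "aws emr create-cluster --configurations file://...".
# "my-log-bucket" is a placeholder bucket name.
bucket = "my-log-bucket"

configurations = [
    {
        "classification": "spark-defaults",
        "configurations": [],
        "properties": {
            "spark.eventLog.dir": f"s3://{bucket}/",
            "spark.history.fs.logDirectory": f"s3://{bucket}/",
        },
    },
    {
        "classification": "spark-env",
        "configurations": [
            {
                "classification": "export",
                "properties": {
                    # Adds the emrfs-hadoop-assembly JAR to the Spark History
                    # Server's classpath. The $(ls ...) shell substitution is
                    # expanded on the cluster node, not in this script.
                    "SPARK_DAEMON_CLASSPATH": "$(ls /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-*)"
                },
            }
        ],
        "properties": {},
    },
]

with open("spark-history-s3.json", "w") as f:
    json.dump(configurations, f, indent=2)
```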

In Amazon EMR 6.x releases earlier than 6.3, setting the Spark event log directory to Amazon S3 during cluster launch without the emrfs-hadoop-assembly JAR file causes cluster termination. Making this change after the cluster is running might cause the configuration changes to revert.

Note: This workaround isn't necessary for Amazon EMR releases after 6.3 and 5.30 because the required JAR file is added by default to the folder /usr/lib/spark/jars/.
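
To check whether a node already has the assembly JAR in either location, you can run a small script like the following on the master node. This is a sketch; the paths are the EMR defaults discussed above, and the helper name is illustrative:

```python
import glob

# Default locations discussed above: releases after 6.3 and 5.30 place the JAR
# in the Spark jars folder, and all releases ship it in the EMRFS lib directory.
EMRFS_JAR_GLOBS = [
    "/usr/lib/spark/jars/emrfs-hadoop-assembly-*.jar",
    "/usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-*.jar",
]

def find_emrfs_jar(patterns=EMRFS_JAR_GLOBS):
    """Return the first matching emrfs-hadoop-assembly JAR path, or None."""
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None

if __name__ == "__main__":
    jar = find_emrfs_jar()
    print(jar if jar else "emrfs-hadoop-assembly JAR not found")
```

If the JAR is found only under /usr/share/aws/emr/emrfs/lib/, apply the SPARK_DAEMON_CLASSPATH configuration shown earlier.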

Keep in mind that when Spark events are written to Amazon S3, the Spark History Server doesn't show applications in the incomplete application list. Also, if the Spark context isn't properly closed, the event logs don't upload to S3 correctly.

I can't view Spark history events for my cluster that's in a private subnet

If you use a private subnet for your cluster, make sure that the Amazon Virtual Private Cloud (Amazon VPC) endpoint policy for the private subnet is correct. The VPC endpoint policy must include arn:aws:s3:::prod.MyRegion.appinfo.src/* in the resource list of the Amazon S3 policy. For more information, see Minimum Amazon S3 policy for private subnet.
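
As an illustration, a policy statement that includes that resource might be generated like this. This is a sketch only: the region, Sid, and action are placeholders, and the full minimum policy in the linked documentation lists additional resources:

```python
import json

# Sketch: builds one S3 endpoint policy statement containing the resource that
# the persistent Spark History Server requires. "us-east-1" stands in for
# MyRegion; the Sid and action here are illustrative, not prescriptive.
region = "us-east-1"

statement = {
    "Sid": "AllowSparkHistoryServerAppInfo",
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject"],
    "Resource": [f"arn:aws:s3:::prod.{region}.appinfo.src/*"],
}

print(json.dumps(statement, indent=2))
```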

I'm getting the error "NET:ERR_CERT_COMMON_NAME_INVALID" after turning on encryption in transit on the Spark UI

This error is caused by a browser certificate validation issue. If you're using Mozilla Firefox, you see an option to accept the risk and continue using the certificate. In Google Chrome, type thisisunsafe on the warning page so that Chrome skips certificate validation.
