How can I troubleshoot problems with viewing the Spark UI for AWS Glue ETL jobs?

Last updated: 2020-05-21

Why can't I see the Apache Spark UI for AWS Glue ETL jobs?

Resolution

Choose one of the following solutions, depending on how you're accessing the Spark UI: with an AWS CloudFormation stack or with Docker.

AWS CloudFormation stack

When you use an AWS CloudFormation stack to view the Spark UI, an Amazon Elastic Compute Cloud (Amazon EC2) instance makes an HTTPS request to confirm that the Spark UI is working. If that request fails, you get the error "WaitCondition timed out. Received 0 conditions when expecting 1," and the AWS CloudFormation stack is rolled back.

Check the following to resolve this issue:

  • Subnet: Confirm that the subnet can reach the Amazon Simple Storage Service (Amazon S3) API endpoint. For example, if you're using a private subnet, confirm that the subnet has a VPC endpoint or a NAT gateway.
  • History server port: Confirm that you can access the subnet through the Spark history server port. For example, a firewall could be blocking the port.
  • Event log directory: Confirm that you entered a valid Amazon S3 path for the event log directory. You must use s3a:// for the event logs path scheme. If there are event log files in the Amazon S3 path that you specified, then the path is valid.
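One way to confirm the event log path is to list the prefix with the AWS CLI. The bucket and prefix below are placeholders; substitute your own values:

```shell
# List objects under the event log prefix to confirm that it exists
# and contains event log files.
# Note: the AWS CLI uses the s3:// scheme, while the Spark
# configuration uses s3a:// for the same location.
aws s3 ls s3://your-bucket/your-eventlog-prefix/
```

If the command returns no objects, the path has no event logs and the history server has nothing to display.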

If you still get an error, check the following log groups in Amazon CloudWatch Logs:

  • /aws-glue/sparkui_cfn/cfn-init.log
  • /aws-glue/sparkui_cfn/spark_history_server.log
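If you have AWS CLI version 2 installed, one way to pull recent entries from these log groups is the logs tail command, using the log group names listed above:

```shell
# Show the last hour of entries from each log group.
# Requires AWS CLI v2, which provides the "logs tail" subcommand.
aws logs tail /aws-glue/sparkui_cfn/cfn-init.log --since 1h
aws logs tail /aws-glue/sparkui_cfn/spark_history_server.log --since 1h
```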

Note: The history server EC2 instance is terminated when the CloudFormation stack rolls back. Stack termination protection doesn't prevent this, because rollback deletes the resources that were created during the failed stack operation. To keep the instance running so that you can debug it, disable rollback on failure when you create the stack (for example, with the --disable-rollback option of the create-stack command).

Docker

If you're using Docker to view the Spark UI and you can't connect to the Spark history server from your web browser, check the following:

  • Confirm that the AWS credentials (access key and secret key) are valid. If you want to use temporary credentials, you must use spark.hadoop.fs.s3a.session.token in the command. Example:
$ docker run -itd -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS \
-Dspark.history.fs.logDirectory=s3a://path_to_eventlog \
-Dspark.hadoop.fs.s3a.access.key=AWS_ACCESS_KEY_ID \
-Dspark.hadoop.fs.s3a.secret.key=AWS_SECRET_ACCESS_KEY \
-Dspark.hadoop.fs.s3a.session.token=SESSION_TOKEN \
-Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" \
-p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
  • Confirm that you entered a valid Amazon S3 path for the event log directory. You must use s3a:// for the event logs path scheme. If there are event log files in the Amazon S3 path that you specified, then the path is valid.
  • Confirm that you entered the correct port number in the browser. By default, the port number is 18080 (for example, http://localhost:18080). To change the port number, change the -p parameter in the command and the spark.history.ui.port parameter in the Dockerfile.
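A quick way to check the last two items together is to request the history server's landing page from the host machine (assuming the default 18080 port mapping):

```shell
# Send a HEAD request to the history server and print the status line.
# An HTTP 200 response means the server is up and reachable on the
# mapped port; "connection refused" points at the port mapping or at a
# container that has already exited.
curl -sI http://localhost:18080/ | head -n 1
```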

If you still can't view the Spark UI, then check the logs. To get the stdout and stderr logs for the Docker container, run docker run with the -it parameter instead of the -itd parameter. Example:

$ docker run -it -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS \
-Dspark.history.fs.logDirectory=s3a://path_to_eventlog \
-Dspark.hadoop.fs.s3a.access.key=AWS_ACCESS_KEY_ID \
-Dspark.hadoop.fs.s3a.secret.key=AWS_SECRET_ACCESS_KEY \
-Dspark.hadoop.fs.s3a.session.token=SESSION_TOKEN \
-Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" \
-p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
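If the container was already started in detached mode with -itd, you can also read the same stdout and stderr output without restarting it. The filter below assumes the glue/sparkui:latest image tag from the examples above; CONTAINER_ID is a placeholder for the ID printed by the first command:

```shell
# Find the container started from the glue/sparkui image, then dump
# its logs.
docker ps -a --filter ancestor=glue/sparkui:latest --format '{{.ID}}'
docker logs CONTAINER_ID
```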