Posted On: Nov 6, 2019
You can now debug and monitor your Apache Spark jobs by opening the off-cluster, persistent Apache Spark History Server directly from the Amazon EMR console.
The Spark History Server is an extension of the Apache Spark Web User Interface (UI). It presents a visual interface with detailed information about completed and running Spark jobs on a cluster. You can drill into job-specific metrics and into information about scheduler stages, tasks, and running executors.
Amazon EMR now persists the Spark History Server along with the event and container logs outside the cluster and independently of the cluster’s life cycle. This allows you to access and use the Spark History Server for terminated and running clusters alike. In addition, you can access the Spark History Server directly from the console, and no longer need to run through complex steps to view it as a web interface.
The feature is available with EMR version 5.25 and later in the US East (N. Virginia and Ohio), US West (N. California and Oregon), Canada (Central), EU (Frankfurt, Ireland, and London), and Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo) Regions.
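
The availability conditions above can be sketched as a small check. This is only an illustration: the helper names are hypothetical, the Region codes are the standard AWS codes for the Regions named in this post, and the list reflects this announcement's date rather than current availability. It also assumes release labels of the form `emr-5.25.0`.

```python
import re
from typing import Tuple

# AWS Region codes for the Regions named in this announcement.
# Assumption: a snapshot as of the post date, not a current list.
SUPPORTED_REGIONS = {
    "us-east-1",       # US East (N. Virginia)
    "us-east-2",       # US East (Ohio)
    "us-west-1",       # US West (N. California)
    "us-west-2",       # US West (Oregon)
    "ca-central-1",    # Canada (Central)
    "eu-central-1",    # EU (Frankfurt)
    "eu-west-1",       # EU (Ireland)
    "eu-west-2",       # EU (London)
    "ap-south-1",      # Asia Pacific (Mumbai)
    "ap-northeast-2",  # Asia Pacific (Seoul)
    "ap-southeast-1",  # Asia Pacific (Singapore)
    "ap-southeast-2",  # Asia Pacific (Sydney)
    "ap-northeast-1",  # Asia Pacific (Tokyo)
}

# Minimum EMR version stated in the announcement.
MIN_RELEASE = (5, 25)


def parse_release_label(label: str) -> Tuple[int, int]:
    """Return (major, minor) from a release label such as 'emr-5.25.0'."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"Unrecognized EMR release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def persistent_history_server_available(release_label: str, region: str) -> bool:
    """Hypothetical helper: True if both stated conditions are met."""
    # Tuple comparison orders by major version first, then minor.
    meets_version = parse_release_label(release_label) >= MIN_RELEASE
    return meets_version and region in SUPPORTED_REGIONS
```

For example, `persistent_history_server_available("emr-5.24.1", "us-east-1")` is false because the release predates 5.25, even though the Region is listed.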