Posted On: Feb 23, 2022

AWS Glue now provides job run insights, a feature that reduces Apache Spark job development time by helping you determine the sources of errors and performance bottlenecks. AWS Glue is a data integration service that lets customers discover, prepare, and combine data for analytics using serverless Apache Spark and Python. Spark's distributed processing and "lazy execution" model make it hard and time-consuming for data engineers to diagnose errors and tune performance. With this launch, AWS Glue automatically analyzes and interprets errors in your Spark jobs to make that process faster.

Job run insights simplifies root cause analysis of job run failures and flattens the learning curve for both AWS Glue and Apache Spark. It identifies the line number in your code where a failure occurred and details what the AWS Glue engine was doing at the time of the error. It also interprets errors for you and recommends how to tune your jobs and code to fix issues and improve performance. This feature augments the Spark UI logs and the CloudWatch logs and metrics that AWS Glue already provided.
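As a minimal sketch of how you might turn the feature on programmatically, the example below creates a Spark job with job run insights enabled through the "--enable-job-insights" job parameter using the boto3 Glue API; the job name, IAM role, and script location are placeholders, not values from this announcement.

import boto3

glue = boto3.client("glue")

# Create a Spark job with job run insights enabled. The job name, IAM role,
# and script location are placeholders; replace them with your own.
glue.create_job(
    Name="example-etl-job",
    Role="arn:aws:iam::123456789012:role/ExampleGlueRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/etl.py",
        "PythonVersion": "3",
    },
    GlueVersion="3.0",
    DefaultArguments={
        # Turns on job run insights for every run of this job.
        "--enable-job-insights": "true",
    },
)

Once enabled, the insights for a run appear alongside that run in AWS Glue Studio's job monitoring dashboard.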

This feature is available in the same AWS Regions as AWS Glue.

To learn more, visit our documentation or view a job run in AWS Glue Studio’s job monitoring dashboard.