Posted On: Aug 15, 2023
Amazon EMR Serverless is a serverless option for Amazon EMR that makes it simple for data analysts and engineers to run open-source big data analytics frameworks such as Apache Spark and Apache Hive without configuring, managing, or scaling clusters or servers. Starting today, you can specify fine-grained log configurations for your Spark driver and executor logs, which makes it easier to troubleshoot your Apache Spark jobs.
Developers often need to analyze logs to monitor and debug their jobs effectively. However, Spark's default log settings can sometimes be too verbose, making it difficult to find relevant log entries. Spark uses Log4j2 to configure logging. With this feature, you can specify custom Log4j2 settings for your Spark driver and executor logs for each EMR Serverless job run. For example, you can set Spark's default log level to 'ERROR' to keep Spark's own output minimal, set your code's log level to 'INFO' to get detailed logs for your code, and set the log level for libraries that you want to debug to 'DEBUG' to get even more detailed logs for those. This lets you focus on the log entries that matter for the problem you are investigating.
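As a rough illustration of the example above, the following Python sketch builds the per-job configuration overrides for the three log levels described. It assumes the classification names `spark-driver-log4j2` and `spark-executor-log4j2` and the Log4j2 property keys `rootLogger.level`, `logger.<id>.name`, and `logger.<id>.level`; confirm these against the Configuring Log4j2 page before relying on them. The helper `log4j2_classification`, the logger identifiers `app` and `lib`, and the package name `com.example.myjob` are hypothetical and exist only for this sketch.

```python
import json


def log4j2_classification(component, root_level, logger_levels):
    """Build one Log4j2 configuration classification (hypothetical helper).

    component:     "driver" or "executor"
    root_level:    level for the root logger, i.e. Spark's default level
    logger_levels: {identifier: (logger_name, level)} for named loggers
    """
    if component not in ("driver", "executor"):
        raise ValueError("component must be 'driver' or 'executor'")
    properties = {"rootLogger.level": root_level}
    for ident, (name, level) in logger_levels.items():
        # Each named Log4j2 logger needs a name and a level property.
        properties[f"logger.{ident}.name"] = name
        properties[f"logger.{ident}.level"] = level
    return {
        # Assumed classification name; verify in the documentation.
        "classification": f"spark-{component}-log4j2",
        "properties": properties,
    }


# Placeholder logger names: replace with your own package and the
# library you want to debug.
loggers = {
    "app": ("com.example.myjob", "info"),
    "lib": ("org.apache.hadoop.fs.s3a", "debug"),
}

configuration_overrides = {
    "applicationConfiguration": [
        log4j2_classification("driver", "error", loggers),
        log4j2_classification("executor", "error", loggers),
    ]
}

print(json.dumps(configuration_overrides, indent=2))
```

One way to apply a dictionary like this is to pass it as the `configurationOverrides` argument of `start_job_run` on the boto3 `emr-serverless` client, alongside the application ID, execution role, and job driver for your job run.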
This feature is available for EMR release versions 6.8.0 and later in all AWS Regions where Amazon EMR Serverless is available. To learn more, visit the Configuring Log4j2 page.