How do I reduce the amount of logs generated by my AWS Glue job?

Last updated: 2021-07-29

My AWS Glue job is generating too many logs in Amazon CloudWatch. I want to reduce the number of logs generated.

Resolution

With AWS Glue Spark ETL jobs, you can't control the verbosity of the logs generated by the instances that run the jobs. The logs are intentionally verbose so that they can be used to monitor internal failures and to help diagnose job failures. However, you can define the Spark logging levels in the following ways:

Choose the standard filter setting for continuous logging

If you've turned on continuous logging for your job, then choose the Standard filter for the Log filtering option. This filter prunes non-useful Apache Spark driver and executor messages and Apache Hadoop YARN heartbeat log messages. To change the log filter setting for your AWS Glue job, do the following:

  1. Open the AWS Glue console.
  2. In the navigation pane, choose Jobs.
  3. Select the job that you want to update.
  4. Choose Actions, and then choose Edit job.
  5. Expand the Monitoring options section.
  6. Select Continuous logging.
  7. Under Log filtering, select Standard filter.
  8. Choose Save.

To turn on these settings from the AWS Command Line Interface (AWS CLI), pass the following key-value pairs as job arguments:

'--enable-continuous-cloudwatch-log': 'true'
'--enable-continuous-log-filter': 'true'

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

For more information, see Enabling continuous logging for AWS Glue jobs.
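As a sketch, you can merge these filter arguments into a job's existing default arguments before passing them to a create-job or update-job call. The helper below is hypothetical (not part of AWS Glue or the AWS SDK); it only prepares the argument dictionary:

```python
# Job arguments that enable continuous logging with the standard filter,
# as described above.
CONTINUOUS_LOGGING_ARGS = {
    "--enable-continuous-cloudwatch-log": "true",
    "--enable-continuous-log-filter": "true",
}

def with_standard_log_filter(default_arguments):
    """Return a copy of a job's DefaultArguments with the standard
    continuous-logging filter turned on (hypothetical helper)."""
    merged = dict(default_arguments)
    merged.update(CONTINUOUS_LOGGING_ARGS)
    return merged

# Example: arguments you might already pass to a Glue job
args = with_standard_log_filter({"--job-language": "python"})
```

You could then supply the merged dictionary as the job's DefaultArguments when creating or updating the job, for example through the AWS CLI or an AWS SDK.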

Important: Even with the standard filter setting, the application master logs for Spark jobs are still pushed to the /aws-glue/jobs/output and /aws-glue/jobs/error log groups.

Set the logging level using Spark context method setLogLevel

You can set the logging level for your job using the setLogLevel method of pyspark.context.SparkContext. Valid logging levels include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN. For more information, see Spark documentation for setLogLevel.

Use the following code to import the Spark context method and set the logging level for your job:

from pyspark.context import SparkContext
sc = SparkContext()
sc.setLogLevel("new-log-level")

Note: Be sure to replace new-log-level with the logging level that you want to set for your job.

For more information, see Spark documentation for Configuring logging.

Use a custom log4j.properties file to define the logging level

Spark uses log4j configurations for logging. You can include the logging preferences in a log4j.properties file, upload the file to Amazon Simple Storage Service (Amazon S3), and use the file in the AWS Glue job.
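For example, a minimal log4j.properties file that raises the default logging level to WARN might look like the following. This is a sketch modeled on Spark's log4j.properties.template; adjust the appenders and logger names to your needs:

```properties
# Log everything at WARN by default; send logs to the console appender
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Further quiet especially chatty loggers
log4j.logger.org.apache.spark=WARN
log4j.logger.org.spark_project.jetty=ERROR
```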

To reference the Amazon S3 file in the job, do the following:

  1. Open the AWS Glue console.
  2. In the navigation pane, choose Jobs.
  3. Select the job where you want to reference the file.
  4. Choose Actions, and then choose Edit job.
  5. Expand the Security configuration, script libraries, and job parameters (optional) section.
  6. For Referenced files path, paste the full Amazon S3 path where you stored the log4j.properties file.

For more information, see Providing your own custom scripts.

For a sample log4j.properties file, see log4j.properties.template in Apache Spark's GitHub repository.
