Why can't I see Amazon EMR cluster logs in an S3 bucket that has an attached policy enforcing SSE-KMS encryption?

Last updated: 2021-07-06

My Amazon EMR cluster logs aren't being archived to the Amazon Simple Storage Service (Amazon S3) bucket that I specified. The S3 bucket has an attached policy that enforces server-side encryption with AWS Key Management Service (SSE-KMS). Log writing fails with a 403 error similar to the following:

2020-01-15 04:01:25,247 INFO logspusher-6: Failed to upload 126 logs:
USE: /emr/instance-state/instance-state.log-2020-01-14-20-15.gz reason: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 8B99FE94D1678AAB)

Short description

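When logging is enabled for an Amazon EMR cluster, the LogPusher service archives cluster logs to the specified S3 bucket. LogPusher uses AES-256 encryption (SSE-S3), rather than SSE-KMS, to write logs. A bucket policy that requires SSE-KMS therefore rejects those uploads with a 403 Access Denied error. To write logs to an S3 bucket that has an SSE-KMS encryption policy, use the aws s3 sync command in the AWS Command Line Interface (AWS CLI) to manually upload the files.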

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

Resolution

Note: With Amazon EMR version 5.30.0 and later (except Amazon EMR 6.0.0), you can encrypt log files stored in Amazon S3 with an AWS KMS customer managed key.

1.    Connect to the master node using SSH.

2.    Find the log files that you want to copy. For example, step logs are stored at /mnt/var/log/hadoop/steps on the master node.

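3.    To copy the log files to the S3 bucket, run the aws s3 sync command with the --sse aws:kms and --sse-kms-key-id options. In the following example, replace the bucket name, the cluster ID, and the AWS KMS key ID with your own values: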

aws s3 sync /mnt/var/log/hadoop/steps/ s3://awsexamplebucket/elasticmapreduce/${cluster_id}/steps/ --sse aws:kms --sse-kms-key-id 17246c74-6ff4-4adb-86e5-76f7f1603f00
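If you sync more than one log directory, it can help to wrap the command in a small shell function. The following is a minimal sketch, not an official AWS script. The function name sync_logs_kms and the DRY_RUN variable are hypothetical names chosen for illustration. Only the aws s3 sync command and its --sse and --sse-kms-key-id options come from the preceding step.

```shell
#!/bin/sh
# Sketch: upload one local log directory to S3 with SSE-KMS.
# sync_logs_kms and DRY_RUN are illustrative names, not part of the AWS CLI.

sync_logs_kms() {
  src_dir="$1"      # local log directory, for example /mnt/var/log/hadoop/steps/
  dest_uri="$2"     # destination prefix, for example s3://awsexamplebucket/elasticmapreduce/<cluster-id>/steps/
  kms_key_id="$3"   # ID of the AWS KMS key that the bucket policy requires

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it, so that you can review it first.
    printf 'aws s3 sync %s %s --sse aws:kms --sse-kms-key-id %s\n' \
      "$src_dir" "$dest_uri" "$kms_key_id"
  else
    aws s3 sync "$src_dir" "$dest_uri" --sse aws:kms --sse-kms-key-id "$kms_key_id"
  fi
}
```

To preview the command without uploading anything, set DRY_RUN=1 before the call, for example: DRY_RUN=1 sync_logs_kms /mnt/var/log/hadoop/steps/ s3://awsexamplebucket/elasticmapreduce/${cluster_id}/steps/ 17246c74-6ff4-4adb-86e5-76f7f1603f00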

You can use a cron job to automate the sync command. To configure the cron job, run a custom bootstrap action on all nodes when you launch an Amazon EMR cluster.
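The following sketch shows one way that such a bootstrap action might build and install the cron entry. It is an illustration under stated assumptions rather than a tested AWS-provided script. The function names cron_entry and install_cron_entry, the 15-minute schedule, and the way that the cluster ID reaches the script are all hypothetical. The sketch also assumes that the aws command is on the PATH that cron uses. If it isn't, use the absolute path to the aws binary.

```shell
#!/bin/sh
# Sketch of a bootstrap action that schedules a periodic SSE-KMS log sync.
# cron_entry and install_cron_entry are illustrative helper names.

# Print a crontab line that runs aws s3 sync with SSE-KMS on the given schedule.
cron_entry() {
  schedule="$1"     # cron schedule, for example "*/15 * * * *"
  src_dir="$2"      # local log directory on the node
  dest_uri="$3"     # destination S3 prefix
  kms_key_id="$4"   # ID of the AWS KMS key that the bucket policy requires
  printf '%s aws s3 sync %s %s --sse aws:kms --sse-kms-key-id %s\n' \
    "$schedule" "$src_dir" "$dest_uri" "$kms_key_id"
}

# Append the entry to the current user's crontab and keep any existing entries.
# Running this more than once adds duplicate lines.
install_cron_entry() {
  { crontab -l 2>/dev/null; cron_entry "$@"; } | crontab -
}
```

At the end of the bootstrap script, a call such as install_cron_entry "*/15 * * * *" /mnt/var/log/hadoop/steps/ s3://awsexamplebucket/elasticmapreduce/${cluster_id}/steps/ 17246c74-6ff4-4adb-86e5-76f7f1603f00 would schedule the sync every 15 minutes. The step log path applies to the master node, so choose the source directory that matches the logs present on each node.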