Why can't I see Amazon EMR cluster logs in an S3 bucket that has an attached policy enforcing SSE-KMS encryption?

Last updated: 2020-01-20

My Amazon EMR cluster logs aren't being archived to the Amazon Simple Storage Service (Amazon S3) bucket that I specified. The S3 bucket has an attached policy that enforces server-side encryption with AWS Key Management Service (SSE-KMS). Log writing fails with a 403 error like this:

2020-01-15 04:01:25,247 INFO logspusher-6: Failed to upload 126 logs:
USE: /emr/instance-state/instance-state.log-2020-01-14-20-15.gz reason: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 8B99FE94D1678AAB)

Short Description

When logging is enabled for an Amazon EMR cluster, the LogPusher service archives cluster logs to the specified S3 bucket. LogPusher writes logs with AES-256 server-side encryption (SSE-S3) rather than SSE-KMS, so a bucket policy that requires SSE-KMS rejects the uploads with an Access Denied (403) error. To write logs to an S3 bucket that has an SSE-KMS encryption policy, use the AWS CLI aws s3 sync command to upload the files manually with SSE-KMS.

Resolution

1.    Connect to the master node using SSH.

2.    Find the log files that you want to copy. For example, step logs are stored at /mnt/var/log/hadoop/steps on the master node.

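3.    To copy the log files to the S3 bucket, run the aws s3 sync command with the --sse aws:kms and --sse-kms-key-id options. In the following example, replace the bucket name, the cluster ID (${cluster_id}), and the KMS key ID with your own values: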

aws s3 sync /mnt/var/log/hadoop/steps/ s3://awsexamplebucket/elasticmapreduce/${cluster_id}/steps/ --sse aws:kms --sse-kms-key-id 17246c74-6ff4-4adb-86e5-76f7f1603f00
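
Before copying a large log directory, it can help to preview the upload and then confirm that an uploaded object is actually stored with SSE-KMS. The following is a minimal sketch, not an official procedure. The function names preview_sync and object_encryption are hypothetical, and every bucket, prefix, and key ID you pass in is a placeholder. It relies only on the --dryrun option of aws s3 sync and on aws s3api head-object.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the function names are illustrative, and the
# bucket, prefix, and key ID arguments are placeholders you supply.

# Show what "aws s3 sync" would upload without transferring anything.
preview_sync() {
  local src="$1" dest="$2" key_id="$3"
  aws s3 sync "$src" "$dest" --sse aws:kms --sse-kms-key-id "$key_id" --dryrun
}

# Print the server-side encryption type recorded on one uploaded object.
# For an object stored with SSE-KMS, the expected value is "aws:kms".
object_encryption() {
  local bucket="$1" key="$2"
  aws s3api head-object --bucket "$bucket" --key "$key" \
    --query ServerSideEncryption --output text
}
```

For example, run preview_sync with the same source, destination, and key ID as the sync command above, then run object_encryption against one of the uploaded keys after the real sync completes.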

To automate the upload, schedule the sync command as a cron job. To set up the cron job on every node, run a custom bootstrap action when you launch the Amazon EMR cluster.
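
A bootstrap action for the cron job might look like the following sketch. It is an illustration under several assumptions that do not come from this article: the 15-minute schedule, the manual-sync prefix, the marker comment, and the function names are hypothetical. It also assumes that the cluster ID can be read from the jobFlowId field in /mnt/var/lib/info/job-flow.json on the node. The bucket name and KMS key ID are the placeholders from the example above. Adjust all of these for your environment and test on a non-production cluster first.

```shell
#!/usr/bin/env bash
# Sketch of a bootstrap action that schedules a periodic SSE-KMS log sync.
# Assumptions (illustrative, not from the article): schedule, destination
# prefix, marker text, and the location and format of job-flow.json.
set -euo pipefail

BUCKET="awsexamplebucket"                              # placeholder bucket
KMS_KEY_ID="17246c74-6ff4-4adb-86e5-76f7f1603f00"      # placeholder key ID
LOG_DIR="/mnt/var/log"
JOB_FLOW_FILE="/mnt/var/lib/info/job-flow.json"        # assumed location
MARKER="emr-log-sync"                                  # tags our cron entry

# Extract the value of "jobFlowId" (the cluster ID) from a JSON file.
read_cluster_id() {
  sed -n 's/.*"jobFlowId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Build one crontab line that syncs this node's logs every 15 minutes.
cron_line() {
  local cluster_id="$1" host="$2"
  printf '*/15 * * * * aws s3 sync %s/ s3://%s/elasticmapreduce/%s/manual-sync/%s/ --sse aws:kms --sse-kms-key-id %s # %s\n' \
    "$LOG_DIR" "$BUCKET" "$cluster_id" "$host" "$KMS_KEY_ID" "$MARKER"
}

# Replace any earlier entry carrying the marker, then install the new one.
install_cron() {
  local cluster_id="$1"
  {
    crontab -l 2>/dev/null | grep -vF "$MARKER" || true
    cron_line "$cluster_id" "$(hostname)"
  } | crontab -
}

# Only install when the node metadata file is present.
if [[ -f "$JOB_FLOW_FILE" ]]; then
  install_cron "$(read_cluster_id "$JOB_FLOW_FILE")"
else
  echo "job-flow.json not found; skipping cron setup" >&2
fi
```

Writing each node's logs under its own hostname keeps nodes from overwriting one another's files. The instance profile role on the cluster also needs permission to write to the bucket and to use the KMS key.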