How to retroactively encrypt existing objects in Amazon S3 using S3 Inventory, Amazon Athena, and S3 Batch Operations
November 1, 2021: AWS KMS is replacing the term customer master key (CMK) with AWS KMS key and KMS key. The concept has not changed. To prevent breaking changes, AWS KMS is keeping some variations of this term. More info.
Amazon Simple Storage Service (S3) is an object storage service that offers industry-leading scalability, performance, security, and data availability. With Amazon S3, you can choose from three different server-side encryption configurations when uploading objects:
- SSE-S3 – uses Amazon S3-managed encryption keys
- SSE-KMS – uses AWS KMS keys (KMS keys) stored in AWS Key Management Service (KMS)
- SSE-C – uses root keys provided by the customer in each PUT or GET request
These options allow you to choose the right encryption method for the job. But as your organization evolves and new requirements arise, you might find that you need to change the encryption configuration for all objects. For example, you might be required to use SSE-KMS instead of SSE-S3 because you need more control over the lifecycle and permissions of the encryption keys in order to meet compliance goals.
You could change the settings on your buckets to use SSE-KMS rather than SSE-S3, but the switch only impacts newly uploaded objects, not objects that existed in the buckets before the change in encryption settings. Manually re-encrypting older objects under KMS keys in KMS may be time-prohibitive depending on how many objects there are. Automating this effort is possible using the right combination of features in AWS services.
In this post, I’ll show you how to use Amazon S3 Inventory, Amazon Athena, and Amazon S3 Batch Operations to provide insights on the encryption status of objects in S3 and to remediate incorrectly encrypted objects in a massively scalable, resilient, and cost-effective way. The solution uses a similar approach to the one mentioned in this blog post, but it has been designed with automation and multi-bucket scalability in mind. Tags are used to target individual noncompliant buckets in an account, and any encrypted (or unencrypted) object can be re-encrypted using SSE-S3 or SSE-KMS. Versioned buckets are also supported, and the solution operates on a regional level.
Note: You can’t re-encrypt to or from objects encrypted under SSE-C. This is because the root key material must be provided during the PUT or GET request, and cannot be provided as a parameter for S3 Batch Operations.
Moreover, the entire solution can be deployed in under 5 minutes using AWS CloudFormation. Simply tag your buckets targeted for encryption, upload the solution artifacts into S3, and deploy the artifact template through the CloudFormation console. In the following sections, you will see that the architecture has been built to be easy to use and operate, while at the same time containing a large number of customizable features for more advanced users.
At a high level, the core features of the architecture consist of 3 services interacting with one another: S3 Inventory reports (1) are delivered for targeted buckets, the report delivery events trigger an AWS Lambda function (2), and the Lambda function then executes S3 Batch (3) jobs using the reports as input to encrypt targeted buckets. Figure 1 below and the remainder of this section provide a more detailed look at what is happening underneath the surface. If this is not of high interest for you, feel free to skip ahead to the Prerequisites and Solution Deployment sections.
Here’s a detailed overview of how the solution works, as shown in Figure 1 above:
- When the CloudFormation template is first launched, a number of resources are created, including:
- An S3 bucket to store the S3 Inventory reports
- An S3 bucket to store S3 Batch Job completion reports
- A CloudWatch event that is triggered by changes to tags on S3 buckets
- An AWS Glue Database and AWS Glue Tables that can be used by Athena to query S3 Inventory and S3 Batch report findings
- A Lambda function that is used as a Custom Resource during template launch, and afterwards as a target for S3 event notifications and CloudWatch events
- During deployment of the CloudFormation template, a Lambda-backed Custom Resource lists all S3 buckets within the AWS Region specified and checks to see if any has a configurable tag present (configured via an AWS CloudFormation parameter). When a bucket with the specified tag is discovered, the Lambda configures an S3 Inventory report for the discovered bucket to be delivered to the newly-created central report destination bucket.
- When a new S3 Inventory report arrives into the central report destination bucket (which can take between 1-2 days) from any of the tagged buckets, an S3 Event Notification triggers the Lambda to process it.
- The Lambda function first adds the path of the report CSV file as a partition to the AWS Glue table. This means that as each bucket delivers its report, it becomes instantly queryable by Athena, and any queries executed return the most recent information available on the status of the S3 buckets in the account.
- The Lambda function then checks the value of the EncryptBuckets parameter in the CloudFormation launch template to assess whether any re-encryption action should be taken. If it is set to yes, the Lambda function creates an S3 Batch job and executes it. The job takes each object listed in the manifest report and copies it over in the exact same location. When the copy occurs, SSE-KMS or SSE-S3 encryption is specified in the job parameters, effectively re-encrypting properly all identified objects.
- Once the batch job finishes for the S3 Inventory report, a completion report is sent to the central batch job report bucket. The CloudFormation template provides a parameter that controls the option to include either all successfully processed objects or only objects that were unsuccessfully processed. These reports can also be queried with Athena, since the reports are also added as partitions to the AWS Glue batch reports tables as they arrive.
To follow along with the sample deployment, your AWS Identity and Access Management (IAM) principal (user or role) needs administrator access or equivalent.
For this walkthrough, the solution will be configured to encrypt objects using SSE-KMS, rather than SSE-S3, when an inventory report is delivered for a bucket. Please note that the key policy of the KMS key will be automatically updated by the custom resource during launch to allow S3 to use it to encrypt inventory reports. No key policies are changed if SSE-S3 encryption is selected instead. The configuration in this walkthrough also adds a tag to all newly encrypted objects. You’ll learn how to use this tag to restrict access to unencrypted objects in versioned buckets. I’ll make callouts throughout the deployment guide for when you can choose a different configuration from what is deployed in this post.
To deploy the solution architecture and validate its functionality, you’ll perform five steps:
- Tag target buckets for encryption
- Deploy the CloudFormation template
- Validate delivery of S3 Inventory reports
- Confirm that reports are queryable with Athena
- Validate that objects are correctly encrypted
If you are only interested in deploying the solution and encrypting your existing environment, Steps 1 and 2 are all that are required to be completed. Steps 3 through 5 are optional on the other hand, and outline procedures that you would perform to validate the solution’s functionality. They are primarily for users who are looking to dive deep and take advantage of all of the features available.
With that being said, let’s get started with deploying the architecture!
Step 1: Tag target buckets
Navigate to the Amazon S3 console and identify which buckets should be targeted for inventorying and encryption. For each identified bucket, tag it with a designated key value pair by selecting Properties > Tags > Add tag. This demo uses the tag __Inventory: true and tags only one bucket called adams-lambda-functions, as shown in Figure 2.
Step 2: Deploy the CloudFormation template
- Download the S3 encryption solution. There will be two files that make up the backbone of the solution:
- encrypt.py, which contains the Lambda microservices logic;
- deploy.yml, which is the CloudFormation template that deploys the solution.
- Zip the file encrypt.py, rename it to encrypt.zip, and then upload it into any S3 bucket that is in the same Region as the one in which the CloudFormation template will be deployed. Your bucket should look like Figure 3:
- Navigate to the CloudFormation console and then create the CloudFormation stack using the deploy.yml template. For more information, see Getting Started with AWS CloudFormation in the CloudFormation User Guide. Figure 4 shows the parameters used to achieve the configuration specified for this walkthrough, with the fields outlined in red requiring input. You can choose your own configuration by altering the appropriate parameters if the ones specified do not fit your use case.
Step 3: Validate delivery of S3 Inventory reports
After you’ve successfully deployed the CloudFormation template, select any of your tagged S3 buckets and check that it now has an S3 Inventory report configuration. To do this, navigate to the S3 console, select a tagged bucket, select the Management tab, and then select Inventory, as shown in Figure 5. You should see that an inventory configuration exists. An inventory report will be delivered automatically to this bucket within 1 to 2 days, depending on the number of objects in the bucket. Make a note of the name of the bucket where the inventory report will be delivered. The bucket is given a semi-random name during creation through the CloudFormation template, so making a note of this will help you find the bucket more easily when you check for report delivery later.
Step 4: Confirm that reports are queryable with Athena
- After 1 to 2 days, navigate to the inventory reports destination bucket and confirm that reports have been delivered for buckets with the __Inventory: true tag. As shown in Figure 6, a report has been delivered for the adams-lambda-functions bucket.
- Next, navigate to the Athena console and select the AWS Glue database that contains the table holding the schema and partition locations for all of your reports. If you used the default values for the parameters when you launched the CloudFormation stack, the AWS Glue database will be named s3_inventory_database, and the table will be named s3_inventory_table. Run the following query in Athena:
The outputs of the query will be a snapshot aggregate count of objects in the categories of SSE-S3, SSE-C, SSE-KMS, or NOT-SSE across your tagged bucket environment, before encryption took place, as shown in Figure 7.
From the query results, you can see that the adams-lambda-functions bucket had only two items in it, both of which were unencrypted. At this point, you can choose to perform any other analytics with Athena on the delivered inventory reports.
Step 5: Validate that objects are correctly encrypted
- Navigate to any of your target buckets in Amazon S3 and check the encryption status of a few sample objects by selecting the Properties tab of each object. The objects should now be encrypted using the specified KMS key. Because you set the AddTagToEncryptedObjects parameter to yes during the CloudFormation stack launch, these objects should also have the __ObjectEncrypted: true tag present. As an example, Figure 8 shows the rules_present_rule.zip object from the adams-lambda-functions bucket. This object has been properly encrypted using the correct KMS key, which has an alias of blog in this example, and it has been tagged with the specified key value pair.
- For further validation, navigate back to the Athena console and select the s3_batch_table from the s3_inventory_database, assuming that you left the default names unchanged. Then, run the following query:
If encryption was successful, this query should result in zero items being returned because the solution by default only delivers S3 batch job completion reports on items that failed to copy. After validating by inspecting both the objects themselves and the batch completion reports, you can now safely say that the contents of the targeted S3 buckets are correctly encrypted.
Congratulations! You’ve successfully deployed and operated a solution for rectifying S3 buckets with incorrectly encrypted and unencrypted objects. The architecture is massively scalable because it uses S3 Batch Operations and Lambda, it’s fully serverless, and it’s cost effective to run.
Please note that if you selected no for the EncryptBuckets parameter during the initial launch of the CloudFormation template, you can retroactively perform encryption on targeted buckets by simply doing a stack update. During the stack update, switch the EncryptBuckets parameter to yes, and proceed with deployment as normal. The update will reconfigure S3 inventory reports for all target S3 buckets to get the most up-to-date inventory. After the reports are delivered, encryption will proceed as desired.
Moreover, with the solution deployed, you can target new buckets for encryption just by adding the __Inventory: true tag. CloudWatch Events will register the tagging action and automatically configure an S3 Inventory report to be delivered for the newly tagged bucket.
Finally, now that your S3 buckets are properly encrypted, you should take a few more manual steps to help maintain your newfound account hygiene:
- Perform remediation on unencrypted objects that may have failed to copy during the S3 Batch Operations job. The most common reason that objects fail to copy is when object size exceeds 5 GiB. S3 Batch Operations uses the standard CopyObject API call underneath the surface, but this API call can only handle objects less than 5 GiB in size. To successfully copy these objects, you can modify the solution you learned in this post to launch an S3 Batch Operations job that invokes Lambda functions. In the Lambda function logic, you can make CreateMultipartUpload API calls on objects that failed with a standard copy. The original batch job completion reports provide detail on exactly which objects failed to encrypt due to size.
- Prohibit the retrieval of unencrypted object versions for buckets that had versioning enabled. When the object is copied over itself during the encryption process, the old unencrypted version of the object still exists. This is where the option in the solution to specify a tag on all newly encrypted objects becomes useful—you can now use that tag to draft a bucket policy that prohibits the retrieval of old unencrypted objects in your versioned buckets. For the solution that you deployed in this post, such a policy would look like this:
- Update bucket policies to prevent the upload of unencrypted or incorrectly encrypted objects. By updating bucket policies, you help ensure that in the future, newly uploaded objects will be correctly encrypted, which will help maintain account hygiene. The S3 encryption solution presented here is meant to be a onetime-use remediation tool, while you should view updating bucket policies as a preventative action. Proper use of bucket policies will help ensure that the S3 encryption solution is not needed again, unless another encryption requirement change occurs in the future. To learn more, see How to Prevent Uploads of Unencrypted Objects to Amazon S3.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon S3 forum.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.