AWS Storage Blog
Maintaining object immutability by automatically extending Amazon S3 Object Lock retention periods
Protecting against accidental or malicious deletion is a key element of data protection. Immutability protects data in-place, preventing unintended changes or deletions. However, sometimes it isn’t clear for how long data should be made immutable. Users in this situation are looking for a solution that maintains short-term immutability, indefinitely. They want to make sure their data is always protected, without any long-term commitment.
Amazon S3 Object Lock is the industry standard for object storage immutability for ransomware protection, and enterprises around the world use it to keep their data safe. When an object in Amazon S3 is locked in compliance mode, users can’t change the retention mode or shorten the retention period. Compliance mode blocks permanent object deletion during a customer-defined retention period. A user might lock certain objects in compliance mode for one year, and then find that these objects are still valuable as they approach their Retain until date. In that case, users often want to extend the S3 Object Lock retention of the current versions of objects. At the same time, they want the retention of noncurrent versions to expire, so that those versions can be permanently deleted.
In this post, we present an AWS CloudFormation-based solution that automatically extends S3 Object Lock retention periods, at scale. The solution continually extends the retention periods of all current versions of objects stored in an Amazon S3 bucket. As a result, an object version cannot be permanently deleted until it becomes noncurrent and a defined minimum retention period has then passed.
Solution overview
The solution runs weekly and makes sure that all current versions of objects in the protected bucket (or specified prefix) are immutable for at least the defined ‘minimum retention period.’
To implement this solution, you need an AWS account and an Amazon S3 general purpose bucket with S3 Object Lock enabled (known as the ‘protected bucket’). S3 Object Lock requires S3 Versioning to be enabled on your bucket. When an S3 bucket has S3 Versioning enabled, simple delete requests (which place a delete marker) or writes to the same key (object name) cause the original object to become a noncurrent version. You can use S3 Lifecycle to set specific rules around your version count or noncurrent version life span.
Deploying this solution applies compliance mode to all current versions of objects in scope, even if they currently have no lock, or have governance mode applied. With compliance mode, S3 Object Lock blocks permanent object deletion during the defined retention period. Consider the implications of this, and make sure the solution meets your needs prior to implementing in a production environment.
How it works
- Delivery of the weekly Amazon S3 Inventory report starts an Amazon Athena query (through an S3 Event Notification and an AWS Lambda function).
- Athena queries the inventory for object versions whose retention would otherwise fall below the minimum retention period before the next weekly run. These are versions whose Retain until date is earlier than today + minimum retention period + 7 days.
- The Athena query outputs a CSV manifest of all object versions requiring S3 Object Lock extension. This triggers an Amazon S3 Batch Operations job (again through an S3 Event Notification and Lambda).
- S3 Batch Operations performs a PutObjectRetention operation on every object version in the manifest. It sets each retention date to today + minimum retention period + extension frequency.
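The selection rule and the new Retain until date described above can be sketched in a few lines. This is an illustrative reimplementation, not the solution's Athena query. The function names are hypothetical, and both the strict "earlier than" comparison at the threshold and the treatment of versions with no retention (always selected, consistent with the note that unlocked objects are also put into compliance mode) are assumptions.

```python
from datetime import date, timedelta
from typing import Optional

# The solution runs weekly, so a version must stay above the minimum
# retention period until the *next* run: hence the extra 7 days.
RUN_INTERVAL_DAYS = 7


def needs_extension(retain_until: Optional[date], today: date, min_retention_days: int) -> bool:
    """Return True if this current version should go into the manifest (hypothetical helper)."""
    if retain_until is None:
        # Assumption: versions with no retention set are always selected.
        return True
    threshold = today + timedelta(days=min_retention_days + RUN_INTERVAL_DAYS)
    return retain_until < threshold


def new_retain_until(today: date, min_retention_days: int, extension_frequency_days: int) -> date:
    """Retain until date set by the PutObjectRetention operation."""
    return today + timedelta(days=min_retention_days + extension_frequency_days)


today = date(2024, 1, 1)
print(needs_extension(today + timedelta(days=91), today, 90))   # True
print(needs_extension(today + timedelta(days=100), today, 90))  # False
print(new_retain_until(today, 90, 28))                           # 2024-04-28
```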
Solution deployment
Download this CloudFormation template. Create a CloudFormation stack using this template in the same AWS Region as your S3 bucket. The stack parameters, shown in the following figure, include:
- Your bucket name: (“my-protected-bucket”)
- Prefix (optional): (“important-objects/”)
- Minimum Object Lock retention period: (90)
- Extension frequency: (28)
During deployment, the stack uses an AWS Lambda function to check that the protected bucket has S3 Object Lock enabled. If it does not, the deployment fails. The stack then creates a uniquely named S3 bucket and an Amazon Athena workgroup. It also creates all of the required S3 Event Notifications, Lambda functions, and AWS Identity and Access Management (IAM) roles. View the Resources tab of the deployed CloudFormation stack to see all of the deployed resources.
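A pre-deployment check along the same lines can be sketched as follows. This is a hypothetical helper, not the solution's Lambda code. It takes a dict shaped like the response of boto3's get_object_lock_configuration. It also applies this post's parameter guidance, assumed here as hard rules: a minimum retention period of at least one day, as in deployment example 2, and an extension frequency that is a multiple of seven. When a bucket has no Object Lock configuration at all, boto3 raises a ClientError instead of returning a response, so a caller would need to treat that error as "not enabled".

```python
def deployment_problems(lock_config_response: dict, min_retention_days: int,
                        extension_frequency_days: int) -> list:
    """Return human-readable problems; an empty list means the inputs look OK (hypothetical helper)."""
    problems = []
    # Expected shape: {"ObjectLockConfiguration": {"ObjectLockEnabled": "Enabled", ...}}
    config = lock_config_response.get("ObjectLockConfiguration", {})
    if config.get("ObjectLockEnabled") != "Enabled":
        problems.append("S3 Object Lock is not enabled on the protected bucket")
    if min_retention_days < 1:
        problems.append("minimum retention period must be at least 1 day")
    # The solution runs weekly, so a multiple of 7 lines up with its schedule.
    if extension_frequency_days < 7 or extension_frequency_days % 7 != 0:
        problems.append("extension frequency should be a positive multiple of 7 days")
    return problems


ok_response = {"ObjectLockConfiguration": {"ObjectLockEnabled": "Enabled"}}
print(deployment_problems(ok_response, 90, 28))  # []
```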
The stack deploys in around five minutes. It starts operating automatically as soon as the first S3 Inventory report is delivered, which might take up to 48 hours. The solution does not reuse any existing S3 Inventory configurations, because it needs a specific output format and location. Its inventory configuration is added alongside any other S3 Inventory configurations on the bucket and does not interfere with them.
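For illustration, a weekly inventory configuration of the kind the solution relies on might look like the one below. The exact format and fields the solution uses are not stated here. The CSV format, the destination prefix, the configuration ID, and the choice to include all object versions (so that version IDs appear in the report) are all assumptions. The dict shape follows the InventoryConfiguration parameter of boto3's put_bucket_inventory_configuration.

```python
def inventory_configuration(config_id: str, destination_bucket: str, prefix: str = "") -> dict:
    """Build a hypothetical weekly S3 Inventory configuration."""
    cfg = {
        "Id": config_id,
        "IsEnabled": True,
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{destination_bucket}",
                "Format": "CSV",            # assumption; ORC and Parquet also exist
                "Prefix": "inventory",      # hypothetical destination prefix
            }
        },
        # Assumption: include all versions so version IDs are reported;
        # a query would then keep only the current (latest) versions.
        "IncludedObjectVersions": "All",
        "OptionalFields": ["ObjectLockRetainUntilDate", "ObjectLockMode"],
        "Schedule": {"Frequency": "Weekly"},
    }
    if prefix:
        cfg["Filter"] = {"Prefix": prefix}
    return cfg


cfg = inventory_configuration("auto-extend-object-lock", "my-solution-bucket", "important-objects/")
# With boto3 (not executed here):
# s3.put_bucket_inventory_configuration(Bucket="my-protected-bucket",
#                                       Id=cfg["Id"], InventoryConfiguration=cfg)
```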
You may have different data sets, with different compliance requirements, in prefixes within the same bucket. In this case, simply deploy the solution multiple times – once for each prefix.
As it runs weekly, this solution does not apply immediately to newly-created objects. If this is a requirement, then configure the bucket’s default S3 Object Lock retention settings to compliance mode with a duration of at least a week. This solution extends the Retain until date on each object in scope, making sure the objects are always protected. Be aware that alternative S3 Object Lock retention settings may be applied to new objects as they are written, unless denied by a bucket policy as in this example.
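A default retention rule of this kind can be expressed as the ObjectLockConfiguration payload accepted by boto3's put_object_lock_configuration. The sketch below is a minimal, hypothetical helper. It uses a seven-day default, the lower bound suggested above, and rejects anything shorter.

```python
import json


def default_retention_configuration(days: int = 7) -> dict:
    """Compliance-mode default retention for new objects (hypothetical helper)."""
    if days < 7:
        raise ValueError("use a default retention of at least a week")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": days}},
    }


print(json.dumps(default_retention_configuration(), indent=2))
# With boto3 (not executed here):
# s3.put_object_lock_configuration(Bucket="my-protected-bucket",
#                                  ObjectLockConfiguration=default_retention_configuration())
```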
Enforcing a data retention period
To automatically and permanently delete noncurrent versions of objects after the required retention period, configure a NoncurrentVersionExpiration S3 Lifecycle rule on the protected bucket. The rule acts on an object version only once the specified number of days has passed since that version became noncurrent. A version becomes noncurrent when a new version of the object is written or a delete marker is placed.
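As a minimal sketch, the rule can be written as the LifecycleConfiguration payload for boto3's put_bucket_lifecycle_configuration. The rule ID and the example values are hypothetical. That API replaces the bucket's entire lifecycle configuration, so any existing rules would need to be included in the same call.

```python
def noncurrent_expiration_rule(noncurrent_days: int, prefix: str = "",
                               rule_id: str = "expire-noncurrent-versions") -> dict:
    """Lifecycle configuration that expires versions N days after they become noncurrent."""
    return {
        "Rules": [
            {
                "ID": rule_id,                   # hypothetical rule name
                "Filter": {"Prefix": prefix},   # empty prefix = whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


lifecycle = noncurrent_expiration_rule(365, "important-objects/")
# With boto3 (not executed here); this call overwrites existing lifecycle rules:
# s3.put_bucket_lifecycle_configuration(Bucket="my-protected-bucket",
#                                       LifecycleConfiguration=lifecycle)
```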
S3 Lifecycle expiration rules can’t delete locked object versions until their retention date has passed. With this rule configured, a noncurrent version is permanently deleted only after two conditions are met: the specified number of days has passed, and its retention period has expired. This makes sure that data is retained for at least a defined period after it is deleted or overwritten.
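The interaction between the two conditions reduces to taking the later of two dates. The sketch below computes the earliest date on which a noncurrent version becomes eligible for permanent deletion. The helper name and the example dates are hypothetical. S3 Lifecycle actions run asynchronously, so actual deletion can happen some time after this date.

```python
from datetime import date, timedelta


def earliest_permanent_deletion(became_noncurrent: date, noncurrent_days: int,
                                retain_until: date) -> date:
    """Earliest date a noncurrent version can be removed by S3 Lifecycle (hypothetical helper)."""
    eligible_for_expiration = became_noncurrent + timedelta(days=noncurrent_days)
    # Object Lock blocks deletion until the Retain until date has passed,
    # so whichever date is later applies.
    return max(eligible_for_expiration, retain_until)


# Overwritten on 1 Jan 2024 with 100 days of lock remaining and a 30-day rule:
print(earliest_permanent_deletion(date(2024, 1, 1), 30, date(2024, 1, 1) + timedelta(days=100)))
# 2024-04-10
```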
Monitoring the solution
You can find the status of the S3 Batch Operations Jobs triggered by the solution in the S3 Batch Operations console.
You can filter on the job description, which includes the protected bucket name. In addition, the jobs are tagged so that you can easily identify them:
- Key: job-created-by
- Value: Auto Extend Object Lock Solution
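The same filtering can be done programmatically. The sketch below works on plain dicts shaped like the "Jobs" list returned by boto3's s3control list_jobs and the "Tags" list returned by get_job_tagging. The helper names and sample data are hypothetical. The tag key and value are the ones listed above.

```python
SOLUTION_TAG = {"Key": "job-created-by", "Value": "Auto Extend Object Lock Solution"}


def jobs_for_bucket(jobs: list, protected_bucket: str) -> list:
    """Keep jobs whose description mentions the protected bucket name."""
    return [job for job in jobs if protected_bucket in job.get("Description", "")]


def is_solution_job(tags: list) -> bool:
    """True if the job carries the solution's identifying tag."""
    return SOLUTION_TAG in tags


# With boto3 (not executed here):
# jobs = s3control.list_jobs(AccountId=account_id)["Jobs"]
# tags = s3control.get_job_tagging(AccountId=account_id, JobId=job_id)["Tags"]
sample_jobs = [
    {"JobId": "1", "Description": "Extend Object Lock for my-protected-bucket"},
    {"JobId": "2", "Description": "Some other job"},
]
print([job["JobId"] for job in jobs_for_bucket(sample_jobs, "my-protected-bucket")])  # ['1']
```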
Note that each S3 Batch Operations job contains one failed task. This is because the CSV manifest generated by Athena contains a header row, and S3 Batch Operations does not support header rows in manifests. You can ignore this failed task. S3 Inventory reports, Athena outputs, and S3 Batch Operations reports are stored in the solution bucket. These are your records of compliance with your defined minimum retention period. To minimize storage charges, you may wish to use lifecycle rules to expire these objects after a specified time. You can also transition larger objects to one of the S3 Glacier storage classes, if you retain them for at least 90 days.
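Because exactly one failure per job is expected, a monitoring check can subtract it before alerting. The sketch below reads a dict shaped like the ProgressSummary in the response of boto3's s3control describe_job. The helper names are hypothetical, and the check is only meaningful once the job has finished.

```python
EXPECTED_HEADER_ROW_FAILURES = 1  # the manifest's header row always fails


def unexpected_failures(progress_summary: dict) -> int:
    """Failed tasks beyond the single expected header-row failure."""
    failed = progress_summary.get("NumberOfTasksFailed", 0)
    return max(0, failed - EXPECTED_HEADER_ROW_FAILURES)


def job_looks_healthy(progress_summary: dict) -> bool:
    """For a completed job: every task except the header row succeeded."""
    total = progress_summary.get("TotalNumberOfTasks", 0)
    succeeded = progress_summary.get("NumberOfTasksSucceeded", 0)
    return (unexpected_failures(progress_summary) == 0
            and succeeded >= total - EXPECTED_HEADER_ROW_FAILURES)


# With boto3 (not executed here):
# summary = s3control.describe_job(AccountId=account_id, JobId=job_id)["Job"]["ProgressSummary"]
```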
‘Report only’ mode
To run the solution as far as preparing the S3 Batch Operations job, without automatically starting the job to apply S3 Object Lock retentions, perform the following steps immediately after deploying the CloudFormation stack:
- Open the Lambda function <stack name>-ExtendObjLockS3BatchJobLambda-<UID>. If you need to identify the Lambda UID, then go to the CloudFormation console, select the stack, and choose Resources.
- In the Code tab, find the line 'ConfirmationRequired': False and replace False with True.
- Deploy the updated code.
Once the first inventory has been generated, the solution prepares the S3 Batch Operations job. Due to the preceding change, the job stays in a suspended state until confirmed manually. Jobs in a suspended state fail after 30 days. Review the job configuration and manifest to validate the operations are what you intended. You can choose to manually run the job in the Amazon S3 console, or revert the preceding change in Lambda and wait until the next S3 Inventory is generated.
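To show what ConfirmationRequired controls, the sketch below builds keyword arguments of the shape accepted by boto3's s3control create_job for a compliance-mode PutObjectRetention job. It is not the solution's Lambda code. The function name, the priority, the description text, and every ARN or ID passed in are hypothetical. After reviewing a suspended job, you can confirm it in the console or, with boto3, call s3control update_job_status with RequestedJobStatus="Ready".

```python
import uuid
from datetime import datetime, timedelta, timezone


def batch_job_request(account_id: str, role_arn: str, manifest_arn: str, manifest_etag: str,
                      report_bucket_arn: str, protected_bucket: str, min_retention_days: int,
                      extension_frequency_days: int, confirmation_required: bool = True) -> dict:
    """Hypothetical create_job kwargs for extending retention in compliance mode."""
    retain_until = datetime.now(timezone.utc) + timedelta(
        days=min_retention_days + extension_frequency_days)
    return {
        "AccountId": account_id,
        # True leaves the job suspended until someone confirms it ('report only' mode).
        "ConfirmationRequired": confirmation_required,
        "Operation": {
            "S3PutObjectRetention": {
                "Retention": {"RetainUntilDate": retain_until, "Mode": "COMPLIANCE"}
            }
        },
        "Manifest": {
            "Spec": {"Format": "S3BatchOperations_CSV_20180820",
                     "Fields": ["Bucket", "Key", "VersionId"]},
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {"Bucket": report_bucket_arn, "Format": "Report_CSV_20180820",
                   "Enabled": True, "ReportScope": "AllTasks"},
        "Priority": 10,
        "RoleArn": role_arn,
        "ClientRequestToken": str(uuid.uuid4()),
        "Description": f"Extend Object Lock retention in {protected_bucket}",
        "Tags": [{"Key": "job-created-by", "Value": "Auto Extend Object Lock Solution"}],
    }


# With boto3 (not executed here):
# job_id = s3control.create_job(**request)["JobId"]
# s3control.update_job_status(AccountId=account_id, JobId=job_id, RequestedJobStatus="Ready")
```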
Operating at scale
In testing, extending the retention of all 1 billion current object versions in a prefix took roughly 32 hours end to end:
- Athena processing the inventory (and outputting a 66 GB .csv file!): 14 minutes.
- S3 Batch Operations:
  - ‘Preparing’ phase: approximately eight hours.
  - ‘Active’ phase (applying S3 Object Locks): approximately 24 hours.
This shows that the solution scales to even the largest S3 buckets. Note that as the default Athena DML query timeout is 30 minutes, if your bucket has more than around 2 billion objects, then you may need to take one or both of the following actions:
- Request an increased DML query timeout in Athena service quotas.
- Deploy the CloudFormation template multiple times, specifying different prefixes.
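The "around 2 billion objects" figure follows from the measured 14 minutes per billion objects and the 30-minute default timeout. The sketch below turns that into a rough estimate of how many prefix-scoped deployments a bucket might need. It assumes that query time scales linearly with object count and that objects split evenly across prefixes. The 80% headroom factor is a hypothetical safety margin, not AWS guidance.

```python
import math

MEASURED_MINUTES = 14                 # Athena time measured in the test above
MEASURED_OBJECTS = 1_000_000_000
DEFAULT_DML_TIMEOUT_MINUTES = 30


def estimated_query_minutes(objects: int) -> float:
    # Assumption: query time grows linearly with inventory size.
    return objects / MEASURED_OBJECTS * MEASURED_MINUTES


def prefixes_needed(objects: int, timeout_minutes: float = DEFAULT_DML_TIMEOUT_MINUTES,
                    headroom: float = 0.8) -> int:
    """Deployments needed if objects split evenly across prefixes (hypothetical)."""
    budget = timeout_minutes * headroom
    return max(1, math.ceil(estimated_query_minutes(objects) / budget))


print(prefixes_needed(5_000_000_000))      # 3 with the default timeout
print(prefixes_needed(5_000_000_000, 60))  # 2 with a raised timeout
```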
Changing the configuration after deployment
You can amend the prefix, minimum retention period, and extension frequency after the solution has been deployed by updating the CloudFormation stack. Do not change the protected S3 bucket, as this is not supported.
Charges
PUT operations for S3 Object Lock extensions are charged as an S3 Standard storage class PUT request, regardless of the storage class the object is in. The only other notable charges for the solution come from S3 Batch Operations.
As an example, if we deploy the solution against a bucket or prefix containing 1 million objects, with an extension frequency of 28 days, then the estimated total cost would be $7.63 per month based on us-east-1 Region pricing.
The extension frequency determines the cost of the solution. For example, if your requirements support an extension frequency of 280 days, each object could have its lock extended every 280 days instead of every 28 days. Charges for the PUT requests and the ‘per object’ costs for S3 Batch Operations would be reduced 10x. Adjust the extension frequency to meet your needs (using a multiple of seven as the solution runs weekly). The minimum retention period does not affect the solution cost.
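The $7.63 estimate and the 10x reduction can be approximately reproduced with the sketch below. The prices are assumptions based on us-east-1 list pricing at the time of writing, so check them against the current Amazon S3 pricing page: $0.005 per 1,000 PUT requests, $0.25 per S3 Batch Operations job, and $1.00 per million objects processed. S3 Inventory and Athena charges are ignored. That omission plausibly explains why the result (about $7.60) sits slightly below the figure quoted above.

```python
PUT_PRICE_PER_1000 = 0.005              # assumed S3 Standard PUT price
BATCH_PRICE_PER_JOB = 0.25              # assumed S3 Batch Operations job fee
BATCH_PRICE_PER_MILLION_OBJECTS = 1.00  # assumed per-object fee
DAYS_PER_MONTH = 365 / 12
JOBS_PER_MONTH = 52 / 12                # one job per weekly run


def variable_cost(objects: int, extension_frequency_days: int) -> float:
    """PUT and per-object Batch Operations charges per month."""
    extensions = objects * DAYS_PER_MONTH / extension_frequency_days
    return (extensions / 1000 * PUT_PRICE_PER_1000
            + extensions / 1_000_000 * BATCH_PRICE_PER_MILLION_OBJECTS)


def monthly_cost(objects: int, extension_frequency_days: int) -> float:
    return variable_cost(objects, extension_frequency_days) + JOBS_PER_MONTH * BATCH_PRICE_PER_JOB


print(round(monthly_cost(1_000_000, 28), 2))   # 7.6
print(round(monthly_cost(1_000_000, 280), 2))
```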
Deployment example 1: 90-day minimum retention period
You want to make sure that all current versions of objects are always immutable for at least 90 days into the future. You specify a minimum retention period of 90 days, and an extension frequency of 28 days.
An example object has a remaining retention period of 91 days when a weekly inventory is generated. If this object is not extended, by the next week it has a remaining retention period of only 84 days, which is less than the specified minimum retention period. Therefore, the solution extends the object’s retention period to 118 (90 + 28) days in the future, and a further extension is not needed for another 28 days.
Deployment example 2: Maintaining immutability without a minimum retention period
You do not have a specific minimum retention requirement, and simply want to make sure that all current versions of objects remain immutable. You want to minimize costs, and are comfortable with objects having retention dates up to a year in the future. You specify a minimum retention period of 1 (one) day, and an extension frequency of 364 days.
An example object has a remaining retention period of five days when a weekly inventory is generated. If this object is not extended, then by the next week it is no longer immutable. Therefore, the solution extends the object’s retention period to 365 (1 + 364) days in the future.
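Both deployment examples can be checked with a small simulation that works in remaining days. This is a hypothetical model of the weekly schedule. It assumes runs occur exactly seven days apart and that a version is extended when its remaining retention is strictly less than the minimum retention period plus seven days.

```python
RUN_INTERVAL_DAYS = 7


def retention_after_run(remaining_days: int, min_retention_days: int,
                        extension_frequency_days: int) -> int:
    """Remaining retention in days right after a weekly run (hypothetical model)."""
    if remaining_days < min_retention_days + RUN_INTERVAL_DAYS:
        return min_retention_days + extension_frequency_days
    return remaining_days


def weeks_until_next_extension(min_retention_days: int, extension_frequency_days: int) -> int:
    """Weekly runs after an extension until the version is selected again."""
    remaining = min_retention_days + extension_frequency_days
    weeks = 0
    while True:
        weeks += 1
        remaining -= RUN_INTERVAL_DAYS
        if remaining < min_retention_days + RUN_INTERVAL_DAYS:
            return weeks


print(retention_after_run(91, 90, 28))       # example 1: 118
print(retention_after_run(5, 1, 364))        # example 2: 365
print(weeks_until_next_extension(90, 28))    # 4 (28 days)
```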
Cleaning up
As the solution is deployed using CloudFormation, you can remove it by deleting the CloudFormation stack. Deleting the CloudFormation stack does not affect the S3 Object Lock retention period applied to any objects.
You must delete the ‘solution’ S3 bucket (created to store inventories, Athena output and reports) manually. To identify the solution bucket, go to the CloudFormation console, select the stack, choose Resources, locate ExtendObjLockInventoryBucket, and select the link to open a new tab displaying the solution bucket. To empty the bucket, you can apply a lifecycle rule or use the Empty bucket button in the S3 console. Once the S3 bucket is empty, proceed to delete it.
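If you prefer to empty the solution bucket programmatically, the sketch below splits keys into DeleteObjects request bodies of at most 1,000 keys, the per-request limit of that API. It assumes the solution bucket is not versioned. For a versioned bucket, you would need to list object versions and include each VersionId. The helper name is hypothetical.

```python
def delete_batches(keys, batch_size: int = 1000) -> list:
    """Split keys into DeleteObjects 'Delete' payloads (assumes an unversioned bucket)."""
    keys = list(keys)
    return [
        {"Objects": [{"Key": key} for key in keys[i:i + batch_size]], "Quiet": True}
        for i in range(0, len(keys), batch_size)
    ]


# With boto3 (not executed here):
# paginator = s3.get_paginator("list_objects_v2")
# keys = [obj["Key"] for page in paginator.paginate(Bucket=solution_bucket)
#         for obj in page.get("Contents", [])]
# for batch in delete_batches(keys):
#     s3.delete_objects(Bucket=solution_bucket, Delete=batch)
```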
Conclusion
Data immutability is a core aspect of data protection planning. Sometimes it isn’t clear for how long data should be made immutable, as older data can be just as valuable as new data. In this post, we presented a solution to set and maintain immutability for current versions of objects in Amazon S3, indefinitely.
Ensuring ongoing immutability for existing data in your S3 bucket or prefix can help fulfill compliance requirements and prevent data from being permanently deleted. This solution helps to ensure that users cannot permanently delete object versions, either manually or through S3 Lifecycle, until after a new version of the object is written, or a delete marker is placed, and then your chosen minimum retention period has passed.
To get started, download the CloudFormation template. If you need to enable S3 Object Lock for an existing bucket, this is now possible without engaging AWS support. To learn more about S3 Object Lock, visit the documentation or product page. At AWS we thrive on user feedback, and look forward to your thoughts and comments.