AWS Storage Blog

How to manage retention periods in bulk using Amazon S3 Batch Operations

Update (12/11/2023): As of November 20, 2023, Amazon S3 supports enabling S3 Object Lock on existing buckets.


Amazon S3 Batch Operations now supports S3 Object Lock. In this post, I share ways you can use these two Amazon S3 features together to address common data protection needs. S3 Batch Operations is a feature that lets you perform repetitive or bulk actions like copying or updating tag sets across millions of objects with a single request. All you provide is the list of objects, and S3 Batch Operations handles all the manual work, including managing retries and displaying progress. To learn more about S3 Batch Operations, check out this blog post.

With S3 Object Lock, you can apply retention dates and legal holds to your objects, preventing them from being deleted or overwritten, either indefinitely or until a specified date has passed. S3 Batch Operations support for S3 Object Lock helps you meet regulatory requirements for write once read many (WORM) storage. In addition, it simply adds another layer of protection from object changes and deletions.

The basics

Amazon S3 Object Lock provides two ways to manage object retention.

  • A retention period specifies a fixed period of time during which an object remains locked. During this period, your object is WORM-protected and the object version cannot be deleted or changed. S3 Object Lock provides two retention period modes:
    • In governance mode, you protect objects against being deleted by most users, but you can still grant some users permission to alter retention settings or delete objects if necessary.
    • In compliance mode, a protected object version cannot be overwritten or deleted by any user, including the root user in your AWS account.
  • A legal hold provides the same protection as a retention period, but it has no expiration date. Instead, a legal hold remains in place until you explicitly remove it.

An object version can have a retention period, a legal hold, or both. For example, you may have an object with a 1-year retention period plus a legal hold. Use retention periods when you know the exact date through which the object must be retained, and select the retention mode that aligns with your requirements.

When should I use S3 Batch Operations support for S3 Object Lock?

If you must apply, update, or remove S3 Object Lock settings on a large number of objects in a bucket, consider using S3 Batch Operations support for S3 Object Lock. If you are using S3 Object Lock for the first time, it is a simple way to make those changes at scale. The same applies if your existing S3 Object Lock requirements have changed and you must update, add, or remove a lock from a large number of objects. If instead you want S3 Object Lock settings assigned automatically to objects as they are added to your bucket, you can set up a default retention mode on that bucket, with no need for S3 Batch Operations. For more details on this process, see this blog on protecting data with Amazon S3 Object Lock. Now, let's walk through the setup process.
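As an illustration of the default retention alternative, a bucket-level rule that governance-locks every new object for a year has the following shape (the mode and duration here are purely illustrative choices, not a recommendation); this is the configuration document accepted by the PutObjectLockConfiguration API:

```json
{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
        "DefaultRetention": {
            "Mode": "GOVERNANCE",
            "Days": 365
        }
    }
}
```

With a rule like this in place, objects inherit the lock at upload time, so no batch job is needed for new data.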

Setup prerequisites

To follow along with the process outlined in this post, you need a set of objects with an S3 Object Lock retention period you want to extend. That's it! This post is relevant whether you have petabytes of existing storage or are planning a migration to the cloud. You might also find the existing S3 Batch Operations documentation useful, including the basics of Amazon S3 Batch Operations jobs, operations, and managing S3 Batch Operations jobs.

Note: Versioning and S3 Object Lock must be configured on the bucket where the job is performed. When this post was originally written, Object Lock could only be enabled on new buckets; as the update above notes, Amazon S3 now supports enabling S3 Object Lock on existing buckets as well. For more details, see Object Lock bucket configuration.

Getting started: Deciding how you want to use S3 Object Lock

Each organization is different, so customize the following details to fit your specific use of Amazon S3 Object Lock. For this sample exercise, I am setting out to extend the governance mode retention period of all objects related to a specific project. The project ‘keyproject’ has been extended, which requires updating the retention date to 1 year from today for all objects in the manifest. Although I am using governance mode retention in this example, you can apply compliance mode by following the same steps and selecting compliance mode while creating the job instead. Be aware that the only way to delete objects with an S3 Object Lock compliance mode configuration before the retention period expires is to close the AWS account that they are associated with. For more information on the types of holds and how to use them, see the Amazon S3 Object Lock overview.
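The new retain-until date in this exercise is 1 year from today. As a rough sketch of that date arithmetic (the function name is my own; S3 expresses Object Lock retain-until dates as ISO 8601 timestamps):

```python
from datetime import datetime, timedelta, timezone

def retain_until(days: int = 365) -> str:
    """Compute a retain-until date the given number of days from now,
    formatted as an ISO 8601 UTC timestamp."""
    date = datetime.now(timezone.utc) + timedelta(days=days)
    return date.strftime("%Y-%m-%dT%H:%M:%SZ")

# Prints a timestamp one year from the moment you run it.
print(retain_until())
```

The console accepts this as a calendar date; the API-level equivalents shown later take a full timestamp.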

Three steps to extend the retention of objects using S3 Batch Operations support for S3 Object Lock

  1. Specify a manifest. A manifest is the Amazon S3 object that lists the object keys that you want Amazon S3 to act upon.
  2. Create the S3 Object Lock job. In this step, I select the operation for S3 Batch Operations to run and identify any required permissions for the job.
  3. Run the job, and have all your objects locked until your chosen date!

Specifying a manifest

For this example, I am extending the retention period for a set of objects stored in my bucket under the prefix ‘keyproject.’ Running an S3 Inventory generates a manifest.json file, a manifest.checksum file, and the inventory report; I am using the manifest.json file as input for the S3 Batch Operations job.

To run an S3 Inventory report, I go to the Management tab of my bucket, select Inventory, and then select Add new. I enter a name for the S3 Inventory, select the destination bucket for the report, apply an optional filter by prefix, and select daily as the frequency. The objects whose retention period I must extend are located under the prefix ‘keyproject’; filtering on this prefix ensures that the manifest only includes objects for this project. Additionally, I applied a prefix to the report on the destination bucket to find it easily in the future. For more information, see configuring an S3 Inventory.

Note: You can specify a manifest in a create job request using an Amazon S3 Inventory report or a CSV file. As an alternative to an inventory report, you can create a CSV file with your object list. For more details, see specifying a manifest.
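As a sketch of the CSV alternative (the bucket and key names below are made up), a Batch Operations CSV manifest is just one `bucket,key` row per object, with object keys URL-encoded:

```python
import csv
from urllib.parse import quote

def write_manifest(bucket: str, keys: list, path: str) -> None:
    """Write a minimal S3 Batch Operations CSV manifest:
    one 'bucket,key' row per object, keys URL-encoded."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for key in keys:
            writer.writerow([bucket, quote(key)])

# Hypothetical bucket and keys for illustration only.
write_manifest(
    "my-example-bucket",
    ["keyproject/report 1.pdf", "keyproject/data.csv"],
    "manifest.csv",
)
```

You would then upload the resulting file to S3 and point the create job request at it, the same way you would point it at an inventory manifest.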

Configuring an S3 Inventory

Creating an S3 Batch Operations job to extend your retention period

Now that I have my manifest, I can use the S3 console to create and run the job. You can use S3 Batch Operations through the AWS Management Console, AWS CLI, or AWS SDKs. In the S3 console, I go to Batch Operations and select Create job. I select the Region where the objects referenced in the manifest are located; my project is in a us-east-1 bucket. I select S3 Inventory report as the manifest format and browse through my S3 buckets to find the manifest.json file, published to the following location in the destination bucket: destination-prefix/source-bucket/config-ID/YYYY-MM-DDTHH-MMZ/manifest.json. I load the manifest.json file, use the Manifest object ETag to confirm that the correct manifest is selected, and select Next to proceed. For more details, see creating an S3 Batch Operations job.

Choose Region and manifest

Next, I choose the operation (Object Lock retention) and the options for this operation. I am enabling S3 Object Lock with a governance retention mode and a new retention date of 2021/05/06 (the console validates that this is within 365 days). I then choose Next.

Choose operation - Object Lock retention

I enter a name for my job, set its priority, and request a completion report that encompasses all tasks. Then I choose a bucket for the report and select an IAM Role that grants the necessary permissions. The console displays a role policy and a trust policy that I can copy and use by selecting View IAM role policy template and IAM trust policy. An example of the policy used in this exercise can be found at the bottom of this post. Afterward, I select Next.
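The same job can also be created programmatically. The following is a minimal sketch of the request parameters for an Object Lock retention job, assuming boto3's `s3control` client and its `create_job` operation; every account ID, ARN, ETag, bucket name, and prefix below is a placeholder you must replace:

```python
from datetime import datetime, timezone

# All IDs, ARNs, and names below are placeholders -- substitute your own.
params = {
    "AccountId": "111122223333",
    "ConfirmationRequired": True,
    "Priority": 10,
    "RoleArn": "arn:aws:iam::111122223333:role/batch-operations-role",
    "Operation": {
        "S3PutObjectRetention": {
            # Needed because the objects already carry a governance lock.
            "BypassGovernanceRetention": True,
            "Retention": {
                "Mode": "GOVERNANCE",
                "RetainUntilDate": datetime(2021, 5, 6, tzinfo=timezone.utc),
            },
        }
    },
    "Manifest": {
        "Spec": {"Format": "S3InventoryReport_CSV_20161130"},
        "Location": {
            "ObjectArn": "arn:aws:s3:::inventory-destination-bucket/"
                         "source-bucket/config-id/2021-05-06T00-00Z/manifest.json",
            "ETag": "example-manifest-etag",
        },
    },
    "Report": {
        "Bucket": "arn:aws:s3:::report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
}

# With AWS credentials configured, the job would be created with:
# import boto3
# response = boto3.client("s3control").create_job(**params)
# print(response["JobId"])
```

Because `ConfirmationRequired` is true, a job created this way still waits for you to confirm it before running, just like the console flow described below.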

Note: If you select All tasks as the scope of your completion report, the report includes both successful and failed tasks. If you are only interested in failed tasks, I recommend selecting Failed tasks only as the scope for your completion report.

Additional options - completion report - permissions - job tags

Finally, I review my job validating my Region and manifest, the retention mode and retain until date values, and any additional options. When I am finished reviewing, I select Create job.

Note: Reviewing the job is especially important when applying S3 Object Lock in compliance mode, which makes an object version immutable until its retention period has passed. While the lock is in effect, you cannot delete that object or its bucket; the only way to delete objects with this S3 Object Lock configuration is to close the AWS account that they are associated with.


Running the S3 Batch Operations job

If the Create job request succeeds, Amazon S3 returns a job ID. The job ID is a unique identifier that Amazon S3 generates automatically so that you can identify your S3 Batch Operations job and monitor its status. The job then enters the Preparing state. When the job changes to Awaiting your confirmation to run, you can review the total number of objects for the job. After confirming this number, you then select the job and choose Run job.

Note: When you create a job through the AWS Management Console, you must review the job details and confirm that you want to run it before S3 Batch Operations can begin to process it. If a job remains in the suspended state for over 30 days, it fails.

Running the S3 Batch Operations job

As the job runs, S3 Batch Operations examines and monitors the overall failure rate, and stops the job if the rate exceeds 50%. You can use the console to track the completion percentage, total number of failures, and failure rate. The most common causes of job failure are lacking permission to read from the manifest bucket or to write to the report bucket. Make sure you have set up the right permissions for running the job; for more details, see granting permissions for Amazon S3 Batch Operations.

When the job is finished, it enters the Complete state. Congratulations! The objects in your manifest are now locked with a new retain until date of May 6, 2021. You can also review your S3 Batch Operations completion reports to confirm that all objects have been locked.

If you must ever change the lock on these objects again, you can reuse the same manifest in the future.

Cleaning up

If you no longer need updated inventory reports, make sure you delete your daily inventory report configuration to prevent it from creating new reports.

Additional information

S3 Object Lock assessments: S3 Object Lock has been assessed for SEC Rule 17a-4(f), FINRA Rule 4511, and CFTC Regulation 1.31 by Cohasset Associates. Cohasset Associates is a management consulting firm specializing in records management and information governance. A copy of the Cohasset Associates assessment report can be downloaded from the S3 Object Lock technical documentation. You can then provide the assessment report to your regulator when you notify them of your decision to use Amazon S3 for your regulated data.

CSV Object lists: If you must process a subset of the objects in a bucket and cannot use a common prefix to identify them, you can create a CSV file and use it to drive your job. You could start from an inventory report and filter the objects based on name or by checking them against a database or other reference.
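As an illustrative sketch of that filtering approach (all file, bucket, and key names below are fabricated), a few lines of Python can narrow an inventory export down to a job manifest:

```python
import csv

def filter_manifest(inventory_csv: str, output_csv: str, match) -> int:
    """Copy rows from an S3 Inventory CSV export (bucket,key,...) into a
    Batch Operations CSV manifest, keeping only keys accepted by match().
    Returns the number of rows kept."""
    kept = 0
    with open(inventory_csv, newline="") as src, \
         open(output_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            bucket, key = row[0], row[1]
            if match(key):
                writer.writerow([bucket, key])
                kept += 1
    return kept

# Demo with a tiny fabricated inventory file:
with open("inventory.csv", "w", newline="") as f:
    csv.writer(f).writerows([
        ["my-example-bucket", "keyproject/a.txt", "v1"],
        ["my-example-bucket", "other/b.txt", "v1"],
    ])

n = filter_manifest("inventory.csv", "manifest.csv",
                    lambda key: key.startswith("keyproject/"))
print(n)  # 1
```

The `match` callback is where you would plug in a name pattern or a lookup against a database or other reference.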

Policy for S3 Batch Operations support for S3 Object Lock: This is the policy used to extend a retention period with S3 Batch Operations support for S3 Object Lock in the preceding example. For more details, see granting permissions for Amazon S3 Batch Operations.

  • Allow S3 Batch Operations to assume the role being created
  • Allow the role to check the S3 Object Lock configuration on the bucket that contains the job’s storage objects
  • Allow the role to override the current governance retention period
  • Allow the role to put object retention on the objects
  • Allow the role to read manifest objects
  • Allow the role to write job completion report objects

Trust Policy

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Principal":{
            "Service":"batchoperations.s3.amazonaws.com"
         },
         "Action":"sts:AssumeRole"
      }
   ]
}

Permission Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetBucketObjectLockConfiguration",
            "Resource": [
                "arn:aws:s3:::{{TargetResource}}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObjectRetention",
                "s3:BypassGovernanceRetention"
            ],
            "Resource": [
                "arn:aws:s3:::{{TargetResource}}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::{{ManifestBucket}}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::{{ReportBucket}}/*"
            ]
        }
    ]
}

Pricing

If you are curious about the cost of running an S3 Batch Operations job, here’s an estimate based on the preceding example. To find the latest S3 pricing information, visit the management and replication section in Amazon S3 pricing. In the US East Region, this is the price I’d pay to lock 1 million standard storage objects. This price includes S3 Batch Operations fees and the cost of the underlying S3 Object Lock requests:

  • $0.25 for the S3 Batch Operations job -> $0.25
  • $1.00 per million object operations performed -> $1.00
  • $0.005 per 1,000 PUT requests on standard storage -> $5.00
  • Total -> $6.25
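That arithmetic generalizes to any object count. A quick sketch using the example prices quoted above (check the pricing page for current rates before relying on this):

```python
def batch_lock_cost(num_objects: int) -> float:
    """Estimate the cost of locking objects with one S3 Batch Operations
    job, using the example US East prices quoted above."""
    job_fee = 0.25                                     # per job
    operation_fee = 1.00 * (num_objects / 1_000_000)   # per million operations
    put_fee = 0.005 * (num_objects / 1_000)            # per 1,000 PUT requests
    return round(job_fee + operation_fee + put_fee, 2)

print(batch_lock_cost(1_000_000))  # 6.25
```

Note that the per-job fee dominates for small jobs, while the PUT request fee dominates at scale.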

There are also request fees for creating, checking the status of, listing, and confirming your job details.

Conclusion

In this post, I have shown you how to manage the governance retention period for a large list of objects. You can also use S3 Batch Operations support for S3 Object Lock to apply compliance retention periods, to apply or remove governance retention periods, and to apply or remove legal holds.

If you have any feedback or questions, please leave a comment in the comments section. Here are some resources to help you learn more about S3 Batch Operations:

S3 Object Lock documentation: Read about S3 Object Lock overview, and managing Amazon S3 Object Lock.

S3 Batch Operations documentation: Read about creating a job, Batch Operations, and managing Batch Operations jobs.

Tutorial videos: Check out the S3 Batch Operations video tutorials to learn how to create a job, manage and track a job, and to grant permissions.