AWS Storage Blog

Updating Amazon S3 object ACLs at scale with S3 Batch Operations

Update (4/27/2023): Amazon S3 now automatically enables S3 Block Public Access and disables S3 access control lists (ACLs) for all new S3 buckets in all AWS Regions.


Access control lists (ACLs) are permission sets associated with data or other system resources that dictate access permissions, and they have been a staple of data security for decades. You may come across a situation where you want to update the ACL on a large number of objects, perhaps billions or more. For example, you might have a bucket that has been in use for many years holding files for your application. These files could have different ACLs and even different owners, and you might need to grant a new user access without impacting existing users. However, updating the ACL on billions of individual objects can be time-consuming, costly, and prone to errors that may increase data security risks or negatively affect compliance.

Ever since Amazon Simple Storage Service (Amazon S3) was launched in 2006, you could use access control lists (ACLs) to grant read and write permissions to buckets and objects. Since then, Amazon S3 has grown significantly in every way, from the size and scope of workloads that are backed by it to the control and management features we’ve created to help you manage your objects at an ever-growing scale. Today, you can manage permissions by configuring S3 bucket policies, S3 access point policies, IAM policies, and S3 Block Public Access to control user access in addition to ACLs.

In this blog, after a brief overview of managing Amazon S3 data permissions, we consider the case where you may need to update object ACLs across billions of objects. We’ll cover using the AWS Management Console, AWS Command Line Interface (CLI), AWS SDK, and S3 Batch Operations to accomplish this and gather insight into how each will perform while operating at scale across billions of objects so you can determine which method is best for your use case.

Controlling access to your objects in Amazon S3

Before we dive into the different ways to change and manage permissions within Amazon S3, let’s briefly review the different ways you can control permissions to objects in Amazon S3.

Access control lists (ACLs) grant users and potentially other AWS accounts read/write access to your buckets and objects. By default, when another AWS account uploads an object to your bucket, that account owns the object and can grant other users access to it through ACLs. Many modern use cases no longer require the use of ACLs, but they remain useful when you require control of access to each object individually.

Bucket policies are resource-based policies that give permissions to your bucket and the objects in it. You can use bucket policies to add or deny permissions based on elements in the policy like S3 actions, resources, requester, and other conditions in the request. When using the S3 Object Ownership feature’s Bucket owner enforced setting, ACLs are disabled, and data access is controlled by policies.
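As an illustration, a bucket policy like the following grants another account read access to every object in a bucket without touching any object ACLs. The bucket name and account ID here are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountRead",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```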

IAM policies are identity-based policies that give IAM identities permissions to Amazon S3 and other AWS services. Within a single AWS account, all applicable policies are evaluated to make an access decision. For cross-account access, a combination of identity-based policies in the trusted account and resource-based policies (such as bucket policies) in the trusting account are evaluated to make an access decision.

At a higher level, you can use S3 Block Public Access to prevent all access from the internet, or S3 Access Point policies to control access to the access point. S3 Object Ownership also introduced two capabilities to standardize object ownership across a bucket. Bucket owner preferred grants the bucket owner full ownership of objects uploaded to that bucket, and Bucket owner enforced disables ACLs and changes object ownership automatically for all objects in a bucket. Before this feature, the AWS account that uploaded an object would own it. When building new applications, you should consider modifying your S3 Object Ownership settings. However, this is not always possible and you may be dependent on controlling access with object ACLs.

Considering all of these options, S3 object ACLs remain a powerful tool for granting access to an individual object.

Scenario: Update Amazon S3 ACLs to grant CloudFront’s OAI canonical ID access

Let’s consider a scenario where we have a bucket with billions of objects in it, and a large number of them need their ACL permissions updated. This bucket has been around for many years and is used extensively across our organization to serve content to end users via the web. As a result, we know that many objects are owned by other accounts in our organization, and some objects have specific ACLs set for their use case. We now want to start taking advantage of Amazon CloudFront to serve these objects more quickly to our growing global audience. For this, we need to grant CloudFront’s origin access identity (OAI) access to the bucket objects. When granting this access, we must be sure to keep the existing “legacy” permissions in place to prevent inadvertently changing access for our applications and users. To accomplish this on our legacy bucket with billions of objects, we must update each object’s ACL to grant CloudFront’s OAI user read access.

Options to update ACLs

Next, we’ll look at how you can update the ACL on objects with the AWS Management Console, AWS CLI, and AWS SDK. Then, we’ll introduce S3 Batch Operations to help you do this at scale across millions, billions, or more objects efficiently.

AWS Management Console

You can use the AWS Management Console to update object ACLs individually. This approach lets you proceed with caution and test the configuration you will later automate with the options that follow, but it becomes impractical when you need to update a large number of objects.

First, sign in to the AWS Management Console and navigate to Amazon S3. In the Buckets menu, select the bucket with the object ACLs you would like to modify.

In the Objects tab, select an object to update. Select the Permissions tab to view the current ACL for the object.

Figure 1: Amazon S3 object permissions tab

Select Edit to modify the existing ACL. Editing an ACL from the AWS Management Console updates the existing ACL and does not overwrite it. You can add, remove, or change permissions for grantees.

Figure 2: Adding and removing permissions to an Amazon S3 object ACL

Select Add grantee to add permissions for another AWS account. Enter the canonical ID for the grantee and select the permissions to grant this account. For our CloudFront OAI scenario, we would use the canonical ID of the OAI here and grant Read permissions to Objects.

Figure 3: Adding a grantee to an Amazon S3 object ACL

Select Save changes to update the ACL. This is a simple option to test updating the ACL for a limited number of objects, but it is not feasible to scale up to millions of objects using the console.

AWS Command Line Interface (CLI)

You can use the AWS CLI’s get-object-acl command to get an object’s current ACL. You can then take this output to build a new ACL and use the put-object-acl command to update the object. The put-object-acl command will replace the existing permissions, so be careful and test before executing this command iteratively on your objects. Let’s go through an example.

First, we output the existing ACL, and we can verify that myUser has full access to the file.txt object.

aws s3api get-object-acl --bucket "my-bucket" --key "file.txt" 
{
    "Owner": {
        "DisplayName": "myUser",
        "ID": "05be************************************************"
    },
    "Grants": [
        {
            "Grantee": {
                "DisplayName": "myUser",
                "ID": "05be************************************************",
                "Type": "CanonicalUser"
            },
            "Permission": "FULL_CONTROL"
        }
    ]
}

Next, we take this output and add the CloudFront OAI as a Grantee to our list with READ permissions:

aws s3api put-object-acl --bucket "my-bucket" --key "file.txt" --access-control-policy '{
    "Owner": {
        "DisplayName": "myUser",
        "ID": "05be************************************************"
    },
    "Grants": [
        {
            "Grantee": {
                "DisplayName": "myUser",
                "ID": "05be************************************************",
                "Type": "CanonicalUser"
            },
            "Permission": "FULL_CONTROL"
        },
        {
            "Grantee": {
                "DisplayName": "CloudFront Origin Access Identity E3GQTXXXXXXXXXX",
                "ID": "d07d5c3************************************************",
                "Type": "CanonicalUser"
            },
            "Permission": "READ"
        }
    ]
}'

In our simple tests, the processing times ranged from 1.15–1.30 seconds to complete for each object. This is a simple and quick option to run on a smaller set of objects, but since this is single-threaded and takes 1+ seconds per object, you may be looking for ways to increase performance. When increasing the number of objects that need to be updated, you will want to script this update to run in parallel threads across several different prefixes in your bucket.
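As a rough sketch of that parallelization, you could fan the per-object updates out across a thread pool. The function below is our own illustration, not part of the AWS CLI or SDK; `update_one` is a stand-in for whatever performs the actual ACL update, such as a boto3 `put_object_acl` call built from the object's current ACL:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def update_acls_in_parallel(keys, update_one, max_workers=16):
    """Run update_one(key) across keys concurrently; return the keys that failed.

    update_one is a stand-in for the real per-object work, e.g. a
    boto3 put_object_acl call or a shelled-out `aws s3api put-object-acl`.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per key, remembering which future maps to which key.
        futures = {pool.submit(update_one, key): key for key in keys}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception:
                failed.append(futures[future])
    return failed
```

Because each update is an independent API call, throughput scales roughly with the number of workers until you hit S3 request-rate or network limits, so measure before picking `max_workers`.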

AWS SDK

The AWS SDKs make it easy to interact with different AWS services using well-known programming languages. For this scenario, we will look at updating the existing object ACL with a new grantee using the Java and Python SDKs.

// AWS SDK for Java (v1). Assumes s3Client is an initialized AmazonS3
// client, and that canonicalID and permission hold the grantee's
// canonical ID and the permission to grant (for example, Permission.Read).

// Get the existing object ACL that needs to be modified.
AccessControlList acl = s3Client.getObjectAcl(bucketName, s3Key);

// Grant the necessary permissions.
acl.grantPermission(new CanonicalGrantee(canonicalID), permission);

// Save the modified ACL back to the object.
s3Client.setObjectAcl(bucketName, s3Key, acl);

# AWS SDK for Python (boto3). Assumes object_acl is an ObjectAcl
# resource for the object being updated, for example:
#   object_acl = boto3.resource("s3").ObjectAcl(bucket_name, s3_key)
# and that canonical_id and grant_level (e.g., "READ") are set.
# obj_owner_name and obj_owner_id come from object_acl.owner.

# The permission that needs to be added to the object
grant_level_entry = {
    "Grantee": {
        "ID": canonical_id,
        "Type": "CanonicalUser"
    },
    "Permission": grant_level
}

# Append the new permission to the list of the existing permissions
object_acl.grants.append(grant_level_entry)

# Put the entire list of permissions back to the object ACL
object_acl_response = object_acl.put(
    AccessControlPolicy={
        "Grants": object_acl.grants,
        "Owner": {
            "DisplayName": obj_owner_name,
            "ID": obj_owner_id
        }
    })

All three approaches (console, CLI, and SDK) can be used to update the ACLs on an object. For our use case of updating billions of objects, though, you still need a list of the existing objects to iterate over. S3 Batch Operations comes in handy in such scenarios, looping through and performing updates across many objects at scale.

Amazon S3 Batch Operations

With S3 Batch Operations, you can perform actions across billions of S3 objects at scale with a single API call or a few clicks in the console. Current functionality allows you to copy objects, invoke an AWS Lambda function, replace all object tags, delete all object tags, replace access control lists, restore archived objects, set object lock retention, set an object lock legal hold, and replicate objects.

While “replace access control list” looks like an option that you can use quickly and easily, that option replaces the entire ACL instead of updating the existing one. Building on our sample scenario, since we want to add to existing ACL permissions and not replace them, we will choose to invoke a Lambda function that uses the SDK to perform this update. The Lambda function will be invoked for every object in the prefix that we specify with S3 Batch Operations, and we’ll use the setObjectAcl Java method or the object_acl.put Python method in the AWS SDK to add to the existing ACL, keeping existing permissions intact.

We have provided a sample Lambda function in both Java and Python to get you started using Amazon S3 Batch Operations. You can find the functions as well as instructions to install the Python Lambda function with the AWS CDK here and the Java Lambda function here.

To make it easier to deploy, we’ve created a sample Python AWS Lambda function that can be deployed with the AWS Cloud Development Kit (AWS CDK) and a sample Java Lambda function that can be deployed with AWS Serverless Application Model (AWS SAM). Both of these deployment packages create a Lambda execution role that allows s3:GetObject, s3:GetObjectAcl, and s3:PutObjectAcl. You should further scope down the resource of this role to your S3 bucket.
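The linked samples are the reference implementations; purely as a sketch of the shape such a handler takes (following the S3 Batch Operations Lambda invocation event and response schema, with a placeholder canonical ID and simplified error handling), a Python handler could look like this:

```python
import urllib.parse

# Placeholder: replace with the canonical ID of your CloudFront OAI.
GRANTEE_CANONICAL_ID = "EXAMPLE-CANONICAL-ID"

def add_read_grant(acl):
    """Append a READ grant for the grantee to a get_object_acl response,
    keeping all existing grants and the original owner intact."""
    grants = list(acl["Grants"])
    grants.append({
        "Grantee": {"ID": GRANTEE_CANONICAL_ID, "Type": "CanonicalUser"},
        "Permission": "READ",
    })
    return {"Grants": grants, "Owner": acl["Owner"]}

def handler(event, context):
    # S3 Batch Operations invokes the function with one task per object.
    task = event["tasks"][0]
    bucket = task["s3BucketArn"].split(":::")[-1]
    key = urllib.parse.unquote_plus(task["s3Key"])  # keys arrive URL-encoded
    try:
        import boto3  # available in the Lambda runtime
        s3 = boto3.client("s3")
        policy = add_read_grant(s3.get_object_acl(Bucket=bucket, Key=key))
        s3.put_object_acl(Bucket=bucket, Key=key, AccessControlPolicy=policy)
        result_code, result_string = "Succeeded", "ACL updated"
    except Exception as exc:
        result_code, result_string = "PermanentFailure", str(exc)
    # Batch Operations expects this response shape to record the per-task result.
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [{
            "taskId": task["taskId"],
            "resultCode": result_code,
            "resultString": result_string,
        }],
    }
```

Note the read-modify-write pattern: because `put_object_acl` replaces the whole ACL, the handler fetches the current grants first and appends to them rather than overwriting.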

For demonstration purposes, we’ll do the next steps in the console, and we will assume you used one of the two provided sample Lambda functions.

First, sign in to the AWS Management Console and navigate to Amazon S3.

Next, select Batch Operations from the left navigation menu and select the button to Create Job. This will start a wizard to configure your S3 Batch Operations job. Select the AWS Region where you want to create your job; this should match where your bucket and inventory manifest file reside.

Then populate the Manifest object field with your manifest (or CSV) file listing all of the objects that this job will process. The manifest file can either be from an S3 Inventory report (manifest.json) or a CSV that you created. We will choose an S3 inventory report in this example.
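If you build the CSV yourself, each line is a bucket name and an object key (with the key URL-encoded), optionally followed by a version ID. The bucket and keys below are illustrative:

```
my-bucket,images/photo1.jpg
my-bucket,images/photo2.jpg
my-bucket,docs/annual+report.pdf
```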

Select Next.

Figure 4: Step one of the S3 Batch Operations wizard where you specify the AWS Region and either a manifest file or CSV that lists the files to be processed

For your Operation type, select Invoke AWS Lambda function, then select your function from the Lambda Function drop-down. Alternatively, you can input the Java or Python function ARN that you created earlier. Select Next.

Figure 5: Select Invoke AWS Lambda function and then input your Lambda function ARN

On the next page, you have the option to update the Additional options, such as the description and priority for this job. It is recommended that you complete the next section to configure a Completion report so you can review success and failures from the processing job.

You now need to choose an IAM role to perform this processing. This role should have access to perform the necessary actions, and a sample IAM role policy template can be generated within the console by clicking the arrow next to View IAM role policy template and IAM trust policy.

If using the sample policy, be sure to double-check that it has appropriate s3:GetObject and s3:GetObjectVersion permissions to the manifest file and s3:PutObject permissions on the prefix for your reports. Refer to granting permissions for Amazon S3 Batch Operations for configuring permissions specific for your job. Select Next.
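The console-generated template is authoritative; as an illustrative sketch only, a role policy for an Invoke-Lambda job might combine statements like the following, where the function name, bucket, and prefixes are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:111122223333:function:UpdateObjectAcl"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::my-bucket/manifests/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/reports/*"
    }
  ]
}
```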

Figure 6: Configure your S3 Batch Operations completion report and set an appropriate IAM role

Review your job summary to confirm the details. If everything looks correct, click the Create Job button at the bottom.

At this point, your job will show in the Preparing state while it analyzes your manifest file. The time this takes will depend on the size of your manifest file. Once the preparation step completes, the job will change to Awaiting your confirmation to run as pictured in Figure 7. From this state, you will be able to view how many total objects will be acted on. Our example in Figure 7 demonstrates that we will be running this process across 85,531,732 objects.

Figure 7: S3 Batch Operations has completed its preparation and awaits confirmation to run

If everything is correct, select your job and choose the Run Job button. Review the details on the following confirmation page, and select the Run Job button at the bottom to begin processing.

While the job processes, the status will change to Active, and you will be able to watch the % Complete and the Total failed (rate) by refreshing the status page.

The job Status will change to Completed once all files have been processed, or Failed if it exceeded the job failure threshold. As seen in Figure 8, we selected the completed job and can confirm that we processed over 85 million objects, and the whole process took less than 10 hours.

Figure 8: Completed S3 Batch Operations job

In the example, we were able to update the ACL on over 85 million objects in under 10 hours. This equates to processing over 2,388 objects every second.

While the Lambda function created for this example won’t incur charges unless it’s executed, you should now delete it, along with the IAM policies and roles, to maintain good account hygiene and avoid incurring potential future costs. If you used the AWS CDK or AWS SAM examples, cleanup instructions are included at the bottom of the Java readme and the Python readme.

Conclusion

In this blog, we started with a brief recap on S3 permissions and then explored three different options for you to update your S3 ACLs. These options showed you how to update ACLs in the console, with the CLI, and with the SDK. We also addressed how you can accomplish this at scale with S3 Batch Operations, where we showed an example that paired S3 Batch Operations with a Lambda function to process 85 million objects in under 10 hours. The example given granted CloudFront’s OAI user read access to our objects, but this could be easily extended to lots of use cases where you need to update ACLs across a large number of objects quickly.

Using the methods described in this blog, you can benefit from the scale of S3 and the power of S3 Batch Operations to process updates across your objects. If you are relying on S3 object ACLs for access and need to make a change, you can now do this reliably, quickly, and efficiently, saving you valuable time when needing to perform such tasks.

Thank you for reading this blog on updating Amazon S3 object ACLs at scale. We always love hearing from customers, so let us know how you’ve used this and any feedback you have in the comments section.

Joe Chapman

Joe is a Sr. Solutions Architect with Amazon Web Services. He primarily serves AWS EdTech customers, providing architectural guidance and best practice recommendations for new and existing workloads. Outside of work, he enjoys going on new adventures while traveling the world.

Anil Kodali

Anil is a Solutions Architect with Amazon Web Services. He works with AWS EdTech customers, guiding them with architectural best practices for migrating existing workloads to the cloud and designing new workloads with a cloud-first approach. Prior to joining AWS, he worked with large retailers to help them with their cloud migrations.

Noah Leuthaeuser

Noah is an Associate Solutions Architect at AWS with an interest in data management, organizational design, and containers. Noah advises customers in the public sector, providing architectural best practices and recommendations. Outside of work, he enjoys skiing, traveling, and playing guitar.