AWS Storage Blog
Large scale migration of encrypted objects in Amazon S3 using S3 Batch Operations
Many organizations have data governance strategies or compliance requirements that mandate their data be replicated across different accounts and AWS Regions for redundancy. Moving encrypted data at scale can take a few additional steps because objects must be decrypted and re-encrypted as part of the replication process.
Amazon Simple Storage Service (Amazon S3) offers three options for server-side encryption: server-side encryption with Amazon S3 managed keys (SSE-S3), server-side encryption with AWS KMS keys (SSE-KMS), and server-side encryption with customer-provided keys (SSE-C). SSE-S3 is now applied automatically to all new objects by default if you haven’t chosen another encryption method. You can easily perform large-scale Amazon S3 operations using Amazon S3 Batch Operations, including migrating or replicating your encrypted data to different accounts.
In this post, we walk through migrating new and existing S3 objects encrypted with SSE-KMS keys when the source and destination S3 buckets are owned by different AWS accounts in the same AWS Region. We accomplish this with S3 Batch Operations, which lets you perform large-scale batch operations on S3 objects. You can use the solution in this post to minimize latency by maintaining copies of your data in AWS Regions geographically closer to your users, to meet compliance and data sovereignty requirements, and to create additional disaster recovery resiliency.
Solution overview
Amazon S3 Batch Replication, through a Batch Operations job, provides a method for replicating objects that existed before a replication configuration was in place, objects that you have previously replicated, and objects that have failed replication. This solution helps you accomplish cross-account Amazon S3 Batch Replication.
- You will configure an Amazon S3 Replication rule that enables automatic, asynchronous copying of new encrypted S3 objects in your source S3 bucket in AWS account A to a destination S3 bucket in AWS account B.
- You will use Amazon S3 Batch Replication to replicate existing encrypted S3 objects in your source S3 bucket in AWS account A to a destination S3 bucket in AWS account B.
Prerequisites
For this walk-through, you need the following:
- Two AWS accounts in the same AWS Region
- S3 bucket (at source) with objects encrypted by SSE-KMS
- S3 bucket (at source) for storing Amazon S3 Batch Operations completion reports
- S3 bucket (at destination) for replicated objects
- AWS Identity and Access Management (IAM) user/role with access to Amazon S3, Amazon S3 Batch Operations, and AWS KMS
- Two SSE-KMS keys in respective AWS accounts
- AWS Command Line Interface (AWS CLI) version 2
Solution walkthrough
The high-level steps are as follows, followed by a more in-depth walkthrough:
In Account A
1. Create IAM role cross_account_replication and update the role with SSE-KMS key (a) and SSE-KMS key (b) specific permissions.
2. Create IAM role s3_batch_operations.
In Account B
3. Update the S3 bucket policy with permissions for the Account A cross_account_replication IAM role.
4. Create a KMS key (b) and update the key policy with permissions for the Account A cross_account_replication IAM role.
In Account A
Once Steps 1 through 4 are completed, navigate back to Account A and:
5. Create an Amazon S3 Replication rule in the source S3 bucket, selecting ‘Replicate objects encrypted with AWS KMS’.
6. Create an Amazon S3 Batch Operations job to replicate existing encrypted objects.
In AWS account A
- Creating IAM role cross_account_replication
To replicate existing S3 objects that are encrypted with AWS KMS, you must grant additional permissions to the IAM role that you specify in the replication configuration.
You can configure your bucket to use an S3 Bucket Key, which decreases request traffic from Amazon S3 to AWS KMS and reduces the cost of SSE-KMS. We recommend enabling S3 Bucket Keys on the source and destination buckets before replication is initiated. The savings are greatest if they are enabled before the initial PUT operation.
- When an S3 Bucket Key is enabled for both the source and destination buckets, the encryption context is the bucket Amazon Resource Name (ARN), for example arn:aws:s3:::bucket-name, not the object ARN. You must update your IAM policies to use the bucket ARN for the encryption context. The following example shows the encryption context with the S3 bucket ARN.
"kms:EncryptionContext:aws:s3:arn": [
"arn:aws:s3:::bucket-name"
]
- If an S3 Bucket Key is enabled only on the destination bucket and not the source bucket, then you must use the object ARN, for example arn:aws:s3:::bucket-name/*. The following example shows the encryption context with the S3 object ARN.
"kms:EncryptionContext:aws:s3:arn": [
"arn:aws:s3:::bucket-name/*"
]
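The two condition shapes differ only in whether the ARN carries a /* suffix, which can be captured in a small helper. A minimal Python sketch (the function name is ours, not an AWS SDK API):

```python
def s3_kms_encryption_context(bucket_name, bucket_key_on_both_buckets):
    """Build the kms:EncryptionContext condition value for a replication policy.

    With S3 Bucket Keys enabled on both buckets, AWS KMS sees the bucket ARN;
    otherwise it sees the object ARN (the bucket ARN plus "/*").
    """
    arn = f"arn:aws:s3:::{bucket_name}"
    if not bucket_key_on_both_buckets:
        arn += "/*"
    return {"kms:EncryptionContext:aws:s3:arn": [arn]}

print(s3_kms_encryption_context("bucket-name", False))
# {'kms:EncryptionContext:aws:s3:arn': ['arn:aws:s3:::bucket-name/*']}
```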
Now, let’s create an IAM policy using the following template and attach it to an IAM role named cross_account_replication in the source account. In this use case, S3 Bucket Keys are not enabled, so we use the S3 object ARN for the encryption context. Update the placeholder values as appropriate to suit your needs.
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Action":[
"s3:GetReplicationConfiguration",
"s3:ListBucket"
],
"Resource":[
"arn:aws:s3:::source-bucket"
]
},
{
"Effect":"Allow",
"Action":[
"s3:GetObjectVersionForReplication",
"s3:GetObjectVersionAcl"
],
"Resource":[
"arn:aws:s3:::source-bucket/*"
]
},
{
"Effect":"Allow",
"Action":[
"s3:ReplicateObject",
"s3:ReplicateDelete"
],
"Resource":"arn:aws:s3:::destination-bucket/*"
},
{
"Action":[
"kms:Decrypt"
],
"Effect":"Allow",
"Condition":{
"StringLike":{
"kms:ViaService":"s3.source-bucket-region.amazonaws.com",
"kms:EncryptionContext:aws:s3:arn":[
"arn:aws:s3:::source-bucket-name/*"
]
}
},
"Resource":[
"arn:aws:kms:us-east-1:123456789101:key/id(a)from source account"
]
},
{
"Action":[
"kms:Encrypt"
],
"Effect":"Allow",
"Condition":{
"StringLike":{
"kms:ViaService":"s3.destination-bucket-region.amazonaws.com",
"kms:EncryptionContext:aws:s3:arn":[
"arn:aws:s3:::destination-bucket-name/*"
]
}
},
"Resource":[
"arn:aws:kms:us-east-1:123456789102:key/id(b)from destination account"
]
}
]
}
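Besides the permissions policy above, the cross_account_replication role needs a trust policy that lets Amazon S3, and S3 Batch Operations for Batch Replication, assume it. A hedged Python sketch that builds such a trust document (the helper name is ours):

```python
import json

def replication_trust_policy():
    """Trust policy allowing Amazon S3 and S3 Batch Operations to assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Service": ["s3.amazonaws.com", "batchoperations.s3.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }],
    }, indent=2)
```

The resulting document can be saved and passed to aws iam create-role with --assume-role-policy-document file://trust.json.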
- Creating IAM role s3_batch_operations
Now, let’s create an IAM policy to replicate existing objects and attach it to an IAM role named s3_batch_operations in the source account. Update the placeholder values as appropriate to suit your needs.
{
"Version":"2012-10-17",
"Statement":[
{
"Action":[
"s3:InitiateReplication"
],
"Effect":"Allow",
"Resource":[
"arn:aws:s3:::replication-source-bucket-name/*"
]
},
{
"Action":[
"s3:GetReplicationConfiguration",
"s3:PutInventoryConfiguration"
],
"Effect":"Allow",
"Resource":[
"arn:aws:s3:::replication-source-bucket-name"
]
},
{
"Effect":"Allow",
"Action":[
"s3:PutObject"
],
"Resource":[
"arn:aws:s3:::completionreport-bucket-name/*"
]
}
]
}
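Before attaching the policy, it can be worth sanity-checking that every action the Batch Operations job needs is granted. The following illustrative checker (our own helper, not an AWS API) flags missing actions:

```python
import json

# Actions granted by the s3_batch_operations policy above. This is an
# exact-match check only; wildcards such as "s3:*" are not resolved here.
REQUIRED_ACTIONS = {
    "s3:InitiateReplication",
    "s3:GetReplicationConfiguration",
    "s3:PutInventoryConfiguration",
    "s3:PutObject",
}

def missing_actions(policy_json):
    """Return the required actions absent from an IAM policy document."""
    granted = set()
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Effect") == "Allow":
            actions = stmt.get("Action", [])
            granted.update([actions] if isinstance(actions, str) else actions)
    return REQUIRED_ACTIONS - granted
```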
In AWS account B
- Updating S3 bucket policy
Now, add the following bucket policy on the destination S3 bucket, granting permissions to the Account A cross_account_replication IAM role. Update the placeholder values as appropriate to suit your needs.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Permissions on objects",
"Effect": "Allow",
"Principal": {
"AWS": "ARN of cross_account_replication IAM role from source account"
},
"Action": [
"s3:ReplicateDelete",
"s3:ReplicateObject"
],
"Resource": "arn:aws:s3:::destination-bucket-name/*"
},
{
"Sid": "Permissions on bucket",
"Effect": "Allow",
"Principal": {
"AWS": "ARN of cross_account_replication IAM role from source account"
},
"Action": [
"s3:List*",
"s3:GetBucketVersioning",
"s3:PutBucketVersioning"
],
"Resource": "arn:aws:s3:::destination-bucket-name"
}
]
}
- Updating SSE-KMS policy
In our use case, the destination bucket is in a different AWS account, so we must use an AWS KMS customer managed key owned by the destination account. Otherwise, the destination account can’t access the objects in the destination bucket. To encrypt the destination objects with a KMS key that belongs to the destination account, the destination account must update that key’s policy with permissions for the replication role in Account A. Now, let’s update the KMS key (b) policy with permissions for the Account A cross_account_replication IAM role.
{
"Sid": "S3ReplicationSourceRoleToUseTheKey",
"Effect": "Allow",
"Principal": {
"AWS": "ARN of cross_account_replication IAM role from source account"
},
"Action": ["kms:GenerateDataKey", "kms:Encrypt"],
"Resource": "*"
}
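Unlike an IAM role's managed policies, a KMS key policy is replaced as one whole document, so the statement above has to be merged into the key's existing policy rather than attached separately. A sketch of that merge in Python (the role ARN is a placeholder):

```python
import json

def add_replication_statement(key_policy_json, role_arn):
    """Append the replication-role statement to an existing KMS key policy."""
    policy = json.loads(key_policy_json)
    policy.setdefault("Statement", []).append({
        "Sid": "S3ReplicationSourceRoleToUseTheKey",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:GenerateDataKey", "kms:Encrypt"],
        "Resource": "*",
    })
    return json.dumps(policy, indent=2)
```

The merged document can then be applied with aws kms put-key-policy --policy-name default --policy file://merged.json against key (b).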
To edit the key policy, navigate to the AWS KMS console. From the list of KMS keys, choose the alias or key ID of the key that you want to update. Select the Key policy tab and then edit the policy.
In AWS account A
- Creating S3 replication rule
Amazon S3 Replication is a managed, low-cost, elastic solution for replicating objects from one S3 bucket to another. You can use the Amazon S3 API or the S3 console to create a replication configuration rule.
Sign in to the AWS Management Console and open the Amazon S3 console. From the list of S3 buckets, select the source bucket. Choose the tab Management, scroll down to Replication rules, and then choose Create replication rule.
Next, specify the destination AWS Account ID and S3 Bucket name.
Then, choose the IAM role and the SSE-KMS key used for replication.
Then, select the Destination storage class for the objects.
Finally, the replication rule will look similar to the following screenshot:
Alternatively, you can use the put-bucket-replication command in the AWS CLI to create a replication configuration rule as follows.
aws s3api put-bucket-replication \
    --bucket source-bucket-name \
    --replication-configuration file://replication-sample.json
The following is a sample replication-sample.json file.
{
"Rules": [
{
"Status": "Enabled",
"Filter": {},
"SourceSelectionCriteria": {
"SseKmsEncryptedObjects": {
"Status": "Enabled"
}
},
"DeleteMarkerReplication": {
"Status": "Disabled"
},
"Destination": {
"EncryptionConfiguration": {
"ReplicaKmsKeyID": "ARN of AWS KMS Key(b)from destination account"
},
"Account": "AWS Account B #",
"Bucket": "arn:aws:s3:::destination-bucket-name",
"AccessControlTranslation": {
"Owner": "Destination"
}
},
"Priority": 0,
"ID": "Cross-Region-Replication-rule"
}
],
"Role": "ARN of cross_account_replication IAM role from source account"
}
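If you script the rule rather than authoring the JSON by hand, the same configuration can be generated programmatically and passed to put-bucket-replication. A sketch with all four arguments as placeholders you would substitute:

```python
import json

def replication_config(role_arn, dest_bucket, dest_account_id, replica_kms_key_arn):
    """Build a rule that replicates SSE-KMS objects cross-account and
    re-encrypts replicas with the destination account's KMS key."""
    return json.dumps({
        "Role": role_arn,
        "Rules": [{
            "ID": "cross-account-kms-replication",
            "Priority": 0,
            "Status": "Enabled",
            "Filter": {},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "SourceSelectionCriteria": {
                "SseKmsEncryptedObjects": {"Status": "Enabled"}
            },
            "Destination": {
                "Bucket": f"arn:aws:s3:::{dest_bucket}",
                "Account": dest_account_id,
                "AccessControlTranslation": {"Owner": "Destination"},
                "EncryptionConfiguration": {"ReplicaKmsKeyID": replica_kms_key_arn},
            },
        }],
    }, indent=2)
```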
- Creating S3 Batch Operations job
You can use the Amazon S3 API or the Amazon S3 console to create a Batch Operations job.
Using the Amazon S3 console, create a Batch Operations job as follows: select your AWS Region, choose Create manifest using S3 Replication configuration, and provide your Source bucket. Remember to use the IAM role s3_batch_operations created in Step 2 when creating the job.
Then, select the replication Operation type.
Next, configure the additional options required for the Batch Operations job, like the Description, Priority, and Completion report.
Next, specify the IAM role used by the S3 Batch Operations job.
Finally, review your selections and Create job.
Once the Batch Operations job is created, select the job ID and choose Run Job from the Batch Operations console.
Alternatively, you can use the AWS CLI to create a Batch Operations job as follows. Update the placeholder values to suit your needs.
aws s3control create-job \
--account-id 111122223333 \
--operation '{"S3ReplicateObject":{}}' \
--report '{"Bucket":"arn:aws:s3:::***","Prefix":"batch-replication-report", "Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"AllTasks"}' \
--manifest-generator '{"S3JobManifestGenerator": {"ExpectedBucketOwner": "111122223333", "SourceBucket": "arn:aws:s3:::***", "EnableManifestOutput": false, "Filter": {"EligibleForReplication": true, "ObjectReplicationStatuses": ["NONE","FAILED"]}}}' \
--role-arn arn:aws:iam::111122223333:role/batch-Replication-IAM-policy \
--region source-bucket-region \
--priority 1 \
--no-confirmation-required
Validation
Once the job has started, you can navigate to the S3 Batch Operations page to see the status of the job, the percentage of objects that have been replicated, and the total number of objects that have failed replication. S3 Batch Operations generates a report for jobs that have completed, failed, or been canceled. The completion report contains additional information for each task, including the object key name and version, status, error codes, and descriptions of any errors. You can use this report to validate the success of your S3 Batch Operations job.
Alternatively, you can use the describe-job command in the AWS CLI to check the job status.
aws s3control describe-job \
    --account-id 123456789012 \
    --job-id 93735294-df46-44d5-8638-6356f335324e
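The status check can also be scripted as a polling loop. This sketch takes the describe call as a parameter, so it works with boto3's s3control client (for example, lambda: client.describe_job(AccountId=account_id, JobId=job_id)) or any stub; the helper itself is ours, not an AWS API:

```python
import time

TERMINAL_STATES = {"Complete", "Failed", "Cancelled"}

def wait_for_job(describe, poll_seconds=30, max_polls=120):
    """Poll a describe-job callable until the job reaches a terminal state."""
    for _ in range(max_polls):
        status = describe()["Job"]["Status"]
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("S3 Batch Operations job did not finish in time")
```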
Cleaning up
To avoid ongoing charges in your AWS account, you should delete the AWS resources listed in the prerequisites section of this post. Furthermore, log in to the AWS Management Console and delete any manually created resources.
Conclusion
In this post, we covered using Amazon S3 Batch Operations to migrate new and existing S3 objects encrypted with SSE-KMS keys when the source and destination S3 buckets are owned by different AWS accounts in the same AWS Region.
Companies of any size can use Amazon S3 Batch Operations to perform large-scale replication on S3 objects using automation, saving them time and money. With the solution in this post, you can seamlessly migrate encrypted objects to satisfy latency, compliance, and disaster recovery requirements.
For further reading, refer to AWS Well-Architected Framework, Architecture Best Practices for Storage, and AWS Storage Optimization. We are here to help, and if you need further assistance in developing a successful cloud storage optimization strategy, reach out to AWS Support and your AWS account team.