Move data in and out of AWS GovCloud (US) with Amazon S3

Update (June 3, 2024): AWS DataSync now supports transferring data between Amazon S3 buckets in the AWS GovCloud (US) and commercial Regions. You can use AWS DataSync for moving data between AWS GovCloud (US) and commercial Regions if you are looking for an AWS managed service for your data transfer workflows. The solution presented in this post using Amazon S3 Data Transfer Hub works best if you require a self-managed approach. For more information on using DataSync to move data between Amazon S3 buckets in the AWS GovCloud (US) and commercial Regions, visit the following blog post and public documentation.

Amazon Web Services (AWS) customers who need to comply with the most stringent US government security and compliance requirements operate their workloads in AWS GovCloud (US). AWS GovCloud (US) gives government customers and their partners the flexibility to architect secure cloud solutions that meet compliance regimes such as the FedRAMP High baseline; the Department of Justice’s (DOJ’s) Criminal Justice Information Services (CJIS) Security Policy; U.S. International Traffic in Arms Regulations (ITAR); Export Administration Regulations (EAR); the Department of Defense (DoD) Cloud Computing Security Requirements Guide (SRG) for Impact Levels 2, 4, and 5; and more. To meet those stringent compliance needs, AWS GovCloud (US) is architected as a separate partition from the AWS standard Regions, providing network and identity isolation.

Increasingly, customers are operating workloads both in AWS GovCloud (US) and standard AWS Regions, such as US East (Northern Virginia). Dependencies between workloads, changing data controls, or enrichment of data across multiple data levels are examples of business needs that may require moving data in and out of AWS GovCloud (US). When operating in both partitions, customers can face challenges when attempting to move their data between the two. Enabling customers to move that data reduces operational burden and decreases the time needed to deliver business value. With freedom of movement, customers can store their data in each partition appropriately and only move what’s needed on demand.

It is important to note that the security of customer data in the cloud is the customer’s responsibility under the Shared Responsibility Model. Customers must store and move data according to the conditions and controls of the compliance frameworks that apply to them.

In this blog post, the first in a two-part series, I explain how to move data between Amazon Simple Storage Service (Amazon S3) buckets in the AWS GovCloud (US) and standard partitions. Then, check out part two for how to use AWS DataSync to move data on Network File System (NFS) shares between the standard and AWS GovCloud (US) partitions.

Solution overview

AWS Labs published a complete open source solution for Amazon S3 data replication called the Data Transfer Hub (GitHub). It is a secure, reliable, scalable, and trackable solution that offers a unified user experience for creating and managing different types of data transfer tasks from different sources to AWS Cloud-native services. The Data Transfer Hub is a comprehensive solution with a feature set that goes beyond Amazon S3 data transfer. For the purposes of this walkthrough, you use only the Amazon S3 Plugin for the Data Transfer Hub, which is maintained as a separate open source solution (GitHub). I recommend exploring the full Data Transfer Hub solution outside of the scope of this blog post.

The Amazon S3 Plugin deploys the architecture shown in Figure 1:

Figure 1. Amazon S3 Data Transfer Hub solution architecture.

In this architecture, a recurring Finder job runs as an AWS Fargate task that compares the objects in the source and destination buckets. If it finds differences between the two, it publishes a message to an Amazon Simple Queue Service (Amazon SQS) queue for each object that needs to be transferred. A Worker job running on an Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group consumes the messages, performs the transfers, and logs the results in an Amazon DynamoDB table. Failure handling is built into the solution using Amazon SQS dead-letter queues (DLQ) and Amazon Simple Notification Service (Amazon SNS) for alerting.
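
To make the Finder step more concrete, here is a minimal Python sketch of the kind of comparison it performs. It assumes each bucket listing is already available as a mapping of object key to size in bytes. The function name find_objects_to_transfer, the message fields, and the size-based comparison are illustrative assumptions on my part, not the plugin's actual implementation.

```python
import json


def find_objects_to_transfer(source, destination):
    """Return one message per object that is missing or differs in size.

    `source` and `destination` map object keys to sizes in bytes. Comparing
    by size is an assumption made for this sketch; the real Finder job may
    compare other attributes.
    """
    messages = []
    for key in sorted(source):
        size = source[key]
        if destination.get(key) != size:
            # Hypothetical message body; the plugin defines its own format.
            messages.append(json.dumps({"key": key, "size": size}))
    return messages


source_listing = {"a.csv": 100, "b.csv": 200, "c.csv": 300}
destination_listing = {"a.csv": 100, "b.csv": 150}

# One message is produced for b.csv (size differs) and one for c.csv (missing).
for body in find_objects_to_transfer(source_listing, destination_listing):
    print(body)
```

In the deployed solution, each of these messages would be placed on the Amazon SQS queue for a Worker to pick up.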

Prerequisites

This walkthrough deploys the Data Transfer Hub plugin to an AWS standard account, which serves as the destination location. The AWS GovCloud (US) account is the source location.

To complete this walkthrough, you need:

An AWS standard account
An AWS GovCloud (US) account
An Amazon S3 bucket in each account: the source bucket in the AWS GovCloud (US) account and the destination bucket in the standard account
A least-privileged AWS Identity and Access Management (IAM) user with a permission policy that allows access to the Amazon S3 bucket in the AWS GovCloud (US) account
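
As a rough illustration of what a least-privileged policy for the source bucket could look like, the following Python sketch builds a read-only policy document. The bucket name and the function build_source_read_policy are hypothetical, and the two actions shown are my assumption of a minimal read set; the plugin's documentation may list additional actions that it needs. Note that ARNs in AWS GovCloud (US) use the aws-us-gov partition rather than aws.

```python
import json


def build_source_read_policy(bucket_name, partition="aws-us-gov"):
    """Build a read-only IAM policy document for a single S3 bucket."""
    bucket_arn = f"arn:{partition}:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# "my-govcloud-source-bucket" is a placeholder name for this sketch.
print(json.dumps(build_source_read_policy("my-govcloud-source-bucket"), indent=2))
```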

Note that for this walkthrough, I am using the following Regions:

Standard account: us-west-2
AWS GovCloud (US) account: us-gov-west-1

If you do not have access to an AWS GovCloud (US) account, this walkthrough can be completed using two AWS standard accounts and two Regions.

Procedure

Step 1: Create an Amazon Elastic Container Service cluster

An Amazon Elastic Container Service (Amazon ECS) cluster is required to run the AWS Fargate task that identifies the differences between the source and destination buckets. For this walkthrough, I create a network-only type of cluster and create a new Amazon Virtual Private Cloud (Amazon VPC).

In your standard AWS account, navigate to the Amazon ECS console from the AWS Management Console. From the Amazon ECS cluster home page, choose Create Cluster.
Select the Networking only cluster template.
Configure the cluster. Specify s3-data-replication-hub-cluster as the cluster name and check the box for Create VPC, leaving the default options. Select Create.

Cluster creation takes approximately 2-3 minutes. Take note of the vpc-id and subnet-ids as you need them later.

Step 2: Configure credentials

You must provide the Data Transfer Hub with IAM credentials to access the Amazon S3 bucket in the source Region, which is in AWS GovCloud (US). You store the Access Key and Secret Access Key in AWS Secrets Manager, which encrypts your credentials with AWS Key Management Service (AWS KMS). As a security best practice, attach a narrowly scoped, least-privilege access policy to this user. The credentials are stored in AWS Secrets Manager as a JSON object in the following format:

{ "access_key_id": "<Your Access Key ID>", "secret_access_key": "<Your Access Key Secret>" }

In the standard account, navigate to AWS Secrets Manager from the AWS Management Console. From the Secrets Manager home page, select Store a new secret.
For secret type, select Other type of secrets. Copy and paste the JSON text above into the Plaintext section, and replace the placeholder values with your Access Key and Secret Access Key.
Select Next and specify the Secret name as “s3-data-replication-hub-secret”. Select Next, leave the default configuration, select Next, then select Store in the last step.
Note that if the Access Key/Secret Access Key is for a source bucket, READ access to the bucket is required. If it’s for a destination bucket, READ and WRITE access to the bucket is required. Please refer to the Set up Credential page in the GitHub repository for more details.
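
If you prefer to generate and sanity-check the secret text before pasting it into the console, a small Python sketch like the following can help. The two key names come from the format shown above; the helper names build_secret_string and is_valid_secret_string and the placeholder values are my own assumptions for illustration.

```python
import json

REQUIRED_KEYS = ("access_key_id", "secret_access_key")


def build_secret_string(access_key_id, secret_access_key):
    """Return the JSON text to paste into the Plaintext section."""
    if not access_key_id or not secret_access_key:
        raise ValueError("both credential values must be non-empty")
    return json.dumps(
        {"access_key_id": access_key_id, "secret_access_key": secret_access_key}
    )


def is_valid_secret_string(text):
    """Check that `text` is a JSON object with the two expected string keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(
        isinstance(data.get(key), str) and data[key] for key in REQUIRED_KEYS
    )


# Placeholder values only; never hard-code real credentials in a script.
secret = build_secret_string("EXAMPLE_KEY_ID", "EXAMPLE_SECRET")
print(is_valid_secret_string(secret))
```

A check like this catches the most common mistakes, such as a misspelled key name or stray text around the JSON object, before the stack tries to read the secret.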

Update: Since this post was originally published, AWS has released AWS IAM Roles Anywhere. IAM Roles Anywhere allows customers to obtain temporary security credentials in IAM for workloads that run outside of AWS. Using IAM Roles Anywhere means you don’t need to manage long-term credentials for workloads running outside of AWS. IAM Roles Anywhere is our recommended way to obtain credentials across partitions, as in the AWS GovCloud (US) to AWS standard Regions use case discussed in this post. We are working with the Amazon S3 Data Transfer Hub solution maintainers to release a future update that utilizes IAM Roles Anywhere.

Step 3: Deploy the Replication Hub Plugin using AWS CloudFormation

The plugin is deployed as a CloudFormation template. In the standard account, navigate to AWS CloudFormation in the AWS Management Console. Select Create stack, With new Resources. Select Template is ready and supply the Amazon S3 URL below; then select Next.

https://aws-gcr-solutions.s3.amazonaws.com/data-transfer-hub-s3/latest/DataTransferS3Stack-ec2.template

Alternatively, you can use this Launch Stack link.

Figure 2. Creating the Data Transfer Hub stack.

Now it is time to specify the stack parameters, with the AWS GovCloud (US) Amazon S3 bucket as the source. Feel free to name the stack as appropriate for your environment. For this walkthrough, I name the stack ‘DTHS3Stack’. Table 1 lists the parameters to enter.

Parameter	Value	Notes
Source Type	Amazon_S3
Source Bucket	<Your Bucket Name>	Only name needed, not any additional prefixes
Source Prefix	<Leave Blank>
Source Region	us-gov-west-1	Update accordingly if you used an alternate Region
Source Endpoint URL	<Leave Blank>
Source In Current Account	“false”
Source Credentials	s3-data-replication-hub-secret
Enable S3 Event	No
Destination Bucket	<Your Bucket Name>
Destination Prefix	<Leave Blank>
Destination Region	us-west-2	Update accordingly if you used an alternate Region
Destination In Current Account	“true”	Note that the stack creates the appropriate IAM role
Destination Credentials	<Leave Blank>
Destination Storage Class	STANDARD
Destination Access Control List	bucket-owner-full-control
Alarm Email	<your email>	You are subscribed to an SNS topic.
ECS Cluster Name	s3-data-replication-hub-cluster

Table 1. CloudFormation stack parameters.
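
Before you submit the form, it can help to check that the values are consistent with each other. The following Python sketch does this for a dictionary keyed by the labels in Table 1. The function check_parameters and its rules are my own assumptions drawn from the notes in the table (for example, that credentials are needed only for a bucket outside the current account), and the labels are the console display names, which may differ from the parameter keys inside the template.

```python
def check_parameters(params):
    """Return a list of problems found in a dict keyed by Table 1 labels."""
    problems = []
    for side in ("Source", "Destination"):
        in_account = params.get(f"{side} In Current Account", "")
        credentials = params.get(f"{side} Credentials", "")
        bucket = params.get(f"{side} Bucket", "")
        if in_account == "false" and not credentials:
            problems.append(f"{side} Credentials is required for a bucket outside this account")
        if in_account == "true" and credentials:
            problems.append(f"{side} Credentials should be blank for a bucket in this account")
        if not bucket:
            problems.append(f"{side} Bucket is missing")
        elif "/" in bucket:
            problems.append(f"{side} Bucket must be a name only, without a prefix")
    return problems


# Bucket names below are placeholders for this sketch.
walkthrough = {
    "Source Bucket": "my-govcloud-source-bucket",
    "Source In Current Account": "false",
    "Source Credentials": "s3-data-replication-hub-secret",
    "Destination Bucket": "my-standard-destination-bucket",
    "Destination In Current Account": "true",
    "Destination Credentials": "",
}
print(check_parameters(walkthrough))
```

An empty list means none of these simple checks failed; it does not guarantee that the deployment will succeed.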

Select Next, leave the Default Options, then select Next again.
Review the details on the next page, then select the two boxes within the Capabilities box and select Create Stack.

The deployment takes approximately 3-5 minutes.

Step 4: Validate file transfers

File transfers begin immediately after the stack has fully deployed and are set to run at 60-minute intervals via an Amazon EventBridge rule that is prefixed with your stack name. You can update this rule after stack deployment to run at the frequency your use case needs. Let’s take a look at our destination bucket in the Amazon S3 console.
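
If you change the schedule, EventBridge accepts rate expressions of the form rate(value unit), where the unit is singular when the value is 1 and plural otherwise. The helper below is a minimal sketch that builds such an expression from an interval in minutes. The function name rate_expression is hypothetical, and I am assuming the deployed rule uses a rate expression rather than a cron expression.

```python
def rate_expression(minutes):
    """Build an EventBridge rate expression for an interval given in minutes."""
    if not isinstance(minutes, int) or minutes < 1:
        raise ValueError("interval must be a positive whole number of minutes")
    # EventBridge requires the singular unit when the value is exactly 1.
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


print(rate_expression(60))  # the walkthrough's default interval
print(rate_expression(15))  # a more frequent schedule
```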

Figure 3 shows four test files transferred successfully in the destination bucket:

Figure 3. Successfully transferred files in Amazon S3 bucket.

The solution comes with observability built in. The deployment stack provisions two Amazon CloudWatch log groups as well as a CloudWatch dashboard. Additionally, the provisioned DynamoDB table (prefixed with your stack name) provides details about the individual files transferred, such as start/end time and file size. The dashboard is named <stackname>-Dashboard-<region> and gives you a quick view of job queues, transfer successes and failures, network usage, and CPU/Disk/Memory utilization. Figure 4 shows an example of this dashboard.
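
As an example of what you can do with the per-file details in the DynamoDB table, the following Python sketch derives a transfer duration and throughput from a start time, an end time, and a file size. The attribute names start_time, end_time, and size_bytes and the ISO 8601 timestamp format are assumptions for this sketch; check the actual table for the names and formats it uses.

```python
from datetime import datetime


def summarize_transfer(record):
    """Return (duration in seconds, throughput in MiB/s) for one record."""
    start = datetime.fromisoformat(record["start_time"])
    end = datetime.fromisoformat(record["end_time"])
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        # Too short to compute a meaningful rate.
        return seconds, None
    mib = record["size_bytes"] / (1024 * 1024)
    return seconds, mib / seconds


# Hypothetical record: a 50 MiB object transferred in 10 seconds.
example = {
    "start_time": "2024-01-01T12:00:00",
    "end_time": "2024-01-01T12:00:10",
    "size_bytes": 52428800,
}
print(summarize_transfer(example))
```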

Figure 4. An example CloudWatch dashboard for this walkthrough.

The log groups give you additional details about the transfers:

<StackName>-ECSStackFinderLogGroup<random suffix>: The log group for the Amazon ECS task that identifies which files need to be transferred.
<StackName>-EC2WorkerStackS3RepWorkerLogGroup<random suffix>: The log group for the Amazon EC2 worker instances that perform the actual file transfers.
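
Because both names start with the stack name followed by a fixed marker, you can pick them out of a longer list of log group names with a simple prefix match. This Python sketch shows the idea; the function stack_log_groups and the sample names with their suffixes are made up for illustration.

```python
def stack_log_groups(stack_name, log_group_names):
    """Return the Finder and Worker log groups that belong to one stack."""
    markers = ("-ECSStackFinderLogGroup", "-EC2WorkerStackS3RepWorkerLogGroup")
    return [
        name
        for name in log_group_names
        if any(name.startswith(stack_name + marker) for marker in markers)
    ]


# Sample names; the suffixes here are invented placeholders.
names = [
    "DTHS3Stack-ECSStackFinderLogGroupABC123",
    "DTHS3Stack-EC2WorkerStackS3RepWorkerLogGroupDEF456",
    "OtherStack-ECSStackFinderLogGroupXYZ789",
    "/aws/lambda/unrelated-function",
]
for name in stack_log_groups("DTHS3Stack", names):
    print(name)
```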

If the transfers did not occur as expected, check out the troubleshooting section of the GitHub repository.

Cleanup

To clean up this solution, delete the following resources:

Data Transfer Hub stack in CloudFormation
- You have to manually delete the two CloudWatch log groups because they contain data and are skipped during CloudFormation stack deletion
Amazon ECS cluster from the Amazon ECS console
Secret in AWS Secrets Manager

If you plan on moving right on to part two of the blog series, do not clean up the resources, because you use them again in that walkthrough.

Conclusion

In this post, you learned to use the Amazon S3 Data Transfer Hub solution from AWS Labs to transfer data in Amazon S3 buckets between the AWS standard and AWS GovCloud (US) partitions. You transferred data in one direction, from the AWS GovCloud (US) partition to the standard partition. This technique can be applied in reverse as well. This capability simplifies the process of sharing data across these isolated partitions and gives you a new degree of freedom of movement for your data.

I hope you can utilize this approach to more simply move data across partitions, easing your operational burden and more quickly delivering business value. Read part two of this series to learn how to transfer data cross-partition using AWS DataSync; we walk through how you can move data stored in your networked file system to Amazon S3.

AWS Public Sector Blog