Transferring file data across AWS Regions and accounts using AWS DataSync
Many customers that use Amazon EFS want to replicate their file systems to other AWS accounts or other AWS Regions. Customers may do this for business continuity purposes or to provide low-latency access to data for processes running in other Regions. AWS DataSync makes it easy for customers to replicate data from one Amazon EFS file system to another without traversing a public network.
In this post, I cover configuring AWS DataSync for cross-Region or cross-account transfers using VPC endpoints. Following these steps can help you move data between Amazon EFS file systems using only private IP addresses accessible within your VPCs.
AWS DataSync for data migrations
AWS DataSync is an online transfer service that simplifies moving, copying, and synchronizing large amounts of data between on-premises storage systems and AWS Storage services. DataSync accelerates and automates data transfer, removing the need to modify applications, develop scripts, or manage infrastructure.
You can deploy AWS DataSync within your Amazon VPC using VPC endpoints. With this feature, data is transferred between your agent and the DataSync service endpoints without a public IP address or public internet access. Using AWS DataSync with VPC endpoints keeps all traffic within your VPC, under your control, at all times.
Many of our customers use DataSync with VPC endpoints to replicate an Amazon EFS file system from one AWS Region to another for one-time migrations. Others use DataSync for periodic data ingestion for distributed workloads or to automate replication for data protection across Regions or accounts.
AWS DataSync uses an agent to transfer data from one Amazon EFS file system to another. Deploy the agent on an Amazon EC2 instance in the VPC of the source file system, in the source Region or account. Then activate the agent in the destination Region through a VPC endpoint in the VPC where your target file system resides. The agent then communicates with the DataSync service endpoints using private IP addresses only. Because the agent and the endpoint are in different VPCs, configure VPC peering between the two VPCs to carry data across AWS Regions.
In addition, DataSync automatically creates four elastic network interfaces (ENIs) in your destination VPC for each transfer task. Your source DataSync agent sends traffic through these ENIs to transfer data from the source file system to the destination file system.
The following diagram illustrates the setup in more detail, and specifies the AWS resources mentioned in the coming steps:
Setting up the transfer
Follow this step-by-step guide to configure AWS DataSync to transfer data between file systems across AWS Regions and accounts. The transfer uses private IP addresses that are accessible only from within your VPCs. This setup uses VPC peering to carry data from the source Region or account to the destination Region or account.
Create a VPC peering connection
Create a VPC peering connection between the VPCs of the source and destination file systems. Before you proceed to the next step, open the VPC console in the AWS Management Console and verify the following:
- View the peering connection. Confirm that the status is Active.
- View the source VPC. Review its route table to confirm that there is an Active route whose target begins with pcx. This route is for the peering connection.
- View the destination VPC. Review its route table to confirm that there is an Active route whose target begins with pcx.
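If you script the peering step instead of using the console, the route checks above come down to one route per VPC. Peered VPCs also must not have overlapping CIDR blocks. The following minimal sketch uses this example's VPC CIDRs (listed in the network setup for this post) and builds the keyword arguments you could pass to boto3's EC2 `create_route` in each Region. It makes no AWS calls itself. The function names are hypothetical, and the route table and peering connection IDs you pass in are placeholders.

```python
import ipaddress
from typing import Dict, List

# CIDR blocks from this post's example network setup.
SOURCE_VPC_CIDR = "192.168.0.0/24"
DEST_VPC_CIDR = "10.0.0.0/16"


def cidrs_overlap(a: str, b: str) -> bool:
    """True if two IPv4 CIDR blocks overlap; peered VPCs must not overlap."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


def peering_routes(source_rtb: str, dest_rtb: str, pcx_id: str) -> List[Dict[str, str]]:
    """Build one route per VPC that sends the peer VPC's CIDR through the peering connection.

    Each dict mirrors the keyword arguments of boto3's EC2 create_route call.
    """
    if cidrs_overlap(SOURCE_VPC_CIDR, DEST_VPC_CIDR):
        raise ValueError("VPC CIDR blocks overlap; these VPCs cannot be peered")
    return [
        # Source route table (source Region): destination VPC traffic goes via pcx.
        {"RouteTableId": source_rtb, "DestinationCidrBlock": DEST_VPC_CIDR,
         "VpcPeeringConnectionId": pcx_id},
        # Destination route table (destination Region): return traffic goes via pcx.
        {"RouteTableId": dest_rtb, "DestinationCidrBlock": SOURCE_VPC_CIDR,
         "VpcPeeringConnectionId": pcx_id},
    ]
```

Because the two route tables live in different Regions (and possibly accounts), each returned entry would go to an EC2 client created for the matching Region, for example `ec2.create_route(**route)`.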
In the following screenshot, I set up VPC peering between the source and destination VPCs:
Refer to the following details of my network setup for my source and destination locations:
Source EFS: fs-b872e712 (mount target IP: 192.168.0.254)
Agent IP: 192.168.0.237
Destination EFS: fs-fc280bdd (mount target IP: 10.0.22.216)
Source VPC CIDR: 192.168.0.0/24
Destination VPC CIDR: 10.0.0.0/16
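Before moving on, it can help to sanity-check that each address above sits in the VPC you expect. A mount target or agent in the wrong VPC is an easy mistake when two Regions are involved. This is a minimal sketch using only Python's standard `ipaddress` module. The dictionary keys and the function name are hypothetical, and the values are copied from the list above.

```python
import ipaddress
from typing import Dict, List

# Example network setup from this post (keys are illustrative names).
NETWORK_PLAN = {
    "source_vpc_cidr": "192.168.0.0/24",
    "dest_vpc_cidr": "10.0.0.0/16",
    "agent_ip": "192.168.0.237",
    "source_mount_target_ip": "192.168.0.254",
    "dest_mount_target_ip": "10.0.22.216",
}


def check_plan(plan: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every IP is in its expected VPC."""
    source = ipaddress.ip_network(plan["source_vpc_cidr"])
    dest = ipaddress.ip_network(plan["dest_vpc_cidr"])
    expected = {
        "agent_ip": source,  # the agent runs in the source VPC
        "source_mount_target_ip": source,
        "dest_mount_target_ip": dest,
    }
    problems = []
    for key, network in expected.items():
        if ipaddress.ip_address(plan[key]) not in network:
            problems.append(f"{key} {plan[key]} is not in {network}")
    return problems
```

With the values from this post, `check_plan(NETWORK_PLAN)` returns an empty list.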
Set up DataSync agent in source location
In the source location, deploy the DataSync agent on an Amazon EC2 instance in a private subnet. This setup uses no public internet connection at all. Follow these steps to set up your source location:
- Choose a private subnet in the source VPC whose route table contains the route to the destination VPC through the peering connection.
- Ensure that your DataSync agent can access your source Amazon EFS file system over NFS. The source mount target must accept connections on port 2049 from the DataSync agent’s IP so that the agent can mount the file system. Your agent does not need a public IP.
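The NFS requirement in the last bullet can be expressed as a single inbound security group permission on the source mount target. This is a minimal sketch, assuming you manage that security group with boto3's EC2 `authorize_security_group_ingress`. The function name is hypothetical. The rule is scoped to the agent’s /32 rather than the whole VPC.

```python
import ipaddress
from typing import Dict


def nfs_rule_for_agent(agent_ip: str) -> Dict:
    """IpPermissions entry allowing NFS (TCP 2049) from the DataSync agent only."""
    ipaddress.ip_address(agent_ip)  # raises ValueError for a malformed IP
    return {
        "IpProtocol": "tcp",
        "FromPort": 2049,
        "ToPort": 2049,
        "IpRanges": [{"CidrIp": f"{agent_ip}/32", "Description": "DataSync agent NFS access"}],
    }
```

You might apply the rule with `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=[nfs_rule_for_agent("192.168.0.237")])`, where `GroupId` is your source mount target’s security group.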
Configure the destination location
Create a security group in the destination Region or account. Your agent must be able to route traffic to the private IP addresses that DataSync uses: one VPC endpoint for control traffic and four ENIs for data transfer. This security group manages access to those private IPs. Because the agent initiates connections to them, add inbound rules that allow the agent’s private IP in the source Region (192.168.0.237 in this example) to reach them on ports 1024–1064, 443, and 22.
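Note: No outbound rules are required on this security group, because security groups are stateful and allow response traffic automatically.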
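The inbound rules above translate into three TCP permissions that all come from the agent’s private IP. This minimal sketch builds them in the `IpPermissions` shape accepted by boto3's EC2 `authorize_security_group_ingress`. The function and constant names are hypothetical. In line with the note above, it builds no egress rules.

```python
import ipaddress
from typing import Dict, List

# Ports from this post: 1024-1064, 443, and 22, all TCP, from the agent's private IP.
AGENT_PORT_RANGES = [(1024, 1064), (443, 443), (22, 22)]


def destination_sg_rules(agent_ip: str) -> List[Dict]:
    """Build inbound IpPermissions for the destination security group.

    No egress rules are built: security groups are stateful, so replies are allowed.
    """
    cidr = str(ipaddress.ip_network(f"{agent_ip}/32"))
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": low,
            "ToPort": high,
            "IpRanges": [{"CidrIp": cidr, "Description": "DataSync agent in source VPC"}],
        }
        for low, high in AGENT_PORT_RANGES
    ]
```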
In the destination location, create a VPC endpoint for the DataSync service. The DataSync documentation on using VPC endpoints covers this in its fourth step. Make sure the private subnet has more than four free IP addresses for the task execution endpoints, because each DataSync task execution consumes four IP addresses.
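The sizing rule is simple arithmetic, and the endpoint itself is an interface endpoint for the DataSync service. This minimal sketch, assuming boto3's EC2 `create_vpc_endpoint` for the eventual call, does two things. It checks whether a subnet has room, and it builds the endpoint request. The function names are hypothetical, and so is counting one extra address for the endpoint’s own network interface when it shares the subnet. AWS reserves five addresses in every subnet, which the sketch subtracts.

```python
import ipaddress
from typing import Dict

IPS_PER_TASK_EXECUTION = 4  # from this post: four ENIs per task execution
AWS_RESERVED_PER_SUBNET = 5  # AWS reserves five addresses in every VPC subnet


def usable_addresses(subnet_cidr: str) -> int:
    """Addresses in the subnet that can be assigned to network interfaces."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def subnet_has_room(subnet_cidr: str, in_use: int, concurrent_tasks: int,
                    endpoint_in_subnet: bool = True) -> bool:
    """Check free addresses against task ENIs plus (optionally) the endpoint's ENI."""
    needed = concurrent_tasks * IPS_PER_TASK_EXECUTION + (1 if endpoint_in_subnet else 0)
    return usable_addresses(subnet_cidr) - in_use >= needed


def datasync_endpoint_request(region: str, vpc_id: str, subnet_id: str, sg_id: str) -> Dict:
    """Keyword arguments for boto3's EC2 create_vpc_endpoint call (interface endpoint)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.datasync",
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [sg_id],  # the destination security group described earlier
    }
```

Take a hypothetical /28 subnet such as 10.0.22.0/28. It has 16 addresses, 11 of them usable. With 2 already in use, it fits two concurrent task executions plus the endpoint (2 × 4 + 1 = 9) but not three.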
Create and activate the DataSync agent
You’re now ready to activate your agent. To activate the agent from the console, you can deploy a Windows instance in the public subnet of the destination Region. From that instance, you must be able to reach the private IP of the DataSync agent deployed in your source Region.
After the Windows instance is deployed, launch a web browser on it and sign in to the AWS Management Console. From the console, activate the agent by generating an activation key, as shown in the following screenshots.
The following screenshot depicts the Create agent page:
Make sure the machine you use for the activation has a network route to the DataSync agent. The activation process requires the agent’s port 80 to be accessible by the browser. After the agent is activated, it closes port 80 and the port is no longer accessible.
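As an alternative to a browser, the DataSync documentation describes requesting the activation key directly from the agent’s port 80 with an HTTP client such as curl. This works from any host that can route to the agent. The sketch below only builds that URL and the matching arguments for boto3's DataSync `create_agent`. The query parameters follow the documented private-endpoint activation request, but check them against the current DataSync documentation before relying on them. The function names are hypothetical. The Region, endpoint IP, IDs, and ARNs you pass in are placeholders.

```python
from typing import Dict
from urllib.parse import urlencode


def activation_key_url(agent_ip: str, region: str, endpoint_ip: str) -> str:
    """URL to fetch an activation key from the agent over port 80 via a VPC endpoint."""
    query = urlencode({
        "gatewayType": "SYNC",
        "activationRegion": region,          # destination Region
        "privateLinkEndpoint": endpoint_ip,  # private IP of the DataSync VPC endpoint
        "endpointType": "PRIVATE_LINK",
    })
    # no_redirect asks the agent to return the key in the response body
    # instead of redirecting a browser to the console.
    return f"http://{agent_ip}/?{query}&no_redirect"


def create_agent_request(activation_key: str, name: str, vpc_endpoint_id: str,
                         subnet_arn: str, security_group_arn: str) -> Dict:
    """Keyword arguments for boto3's DataSync create_agent call."""
    return {
        "ActivationKey": activation_key,
        "AgentName": name,
        "VpcEndpointId": vpc_endpoint_id,
        "SubnetArns": [subnet_arn],  # where DataSync places the task ENIs
        "SecurityGroupArns": [security_group_arn],
    }
```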
Choose Get Key, and optionally enter an agent name and tags, then choose Create agent. Your new agent is now visible in the Agents tab of the DataSync console. The green VPC Endpoint banner indicates that all tasks performed via this agent use private endpoints, without traversing the public internet.
The following screenshot shows a successfully activated DataSync agent:
Configure location for DataSync task
It is now time to create a DataSync task in the destination AWS Region. First, create a source location and a destination location. For the source location, choose Network File System (NFS) as the location type. Then enter either the Amazon EFS domain name system (DNS) name or the IP address of one of the file system’s mount targets, along with the mount path of your location.
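The console choices above map onto DataSync's `create_location_nfs` API. This minimal sketch, assuming boto3 and a DataSync client in the destination Region, builds its arguments. The function name is hypothetical. The `AUTOMATIC` NFS version and the `/` default path are illustrative choices. The agent ARN would be the one shown for your activated agent.

```python
from typing import Dict


def nfs_source_location(server_hostname: str, agent_arn: str, subdirectory: str = "/") -> Dict:
    """Keyword arguments for boto3's DataSync create_location_nfs call.

    server_hostname is the EFS DNS name or a mount target IP (e.g. 192.168.0.254).
    """
    if not subdirectory.startswith("/"):
        raise ValueError("subdirectory must be an absolute path such as '/'")
    return {
        "ServerHostname": server_hostname,
        "Subdirectory": subdirectory,
        "OnPremConfig": {"AgentArns": [agent_arn]},  # the agent in the source VPC
        "MountOptions": {"Version": "AUTOMATIC"},
    }
```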
The following screenshot shows the source location configuration in detail:
For the destination location, choose Amazon EFS file system as the location type. Select the file system ID and enter the mount path, then expand Additional Information to choose the subnet and security group. Select the subnet where your destination EFS file system resides, and select the destination file system’s security group. To facilitate transfer via private IPs, the task creates four elastic network interfaces (ENIs) on your behalf in the subnet that you chose. Make sure that your agent can reach these ENIs and that outbound traffic from the agent to them is allowed on port 443.
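The EFS location API takes ARNs rather than the IDs the console shows. This minimal sketch, assuming boto3's DataSync `create_location_efs`, shows how the file system ID, subnet, and security group described above fit into that call. The function name is hypothetical. The sketch assumes the standard `aws` partition used by commercial Regions. The Region and account ID you pass in are placeholders.

```python
from typing import Dict


def efs_destination_location(region: str, account_id: str, fs_id: str,
                             subnet_id: str, sg_id: str, subdirectory: str = "/") -> Dict:
    """Keyword arguments for boto3's DataSync create_location_efs call.

    Assumes the standard "aws" partition (commercial Regions).
    """
    ec2_prefix = f"arn:aws:ec2:{region}:{account_id}"
    return {
        "EfsFilesystemArn": f"arn:aws:elasticfilesystem:{region}:{account_id}:file-system/{fs_id}",
        "Subdirectory": subdirectory,
        "Ec2Config": {
            # Subnet where the destination file system has a mount target.
            "SubnetArn": f"{ec2_prefix}:subnet/{subnet_id}",
            # The destination file system's security group.
            "SecurityGroupArns": [f"{ec2_prefix}:security-group/{sg_id}"],
        },
    }
```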
The following screenshot shows the destination location configuration in detail:
After the locations are created, create a DataSync task by selecting the locations that you created in the preceding steps, and configure the task settings for your file transfer.
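Scripted, task creation is a single DataSync `create_task` call that ties the two location ARNs together. You would make it in the destination Region, where the agent is activated. This minimal sketch builds its arguments. The task name is hypothetical. The option values are illustrative choices, not settings this post prescribes: verify only transferred files, copy only changed data, and keep files deleted at the source.

```python
from typing import Dict


def task_request(source_location_arn: str, destination_location_arn: str,
                 name: str = "efs-cross-region-copy") -> Dict:
    """Keyword arguments for boto3's DataSync create_task call."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,  # hypothetical task name
        "Options": {
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify what this run copied
            "TransferMode": "CHANGED",               # copy only new or changed data
            "PreserveDeletedFiles": "PRESERVE",      # keep files removed from the source
        },
    }
```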
After the task is created, DataSync performs a quick validation of the source file system only and changes the task status to Available.
The following screenshot shows the DataSync task used in this example:
Once you complete the preceding step, you can start the task. Each task execution goes through several phases; refer to the DataSync documentation to understand the status of each phase.
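To watch those phases from a script, you can poll the task execution status. This is a minimal sketch under one assumption: you pass in a function that returns the current status string. With boto3, that might be `lambda: datasync.describe_task_execution(TaskExecutionArn=arn)["Status"]`. Injecting the function (and the sleep) keeps the loop testable without AWS access. The function name and polling interval are hypothetical. SUCCESS and ERROR are treated as the final execution states.

```python
import time
from typing import Callable, List, Optional

# Final states of a DataSync task execution.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(describe: Callable[[], str], poll_seconds: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep,
                       max_polls: Optional[int] = None) -> List[str]:
    """Poll describe() until a terminal status; return the distinct statuses seen, in order."""
    seen: List[str] = []
    polls = 0
    while True:
        status = describe()
        if not seen or seen[-1] != status:
            seen.append(status)  # record each phase change once
        if status in TERMINAL_STATUSES:
            return seen
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"execution still {status} after {polls} polls")
        sleep(poll_seconds)
```

Recording only status changes gives a compact trace of the phases an execution passed through. For example, a run might produce QUEUED, LAUNCHING, PREPARING, TRANSFERRING, VERIFYING, and then SUCCESS.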
If you no longer need the resources discussed in this post, delete them to avoid incurring unwanted charges. After you finish the proof of concept, delete the following resources:
- DataSync agent and task
- DataSync source and destination locations
- Windows instance
- Amazon EFS file systems in your source and destination AWS Regions
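If you script the cleanup, order matters. This minimal sketch only produces an ordered list of (service, method, arguments) steps using boto3 method names. You would run each step with `getattr(client, method)(**kwargs)` against a client for the right Region and account. The function name is hypothetical. Deleting the task before its locations and the agent is an assumed safe order. An EFS file system can’t be deleted while it still has mount targets, and mount target deletion completes asynchronously. In practice, wait for the mount targets to disappear before deleting each file system.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str, Dict]  # (boto3 service name, client method name, keyword arguments)


def cleanup_plan(task_arn: str, location_arns: List[str], agent_arn: str,
                 windows_instance_id: str, file_systems: Dict[str, List[str]]) -> List[Step]:
    """Order deletions so nothing is deleted while another resource still depends on it.

    file_systems maps each EFS file system ID to its mount target IDs.
    """
    steps: List[Step] = [("datasync", "delete_task", {"TaskArn": task_arn})]
    steps += [("datasync", "delete_location", {"LocationArn": arn}) for arn in location_arns]
    steps.append(("datasync", "delete_agent", {"AgentArn": agent_arn}))
    steps.append(("ec2", "terminate_instances", {"InstanceIds": [windows_instance_id]}))
    for fs_id, mount_target_ids in file_systems.items():
        # A file system can't be deleted while it still has mount targets.
        steps += [("efs", "delete_mount_target", {"MountTargetId": mt}) for mt in mount_target_ids]
        steps.append(("efs", "delete_file_system", {"FileSystemId": fs_id}))
    return steps
```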
In this blog post, I reviewed setting up AWS DataSync to transfer files between Amazon EFS file systems across AWS Regions and AWS accounts. Using DataSync, you can perform one-time data migrations, periodic data ingestion for distributed workloads, and automate replication for data protection and recovery. Using the steps outlined in this post, you can simply transfer Amazon EFS files or file systems, making it easier for you to make your data available where you need it.
To learn more about AWS DataSync, visit the AWS DataSync product page or get started building this architecture in the AWS Management Console. For more tips and best practices when using DataSync, check out our prior post, covering transferring files from on-premises to AWS using VPC endpoints.
Thanks for reading this blog post! If you have any comments or questions, please don’t hesitate to leave them in the comments section.