Migrate to Amazon FSx for Windows File Server using AWS DataSync
Many customers run on-premises file storage infrastructure for their Windows workloads and want to stop maintaining it themselves. Backing up, patching, and monitoring the hardware and software drains both time and money. Customers have asked AWS to provide a fully managed, native Windows file server solution that frees them from worrying about underlying infrastructure while still providing native Windows file server features. In response, AWS created Amazon FSx for Windows File Server (Amazon FSx), which provides fully managed SMB-based file storage for Windows applications and workloads. With Amazon FSx, AWS manages the underlying hardware, software, and operations (including servers, storage, patching, availability, durability, encryption, monitoring, backups, and more).
The next question that customers ask is, “How do I migrate my data from my on-premises systems to Amazon FSx for Windows File Server?” In the past, the options were manual methods, such as using Robocopy or WinRAR, or copying files by hand. In January 2020, AWS enabled AWS DataSync, a fully managed data migration service, to help customers migrate their data from their on-premises systems to Amazon FSx and other storage services. DataSync retains the Windows file properties and permissions, and supports incremental transfers so that the migration can happen over time, copying only the data that has changed. In addition, DataSync enables high-speed transfer through its use of compression and its parallel transfer mechanism, while also giving customers the ability to control the amount of bandwidth used during transfers.
In this blog, I walk through how to set up AWS DataSync to migrate data from your Windows file system to Amazon FSx for Windows File Server. The process involves four simple steps:
- Step 1: Creating the Amazon FSx for Windows File Server file system, which will be the target storage.
- Step 2: Installing the AWS DataSync agent near your source file system.
- Step 3: Configuring the DataSync task that migrates the data from your source storage to the target storage.
- Step 4: Running the DataSync migration task.
We have also created a short video that walks through the process in detail:
Step 1: Create an Amazon FSx file system
The first step is to create an Amazon FSx for Windows File Server file system, which is the target storage for your files. You can join the file system to your existing Active Directory. By joining the file system to your Active Directory, the file permissions are the same in Amazon FSx as they are on-premises. The result is that your users have the same file access.
For this blog post, I create an Amazon FSx file system that spans multiple AWS Availability Zones (Multi-AZ) with 32 GiB of capacity and 16 MB/s of throughput capacity. When you create the file system, Amazon FSx allows you to specify:
- How large a file system to create (32 GiB to 65,536 GiB)
- The type of storage (SSD or HDD)
- How much throughput performance to allocate
  - To help you set throughput performance, the service recommends a throughput capacity based on the storage capacity that you set. However, you can still customize it for your application’s specific needs.
- The deployment mode (Single-AZ or Multi-AZ)
  - We recommend a Multi-AZ deployment, in which Amazon FSx performs block-level replication of your data from one AWS Availability Zone to a second Availability Zone.
- The Windows authentication mode (AWS Managed Microsoft Active Directory or Self-managed Microsoft Active Directory)
  - If you have an existing Active Directory, select the Self-managed option.
The following screenshot shows all of the specifications already selected for the purposes of this example.
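For readers who prefer to script this step, the console choices above map onto the FSx CreateFileSystem API. The following is a minimal sketch in Python with boto3, not the procedure used in this post. The function names, subnet IDs, security group ID, and directory ID are hypothetical placeholders, and the sketch assumes an AWS Managed Microsoft AD directory (a self-managed directory would instead pass a SelfManagedActiveDirectoryConfiguration block).

```python
from typing import Any, Dict, List


def build_fsx_request(
    subnet_ids: List[str],
    security_group_ids: List[str],
    active_directory_id: str,
    storage_gib: int = 32,
    throughput_mbps: int = 16,
) -> Dict[str, Any]:
    """Build keyword arguments for the FSx CreateFileSystem call.

    Defaults mirror the example in this post: 32 GiB of SSD storage,
    16 MB/s of throughput capacity, and a Multi-AZ deployment.
    """
    if storage_gib < 32:
        raise ValueError("storage capacity must be at least 32 GiB")
    if len(subnet_ids) != 2:
        raise ValueError("a Multi-AZ file system needs exactly two subnets")
    return {
        "FileSystemType": "WINDOWS",
        "StorageCapacity": storage_gib,
        "StorageType": "SSD",
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        "WindowsConfiguration": {
            # Assumption: an AWS Managed Microsoft AD directory ID.
            "ActiveDirectoryId": active_directory_id,
            "DeploymentType": "MULTI_AZ_1",
            # The preferred subnet hosts the active file server.
            "PreferredSubnetId": subnet_ids[0],
            "ThroughputCapacity": throughput_mbps,
        },
    }


def create_file_system(request: Dict[str, Any], region: str) -> str:
    """Submit the request and return the new file system ID."""
    import boto3  # imported lazily so the builder works without boto3

    fsx = boto3.client("fsx", region_name=region)
    response = fsx.create_file_system(**request)
    return response["FileSystem"]["FileSystemId"]
```

Building the request separately from submitting it makes the settings easy to review before anything is created. A call such as `create_file_system(build_fsx_request([...], [...], "d-..."), "us-east-1")` would then submit it.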
Step 2: Install the AWS DataSync agent
The next step is to install the DataSync agent. AWS recommends installing the DataSync agent as close as possible, in network terms, to the source file system. AWS provides an image that you can deploy to install this agent in your on-premises environment. To install the DataSync agent, log in to the AWS Management Console and navigate to the DataSync service. Click the Get started button.
Currently, as shown in the following screenshot, AWS provides two options for deploying the agent, either a VMware image or an EC2 image. For migrations from an on-premises environment, download the VMware image.
You can choose how the DataSync agent communicates with the DataSync service by selecting the service endpoint. The two most common options are to communicate over the internet using the public service endpoints, or over a private connection using VPC endpoints powered by AWS PrivateLink.
After you deploy the agent, you must register it with the DataSync service by entering the name or IP address of the agent and then pressing the Get key button.
If the machine where you have opened the DataSync console cannot communicate over the network with the DataSync agent, you may see a communication error screen. The simple fix is to open the DataSync console on a host that can communicate over the network with the DataSync agent.
Once the DataSync agent registers successfully, you should see it listed in the console:
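The registration step can also be scripted. The sketch below is an illustration under stated assumptions rather than the method used in this post: it assumes the agent answers an HTTP activation request on port 80 with a `gatewayType=SYNC` query (the form I understand the agent to accept when using public endpoints, so please confirm it against the DataSync documentation), and the function names and agent address are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def activation_url(agent_address: str, region: str) -> str:
    """Build the URL used to ask the agent for an activation key.

    Assumption: the agent accepts this query format on port 80.
    """
    query = urlencode({"gatewayType": "SYNC", "activationRegion": region})
    return f"http://{agent_address}/?{query}&no_redirect"


def fetch_activation_key(agent_address: str, region: str, timeout: float = 10.0) -> str:
    """Request the activation key; run this from a host that can reach the agent."""
    with urlopen(activation_url(agent_address, region), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def register_agent(activation_key: str, agent_name: str, region: str) -> str:
    """Register the agent with the DataSync service and return its ARN."""
    import boto3  # imported lazily so the URL helper works without boto3

    datasync = boto3.client("datasync", region_name=region)
    response = datasync.create_agent(ActivationKey=activation_key, AgentName=agent_name)
    return response["AgentArn"]
```

As with the console, the key request only succeeds from a machine that has network access to the agent.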
For more information on installing the DataSync agent, please see this documentation.
Step 3: Create the DataSync data migration task
The next step is to create a task that sets the source location, destination location, and migration settings. It is important to create the DataSync migration task in the same AWS Region as the target storage location.
Open the AWS Management Console, and from the Region drop-down menu in the top right of the console, select the AWS Region where the Amazon FSx file system was created. Then navigate to the DataSync console and select the Create task button.
Step 3.1: Specify the source location
On the Configuration screen, you specify the source location options. Since you are migrating data from SMB file storage, select the Server Message Block (SMB) option, then specify your DataSync agent. Specify the IP address of the Windows file server, and the Windows file share on that server, that you want to use as your source location. In my lab, the source Windows file server is at 10.0.22.151 and the source file share is called share.
Next, specify the credentials of a user that has rights to read the data from the source location. A typical implementation is to create a service account that is a member of the Backup Operators group and specify that service account. The following screenshot shows all of the specifications already selected for the purposes of this example.
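The same source location can be described through the DataSync CreateLocationSmb API. This is a minimal sketch, assuming boto3; the function names, service account name, and agent ARN are hypothetical, and only the server address and share name come from my lab setup.

```python
from typing import Any, Dict, Optional


def build_smb_location(
    server: str,
    share: str,
    user: str,
    password: str,
    agent_arn: str,
    domain: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the DataSync CreateLocationSmb call."""
    request: Dict[str, Any] = {
        "ServerHostname": server,
        # DataSync expects the share as a path with forward slashes.
        "Subdirectory": "/" + share.strip("/\\"),
        "User": user,
        "Password": password,
        "AgentArns": [agent_arn],
    }
    if domain:
        request["Domain"] = domain
    return request


def create_smb_location(request: Dict[str, Any], region: str) -> str:
    """Create the source location and return its ARN."""
    import boto3  # imported lazily so the builder works without boto3

    datasync = boto3.client("datasync", region_name=region)
    return datasync.create_location_smb(**request)["LocationArn"]
```

In practice, I would read the password from a prompt or a secrets store rather than placing it in the script.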
Step 3.2: Specify the destination location
For this step, specify the destination location where the data should be migrated. Under Configure destination location, for the location type, choose the Amazon FSx for Windows File Server option. Select the Amazon FSx file system that you created earlier and the share name where you want to copy the data.
You must also specify an account that has rights to write data to the Amazon FSx file system. To ensure sufficient permissions to files, folders, and file metadata, we recommend that you pick a user that is a member of the Amazon FSx file system’s delegated administrators group. For more information, see here.
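For the destination, the equivalent API is DataSync CreateLocationFsxWindows. The sketch below assumes boto3 and uses hypothetical function names, ARNs, and account names. Note that, unlike the SMB source, this location needs no agent but does require the ARN of a security group that can reach the file system.

```python
from typing import Any, Dict, List, Optional


def build_fsx_location(
    file_system_arn: str,
    security_group_arns: List[str],
    user: str,
    password: str,
    subdirectory: str = "/share",
    domain: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the DataSync CreateLocationFsxWindows call."""
    if not security_group_arns:
        raise ValueError("at least one security group ARN is required")
    request: Dict[str, Any] = {
        "FsxFilesystemArn": file_system_arn,
        "SecurityGroupArns": security_group_arns,
        # Assumption: this user belongs to the delegated administrators group.
        "User": user,
        "Password": password,
        "Subdirectory": subdirectory,
    }
    if domain:
        request["Domain"] = domain
    return request


def create_fsx_location(request: Dict[str, Any], region: str) -> str:
    """Create the destination location and return its ARN."""
    import boto3  # imported lazily so the builder works without boto3

    datasync = boto3.client("datasync", region_name=region)
    return datasync.create_location_fsx_windows(**request)["LocationArn"]
```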
Step 3.3: Specify the DataSync task settings
On the next screen, you set the DataSync task settings, as shown in the following screenshot. Here are a few suggestions for setting these items:
- If you plan to run the DataSync task multiple times, and your business requirements do not require verification of all data in the destination every time the replication task is run, choose Verify only the data transferred. With this setting, the AWS DataSync task verifies only the data that it transferred in that run. For more information on each option, see here.
- Set a bandwidth limit if you want to control the amount of bandwidth that AWS DataSync uses to replicate the data. By default, AWS DataSync scales to use the available bandwidth to expedite the data transfer.
- On the Schedule option, shown in the following screenshot, you can specify when you want the AWS DataSync task to run. If you are concerned about impacting your internet bandwidth during business hours, you can configure the AWS DataSync task to run on a Custom schedule and specify off-peak hours. For more information, see here.
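The three settings above correspond to fields of the DataSync CreateTask API: `VerifyMode`, `BytesPerSecond`, and a schedule expression. The following is a sketch under assumptions, not a definitive configuration: the function names, task name, and the choice of a daily off-peak run are mine, and the cron expression is evaluated in UTC.

```python
from typing import Any, Dict, Optional


def build_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    name: str = "fsx-migration",
    bandwidth_mbit: Optional[float] = None,
    start_hour_utc: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the DataSync CreateTask call."""
    if bandwidth_mbit is None:
        bytes_per_second = -1  # -1 tells DataSync to use the available bandwidth
    else:
        # Convert megabits per second to bytes per second.
        bytes_per_second = int(bandwidth_mbit * 1_000_000 / 8)
    request: Dict[str, Any] = {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
        "Options": {
            # Verify only the data transferred in each run.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            "BytesPerSecond": bytes_per_second,
        },
    }
    if start_hour_utc is not None:
        if not 0 <= start_hour_utc <= 23:
            raise ValueError("start_hour_utc must be between 0 and 23")
        # Run once a day at the given hour (UTC).
        request["Schedule"] = {"ScheduleExpression": f"cron(0 {start_hour_utc} * * ? *)"}
    return request


def create_task(request: Dict[str, Any], region: str) -> str:
    """Create the task in the Region of the target storage and return its ARN."""
    import boto3  # imported lazily so the builder works without boto3

    datasync = boto3.client("datasync", region_name=region)
    return datasync.create_task(**request)["TaskArn"]
```

For example, `build_task_request(src, dst, bandwidth_mbit=100, start_hour_utc=2)` caps transfers at 12,500,000 bytes per second and schedules a run at 02:00 UTC each day.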
Step 4: Run the DataSync migration task
Once you have specified the task settings and created the DataSync task, it runs on the schedule that you specified. If you want to start the task immediately, select the DataSync task and, under Actions, choose Start.
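Starting the task immediately can also be done through the API, using StartTaskExecution and then polling DescribeTaskExecution. This is a minimal sketch: the function name and polling interval are my own choices, and the client is passed in (for example, `boto3.client("datasync")`) so that the loop can be exercised without AWS access.

```python
import time
from typing import Any

# Statuses after which a task execution will not change again.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def run_task(datasync: Any, task_arn: str, poll_seconds: float = 30.0) -> str:
    """Start a DataSync task execution and wait until it finishes.

    `datasync` is expected to be a boto3 DataSync client.
    Returns the final status, either "SUCCESS" or "ERROR".
    """
    execution_arn = datasync.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
    while True:
        status = datasync.describe_task_execution(TaskExecutionArn=execution_arn)["Status"]
        if status in TERMINAL_STATUSES:
            return status
        # Intermediate statuses include LAUNCHING, PREPARING,
        # TRANSFERRING, and VERIFYING.
        time.sleep(poll_seconds)
```

For a long migration, you may prefer to log each intermediate status instead of waiting silently.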
To check that the DataSync migration task is operating, log on to a Windows Server that has access to the Amazon FSx file system. Map a file share to the Amazon FSx file system. If you need help with mapping a file share to the Amazon FSx file system, we have documented the process here.
You should see your files appearing on the Amazon FSx file system as the AWS DataSync task copies them. In the following screenshot, the file explorer on the left is my source file system and the one on the right contains the files on the Amazon FSx file system.
You have set up AWS DataSync to copy files from a Windows file system to an Amazon FSx for Windows File Server file system. During task execution, AWS DataSync examines the source files and copies only the files that have changed. If you would like to watch a walkthrough of the process, please review this video, also embedded in the introduction of this blog post (you can skip to 3:37 in the video).
If you are no longer using the resources discussed in this blog post, I suggest that you clean them up to avoid incurring unwanted charges. After finishing the proof of concept, delete the DataSync objects (the DataSync agent, DataSync task, source location, and destination location). Then delete the Amazon FSx file system created in Step 1.
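The cleanup can be scripted in the same order. The sketch below assumes boto3 clients for DataSync and FSx are passed in; the function name and the deletion order (task first, then locations, then agent) are my own choices, on the reasoning that a task refers to its locations. To my understanding, FSx takes a final backup on deletion unless `SkipFinalBackup` is set, so the flag is exposed as a parameter.

```python
from typing import Any, List


def clean_up(
    datasync: Any,
    fsx: Any,
    task_arn: str,
    location_arns: List[str],
    agent_arn: str,
    file_system_id: str,
    skip_final_backup: bool = False,
) -> List[str]:
    """Delete the proof-of-concept resources and return what was deleted, in order.

    `datasync` and `fsx` are expected to be boto3 clients.
    """
    deleted: List[str] = []

    # Delete the task before the locations that it refers to.
    datasync.delete_task(TaskArn=task_arn)
    deleted.append(task_arn)

    for location_arn in location_arns:
        datasync.delete_location(LocationArn=location_arn)
        deleted.append(location_arn)

    datasync.delete_agent(AgentArn=agent_arn)
    deleted.append(agent_arn)

    # Assumption: a final backup is taken unless SkipFinalBackup is True.
    fsx.delete_file_system(
        FileSystemId=file_system_id,
        WindowsConfiguration={"SkipFinalBackup": skip_final_backup},
    )
    deleted.append(file_system_id)
    return deleted
```

Deleting the agent through the API only removes its registration; the VMware virtual machine itself still has to be removed from your on-premises environment.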
In this blog post, I walked through the AWS DataSync and Amazon FSx for Windows File Server consoles to configure this migration. AWS also provides extensive APIs, so you can migrate your file share configuration to Amazon FSx programmatically. For more information on using APIs to migrate your file share configuration, see this documentation.
If you have deployed Microsoft Distributed File System (DFS), you can streamline the cutover to the new Amazon FSx file system. To do so, change the DFS namespace to point to the Amazon FSx file system instead of the Windows file server file system. If you would like to learn more about integrating Amazon FSx with Microsoft DFS, we have recorded a quick video on how to perform this configuration.
In this blog post, I shared how AWS can help customers who want to reduce the amount of time and money associated with maintaining their Windows file storage infrastructure. Those customers simply need to move to a cloud solution that offers a fully managed, native Windows file service solution. Amazon FSx for Windows File Server was created to meet these needs. I also highlighted how you could use AWS DataSync to migrate your data from your Windows file storage infrastructure to Amazon FSx. AWS DataSync automates data migration tasks that previously relied on Robocopy, WinRAR, or copying the files manually.
Thanks for reading. Please leave a comment if you have any questions regarding the solution outlined in this post.