AWS Storage Blog

Optimize file storage migration to AWS using AWS DataSync and Amazon FSx

Customers migrating from on premises to the cloud face starkly different starting points. Some have one or two workloads stored on premises, others have several storage arrays across several data centers, and still others have even larger or more intricate setups. If you are migrating to the cloud, you may reasonably be daunted by the complexity of assessing your current data storage environment and unsure how to get started.

AWS Professional Services can guide you on your cloud journey with minimal downtime, ensuring you get the most out of your cloud solutions in the most efficient, impactful way possible. AWS can help you migrate your data to the cloud in the most optimal manner for your business, without having to redesign your applications, databases, or backup and disaster recovery (DR) stacks.

In this post, I cover how an international financial services company with more than 10,000 employees accelerated their journey to the cloud with an expeditious and secure online migration led by AWS Professional Services. In this particular case, the customer was able to migrate more than 100 TB of file storage quickly and seamlessly using AWS DataSync and Amazon FSx for Windows File Server (Amazon FSx).

Customer journey to the cloud

This particular customer previously ran their infrastructure on multiple storage arrays, including SAN, NAS, and object storage. They also had expensive Multiprotocol Label Switching (MPLS) circuits for their global connectivity from two data centers, which included an expensive multi-year agreement.

Business and technical challenges

The customer needed to shut down their two data centers and reduce costs. These costs stemmed from proprietary technologies and piecemeal technology from multiple vendors, which were expensive and difficult to manage because of their high operational overhead.

File storage migration presented its own challenges because of the intricate directory structure with millions of small files stored in deep subfolder hierarchies. Most of this financial services company’s data was unstructured, and their file storage spanned across native Microsoft Windows storage and SMB-based storage types, like NetApp, Dell EMC VNX, and Dell EMC VMAX.

Using AWS DataSync to migrate file storage to Amazon FSx for Windows File Server

Considering this particular customer’s storage workloads and future needs, AWS Professional Services, in conjunction with the customer, decided that Amazon FSx, a fully managed, cost-effective, highly reliable, and scalable storage solution, was the ideal choice. Amazon FSx is accessible over the industry-standard Server Message Block (SMB) protocol, which works on Windows, Linux, and macOS systems. Amazon FSx for Windows File Server integrates with your own on-premises Microsoft Active Directory as well as with AWS Managed Microsoft AD. With Amazon FSx’s native support for the SMB protocol, Windows-based applications have access to fully compatible shared file storage. As a fully managed storage service that takes care of the operational challenges of maintaining and running complex storage systems, including patching, hardware refreshes, and upgrades, Amazon FSx was a good fit for this customer.

For this large-scale migration, AWS Professional Services leveraged AWS DataSync, an online data transfer and migration service that enables you to simplify, automate, and accelerate migrating PBs of file storage. With DataSync, you can migrate most storage arrays or conventional file servers, such as a Windows file server or a Linux-based NFS server, seamlessly to AWS.

Despite the millions of files in the customer’s complex folder hierarchy for their SMB shares, AWS DataSync was able to read and copy files to the target Amazon FSx for Windows File Server.

Read on for the setup and best practices the customer used during their migration. Some of the benefits of AWS DataSync that contributed to it being the ideal solution included:

  • High data throughput up to 10 Gbps
  • Full support for NFS and SMB
  • Securely transfer data to and from an encrypted private network using Direct Connect
  • Easy-to-use console and AWS CLI management
  • Time savings (takes only a few minutes to set up)
  • Pay-per-GB transfer
  • Amazon CloudWatch integration for monitoring and logging

Solution overview

To migrate the on-premises SMB shares, the customer deployed (with the help of AWS Professional Services) an AWS DataSync agent as a virtual machine (VM) in their on-premises environment. They then defined a task to copy data from the source file system on premises to the Amazon FSx for Windows File Server file system. AWS DataSync is optimized for working with Amazon FSx for Windows File Server and scales to meet the performance needs of the migration workload.
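For readers who prefer to script this setup, the source location, destination location, and task described above can be sketched as request parameters for the boto3 DataSync client. This is a minimal sketch and not the customer's actual configuration: every hostname, ARN, share path, account name, and option value is a hypothetical placeholder, and the functions only build keyword arguments, so nothing is sent to AWS.

```python
"""Build request parameters for a DataSync SMB-to-FSx migration task.

All hostnames, ARNs, paths, and account names passed in are hypothetical
placeholders chosen for illustration; the post does not list real values.
"""


def smb_source_params(server, share, user, domain, password, agent_arns):
    """Keyword arguments for CreateLocationSmb (the on-premises source)."""
    return {
        "ServerHostname": server,
        "Subdirectory": share,          # share path, written with forward slashes
        "User": user,
        "Domain": domain,
        "Password": password,           # load from a secret store, never hardcode
        "AgentArns": list(agent_arns),  # agents deployed on premises
    }


def fsx_destination_params(fsx_arn, subdirectory, security_group_arns,
                           user, domain, password):
    """Keyword arguments for CreateLocationFsxWindows (the AWS destination)."""
    return {
        "FsxFilesystemArn": fsx_arn,
        "Subdirectory": subdirectory,
        "SecurityGroupArns": list(security_group_arns),
        "User": user,
        "Domain": domain,
        "Password": password,
    }


def task_params(name, source_location_arn, destination_location_arn):
    """Keyword arguments for CreateTask; the option values are assumptions."""
    return {
        "Name": name,
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Options": {
            "TransferMode": "CHANGED",                 # copy only changed data
            "VerifyMode": "ONLY_FILES_TRANSFERRED",    # verify what was copied
            "SecurityDescriptorCopyFlags": "OWNER_DACL",  # keep owner and ACLs
            "LogLevel": "BASIC",                       # log to CloudWatch
        },
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed as `client.create_location_smb(**...)`, `client.create_location_fsx_windows(**...)`, and `client.create_task(**...)`, using the `LocationArn` values returned by the first two calls as the task's source and destination.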

The following high-level architecture diagram depicts file storage data migration to Amazon FSx for Windows File Server using AWS DataSync:

[Architecture diagram: file storage data migration to Amazon FSx for Windows File Server using AWS DataSync]

Prerequisites for AWS DataSync migration of data to Amazon FSx for Windows File Server

To migrate file storage to Amazon FSx for Windows File Server using AWS DataSync, the following prerequisites must be in place:

  • An Amazon FSx file system created with appropriate permissions and joined to a self-managed Active Directory.
  • When configuring SMB as a target location, apply the appropriate permissions and folder structure so that the task can complete successfully.
  • A Microsoft Active Directory user account used for setting up a DataSync task. The user must belong to an Active Directory group with rights to ‘Set Ownership’ on files and folders.
  • Networking requirements should be met for DataSync agents on the firewall and for the VPC interface endpoint security group.
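The networking prerequisite can be spot-checked before any task is created. The following is a minimal sketch of a TCP reachability test, assuming it is run from a host on the same network segment as the DataSync agent. TCP 445 is the standard SMB port and TCP 443 is HTTPS; the complete list of ports that DataSync requires for agents and VPC endpoints is in the DataSync network requirements documentation and is not reproduced here.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints):
    """Map each (host, port) pair to whether it is currently reachable."""
    return {(host, port): can_connect(host, port) for host, port in endpoints}
```

For example, `check_endpoints([("filer01.example.com", 445), ("10.0.0.10", 443)])` would test a hypothetical SMB source and a hypothetical VPC endpoint address; both names are placeholders.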

High-level steps

The following overview explains the setup and configuration the customer used for AWS DataSync and Amazon FSx for Windows File Server:

  • The customer configured Amazon FSx shares with SSD-backed storage, a single Availability Zone, and custom throughput per Amazon FSx share, as per the workload and application requirements.
  • The customer deployed multiple AWS DataSync agents, each with 64 GB of memory, to support 50 million files per task.
  • The customer deployed AWS DataSync agents in both data centers to support data copy from multiple sources in parallel to speed up the data transfer process and save time.
  • The customer created a VPC interface endpoint for AWS DataSync to ensure that they transferred all their data over AWS Direct Connect from their data center to AWS.
  • Before running the tasks, the customer set appropriate bandwidth limits for weekdays and weekends so that the transfers could use the AWS Direct Connect network without affecting production workloads.
  • The customer used multiple logical network interfaces (NICs) on the NetApp side (vFiler) to get higher throughput and avoid any contention on the NetApp when configuring it as the source location.
  • The customer kept 300–400 GB of headroom per SMB share. This headroom is necessary because data grows when copied from NetApp, where it was stored in compressed and deduplicated format.
  • We recommended that the customer enable deduplication on their Amazon FSx shares, which reduces overall capacity usage by 30–80 percent depending on file type.
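The weekday and weekend throttling step above can be sketched as a small schedule function. This is an illustration only: the post does not state the limits or hours the customer used, so the three rates and the business-hours window below are hypothetical. The function returns a value for the `BytesPerSecond` task option, which is how DataSync expresses a bandwidth limit.

```python
from datetime import datetime

# Hypothetical limits in megabits per second on a 1-Gbps link.
BUSINESS_HOURS_MBPS = 200   # weekdays, while production traffic is heavy
OFF_HOURS_MBPS = 600        # weekday nights
WEEKEND_MBPS = 900          # Saturday and Sunday
BUSINESS_HOURS = range(8, 18)  # 08:00 to 17:59 local time (assumed)


def bandwidth_limit_bytes_per_second(now: datetime) -> int:
    """Pick a bandwidth limit for the given local time, in bytes per second."""
    if now.weekday() >= 5:              # 5 = Saturday, 6 = Sunday
        mbps = WEEKEND_MBPS
    elif now.hour in BUSINESS_HOURS:
        mbps = BUSINESS_HOURS_MBPS
    else:
        mbps = OFF_HOURS_MBPS
    return mbps * 1_000_000 // 8        # megabits/s -> bytes/s


def throttle_options(now: datetime) -> dict:
    """Task options fragment carrying the limit for the given time."""
    return {"BytesPerSecond": bandwidth_limit_bytes_per_second(now)}
```

The returned dictionary could be passed as `Options` to the boto3 DataSync `update_task` call before a run, or to `update_task_execution` to adjust a run that is already in progress. The same limits can also be set by hand in the console.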

With this solution, the customer was able to migrate more than 100 TB of file server data over a single 1-Gbps AWS Direct Connect connection from their two data centers to the AWS Cloud. Ultimately, the customer completed their entire storage migration online in just four weeks.
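A quick calculation shows that these figures are consistent with a throttled link. The sketch below treats "more than 100 TB" as exactly 100 TB (decimal, 10^12 bytes each) and ignores protocol overhead and any in-flight compression, so it is a rough estimate rather than a measurement.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move data_tb terabytes over a link at a given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8                       # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400                         # seconds -> days


# 100 TB over 1 Gbps with the link fully saturated around the clock.
full_rate_days = transfer_days(100, 1.0)

# Average share of the link needed to finish in four weeks (28 days).
average_utilization = full_rate_days / 28
```

At full line rate the transfer would take about 9.3 days, so finishing in four weeks corresponds to using roughly one third of the link on average, which fits a migration that was deliberately throttled to protect production traffic.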

Conclusion

In this post, I walked through how an enterprise financial services company, close to renewing their two data center leases, quickly and seamlessly migrated file storage from multiple storage vendors along with complex legacy applications. AWS solved these storage migration challenges by using a combination of Amazon FSx for Windows File Server and AWS DataSync. These services enabled the customer to accelerate their data transfers without writing custom scripts or running CLI commands.

By successfully moving their storage and business applications to AWS, this customer was able to decommission two data centers, saving them thousands of dollars monthly. They also reduced the costs incurred from years of managing various storage, server, and network vendors. With their file storage in Amazon FSx, the customer can enjoy fully managed storage with lower operational overhead and cost.

Thanks for reading this blog! If you have any comments or questions, please don’t hesitate to leave them in the comments section.