AWS Storage Blog

AWS re:Invent recap: Quick and secure data migrations using AWS DataSync

I work with a lot of customers who are at different stages in their cloud journey. Some customers are just starting out and looking for ways to take advantage of cloud scale and economics. Others have been running on AWS for years and are looking to optimize their processes and workflows. There are also those who have a well-established cloud footprint but must maintain on-premises systems that work together with their cloud-based applications.

Regardless of where customers are in their cloud journey, a theme I hear frequently is that data is growing exponentially. With this growth, customers are looking to get that data to the cloud for durable storage and to extract further business value. In particular, the continuing growth of unstructured file data is overwhelming traditional on-premises storage systems, and customers are looking to bring their data to AWS to take advantage of our highly scalable cloud storage services.

Yesterday, I presented a re:Invent session, “Migrate your data to AWS quickly and securely using AWS DataSync.” You can now watch that 30-minute session on demand. In this blog post, I provide details and background around DataSync, recapping my session at re:Invent 2020-2021.

AWS DataSync overview

AWS DataSync was purpose-built to help our customers migrate their data to AWS, securely and efficiently. DataSync takes care of much of the undifferentiated heavy lifting typically associated with migrating terabytes or petabytes of data and billions of files. As a fully managed service, DataSync provides a number of key features that ease the migration of unstructured file and object data to AWS. The DataSync service offers end-to-end data verification, encryption in flight, incremental transfers, scheduling, filtering, and a custom protocol optimized for delivering high performance. DataSync securely integrates with AWS Storage services such as Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server. It also provides monitoring and management via Amazon CloudWatch, AWS CloudTrail, and the DataSync console. Best of all, at only $0.0125 per GiB of data transferred, DataSync is cost effective: you pay only for the data that you transfer and there is no minimum fee.
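To make the pricing concrete, here is a minimal sketch of the arithmetic, assuming the flat $0.0125 per GiB rate quoted above. The function and constant names are hypothetical, and actual pricing may differ by Region or change over time, so treat the result as a rough estimate rather than a quote.

```python
from decimal import Decimal

# Per-GiB rate quoted in this post (assumption: flat rate, no other charges).
PRICE_PER_GIB = Decimal("0.0125")
GIB_PER_TIB = 1024


def estimate_transfer_cost(gib_transferred):
    """Estimate the DataSync charge in US dollars for the data copied."""
    cost = Decimal(str(gib_transferred)) * PRICE_PER_GIB
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"))


# Copying 10 TiB (10,240 GiB) at this rate:
print(estimate_transfer_cost(10 * GIB_PER_TIB))  # prints 128.00
```

Because you pay only for data actually copied, an incremental transfer that moves only changed files would be estimated from the size of the changes, not the size of the whole dataset.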

AWS DataSync can transfer data between your on-premises storage systems and AWS Storage services, or between AWS Storage services entirely in the cloud. To connect to your on-premises storage, DataSync uses agents, which you deploy as virtual machines (VMs) in your VMware, Hyper-V, or KVM environments. The agents take care of transferring data between your on-premises storage and AWS Storage services. They can do so through the internet or using a private connection via AWS Direct Connect or AWS Virtual Private Network (VPN). Regardless of your network configuration, all data is encrypted in flight using TLS.

Figure 1: Using DataSync for on-premises or in-cloud data transfers

For data transfers between AWS Storage services, no agent is required. Simply specify your locations and start your task. The DataSync service will take care of deploying and managing all resources required for your data transfer task.

AWS DataSync also integrates with the AWS Snowcone device, which is a portable, ruggedized, secure device designed for gathering and processing data outside of traditional data centers. Using AWS Snowcone, you can write data to your device and then use the pre-installed DataSync agent to transfer your data to AWS over the network, without having to ship the device back to AWS.

Figure 2: DataSync location configurations

Our customers are choosing AWS DataSync for their migration projects not only because of the simplicity it provides, but also because of its deep integration with our storage services. When transferring data to Amazon S3, DataSync can copy directly to any S3 storage class, including S3 Glacier and S3 Glacier Deep Archive. With this capability, you avoid the need (and associated costs) to first land data in another storage class and then transition it with lifecycle policies. When using Amazon EFS or Amazon FSx for Windows File Server, DataSync securely copies all applicable metadata including permissions, ownership, and timestamps. Our customers want to get their data into AWS quickly and securely, and DataSync enables them to do so, at low cost and with minimal overhead.
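As an illustration of choosing the storage class up front, the sketch below builds the parameters for an S3 location that writes directly to S3 Glacier Deep Archive. It assumes the parameter names of the DataSync `CreateLocationS3` API as exposed by boto3 (`S3BucketArn`, `S3Config`, `S3StorageClass`, `Subdirectory`); the helper name, the ARNs, and the list of storage class values are assumptions for this example, so check the current API reference before relying on them.

```python
# Storage class values assumed to be accepted by CreateLocationS3;
# the API may support more classes than are listed here.
S3_STORAGE_CLASSES = {
    "STANDARD", "STANDARD_IA", "ONEZONE_IA",
    "INTELLIGENT_TIERING", "GLACIER", "DEEP_ARCHIVE",
}


def s3_location_request(bucket_arn, role_arn, storage_class="STANDARD",
                        prefix="/"):
    """Build keyword arguments for a DataSync S3 location (hypothetical helper)."""
    if storage_class not in S3_STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        "S3BucketArn": bucket_arn,
        # IAM role that DataSync assumes to access the bucket.
        "S3Config": {"BucketAccessRoleArn": role_arn},
        # Objects are written straight into this class, with no lifecycle step.
        "S3StorageClass": storage_class,
        "Subdirectory": prefix,
    }


# Placeholder ARNs for illustration only.
request = s3_location_request(
    "arn:aws:s3:::example-archive-bucket",
    "arn:aws:iam::111122223333:role/example-datasync-role",
    storage_class="DEEP_ARCHIVE",
)
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("datasync").create_location_s3(**request)`.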

Get started with AWS DataSync

Getting started with AWS DataSync is easy: if you are copying to or from your on-premises storage, simply deploy an agent in your virtual environment and then activate it from the DataSync console. You then create your locations, configure your task, and start the task execution. You can monitor your tasks directly from the DataSync console or by using Amazon CloudWatch.
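The same steps can also be scripted. The following is a rough sketch, not a definitive implementation, of creating an NFS source location, an S3 destination location, and a task, then starting and polling a task execution. It assumes an agent has already been deployed and activated, and that `client` behaves like `boto3.client("datasync")`; the function names, the task name, and all ARNs and hostnames you pass in are hypothetical.

```python
import time


def start_nfs_to_s3_transfer(client, agent_arn, nfs_host, nfs_path,
                             bucket_arn, bucket_role_arn):
    """Create both locations and a task, then start one task execution."""
    # Source: an on-premises NFS export, reached through the activated agent.
    source = client.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination: an S3 bucket, accessed through an IAM role DataSync assumes.
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="example-nfs-to-s3",  # hypothetical task name
    )
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]


def wait_for_execution(client, execution_arn, poll_seconds=30):
    """Poll the task execution until it reports SUCCESS or ERROR."""
    while True:
        response = client.describe_task_execution(
            TaskExecutionArn=execution_arn)
        if response["Status"] in ("SUCCESS", "ERROR"):
            return response["Status"]
        time.sleep(poll_seconds)
```

Passing the client in as an argument keeps the sketch easy to try against a stub before pointing it at a real account. For in-cloud transfers between AWS Storage services, the agent-backed NFS location would be replaced by another AWS location, since no agent is required in that case.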

Figure 3: Getting started with DataSync

Key takeaways and conclusion

In this blog post and in my re:Invent session, I talked about how AWS DataSync can help you accelerate the migration of unstructured file and object data from your on-premises storage into AWS, as well as between AWS Storage services. DataSync has a number of features that enable you to move your data quickly and securely, including end-to-end validation, in-flight encryption, scheduling, filtering, and more. I also talked about DataSync’s deep integration with AWS Storage services and how easy it is to get started using the service.

If you would like to learn more about AWS DataSync, check out the following links:

If you are looking to get some hands-on experience with DataSync, we also have a video and some workshops on GitHub that walk you through various migration scenarios, step by step:

Feel free to leave any comments or questions about my re:Invent session or this blog post in the comments section. Thank you for reading!

Jeff Bartley

Jeff is a Principal Solutions Architect at AWS, focused on Hybrid Cloud Storage and Data Transfer. He enjoys helping customers tackle their biggest storage challenges through cloud-scale architectures. A native of Southern California, Jeff loves to get outdoors whenever he can.