Posted On: Nov 4, 2021

AWS DataSync now supports transferring data between Hadoop Distributed File Systems (HDFS) and Amazon S3, Amazon Elastic File System (Amazon EFS), or Amazon FSx for Windows File Server. Using DataSync, you can quickly, easily, and securely migrate files and folders from HDFS on your Hadoop cluster to AWS Storage. You can also use DataSync to replicate data from your Hadoop cluster to AWS for business continuity, copy data to AWS to populate your data lakes, or transfer data between your cluster and AWS for analysis and processing.

AWS DataSync is an online data transfer service that provides a simple way to automate and accelerate copying data to and from AWS over the internet or AWS Direct Connect. DataSync includes built-in scheduling, monitoring, encryption, and data integrity validation, all with pay-as-you-go pricing. In addition to HDFS, DataSync supports copying data between Network File System (NFS) shares, Server Message Block (SMB) shares, self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems. DataSync agents run outside your Hadoop cluster, so you can accelerate your migrations and simplify data transfers between your cluster and AWS without consuming the cluster's compute and memory resources or impacting your business processes.
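
As an illustration of the HDFS-to-S3 case, the following Python sketch shows one way such a transfer might be set up through the DataSync API. It is a minimal example under stated assumptions, not a recommended configuration: the helper function names are hypothetical, the NameNode port (8020), the `SIMPLE` authentication type, and the `hdfs` user are assumed defaults, and `client` is expected to be a boto3 DataSync client that you create yourself.

```python
from typing import Any, Dict, List


def build_hdfs_location_request(
    agent_arns: List[str],
    namenode_host: str,
    namenode_port: int = 8020,   # assumed NameNode RPC port; check your cluster
    subdirectory: str = "/",
    simple_user: str = "hdfs",   # assumed user for SIMPLE authentication
) -> Dict[str, Any]:
    """Build keyword arguments for the DataSync CreateLocationHdfs call."""
    if not agent_arns:
        # The HDFS location is reached through at least one DataSync agent.
        raise ValueError("at least one DataSync agent ARN is required")
    return {
        "AgentArns": list(agent_arns),
        "NameNodes": [{"Hostname": namenode_host, "Port": namenode_port}],
        "AuthenticationType": "SIMPLE",
        "SimpleUser": simple_user,
        "Subdirectory": subdirectory,
    }


def build_s3_location_request(
    bucket_arn: str, access_role_arn: str, subdirectory: str = "/"
) -> Dict[str, Any]:
    """Build keyword arguments for the DataSync CreateLocationS3 call."""
    return {
        "S3BucketArn": bucket_arn,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
        "Subdirectory": subdirectory,
    }


def create_hdfs_to_s3_task(
    client: Any,
    hdfs_request: Dict[str, Any],
    s3_request: Dict[str, Any],
    name: str,
) -> str:
    """Create both locations and a task linking them; return the task ARN."""
    source = client.create_location_hdfs(**hdfs_request)
    destination = client.create_location_s3(**s3_request)
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name=name,
    )
    return task["TaskArn"]
```

With real credentials and an activated agent, the client would typically come from `boto3.client("datasync")`, and the returned task ARN could then be passed to `start_task_execution(TaskArn=...)` to begin the transfer. Keeping the request builders separate from the API calls makes the parameters easy to review before anything is created in your account.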

AWS DataSync is available in 23 AWS Regions. You can learn more about the service in the DataSync documentation, or you can log in to the AWS DataSync console to get started.