AWS DataSync Documentation

AWS DataSync is an online data transfer service that is designed to simplify and accelerate copying large amounts of data between on-premises systems and AWS Storage services, as well as between AWS Storage services. DataSync can copy data between Network File System (NFS) shares or Server Message Block (SMB) shares, self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems.

Purpose-built network protocol

The service employs an AWS-designed transfer protocol—decoupled from the storage protocol—to accelerate data movement. The protocol helps optimize how, when, and what data is sent over the network. Network optimizations performed by DataSync include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.
 
Connections between the local DataSync agent and the in-cloud service components are multi-threaded, helping to increase performance over your Wide Area Network (WAN).
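
The incremental transfer idea described above can be illustrated with a short sketch. This is not the DataSync protocol itself, whose internals are not described here. The `FileInfo` record and the `plan_incremental_copy` function are hypothetical names, and the comparison rule (size and modification time) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Minimal per-file metadata used to decide whether a file changed."""
    size: int      # size in bytes
    mtime_ns: int  # last-modified time in nanoseconds


def plan_incremental_copy(source, destination):
    """Return the sorted paths that are new or changed at the source.

    `source` and `destination` map relative paths to FileInfo records.
    A path is selected when it is missing at the destination or when
    its size or modification time differs (an assumed rule).
    """
    return sorted(
        path
        for path, info in source.items()
        if destination.get(path) != info
    )


source = {
    "a.txt": FileInfo(size=10, mtime_ns=1_000),
    "b.txt": FileInfo(size=20, mtime_ns=2_000),
    "c.txt": FileInfo(size=30, mtime_ns=3_000),
}
destination = {
    "a.txt": FileInfo(size=10, mtime_ns=1_000),  # unchanged, skipped
    "b.txt": FileInfo(size=20, mtime_ns=1_500),  # stale copy, re-sent
}
to_copy = plan_incremental_copy(source, destination)
print(to_copy)  # ['b.txt', 'c.txt']
```

Only the changed file and the new file are selected, so an unchanged file adds no traffic on a repeated run.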

Infrastructure management

DataSync helps alleviate many of the infrastructure and management challenges that you face when writing, optimizing, and managing your own copy scripts, or when deploying and tuning heavyweight commercial transfer tools. DataSync comes with built-in monitoring and retry mechanisms, and allows granular control over the portion of network bandwidth used to transfer your data.
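
As a minimal sketch of the bandwidth control mentioned above, the following Python builds the `BytesPerSecond` task option and passes it to the boto3 `start_task_execution` call as an override. The helper names `bandwidth_options` and `start_throttled_execution` are hypothetical, and the task ARN is something you would supply. The boto3 call needs the boto3 package and AWS credentials, so it is wrapped in a function and not run here.

```python
def bandwidth_options(mib_per_second=None):
    """Build the task Options fragment that caps transfer bandwidth.

    None means no limit, which the BytesPerSecond option expresses as -1.
    """
    if mib_per_second is None:
        return {"BytesPerSecond": -1}
    if mib_per_second <= 0:
        raise ValueError("mib_per_second must be positive")
    return {"BytesPerSecond": int(mib_per_second * 1024 * 1024)}


def start_throttled_execution(task_arn, mib_per_second):
    """Start a task execution with a bandwidth cap (not called here)."""
    import boto3  # imported lazily; requires boto3 and AWS credentials

    client = boto3.client("datasync")
    return client.start_task_execution(
        TaskArn=task_arn,
        OverrideOptions=bandwidth_options(mib_per_second),
    )


print(bandwidth_options(1))  # {'BytesPerSecond': 1048576}
print(bandwidth_options())   # {'BytesPerSecond': -1}
```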

Data encryption and validation

DataSync is designed to encrypt data in transit with Transport Layer Security (TLS). DataSync supports using default encryption for S3 buckets, Amazon EFS file system encryption of data at rest, and Amazon FSx for Windows File Server encryption at rest and in transit.

For each transfer made using DataSync, the service is designed to perform integrity checks both in transit and at rest. These checks allow you to validate that the data written to your destination matches the data read from your source.
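
The at-rest part of such a check can be sketched as a checksum comparison between the source file and the destination file. The text above does not name the algorithm DataSync uses, so SHA-256 here is an assumption, and `file_digest` and `verify_copy` are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, destination_path):
    """Return True when both files have identical content digests."""
    return file_digest(source_path) == file_digest(destination_path)


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "src.bin")
    good = os.path.join(workdir, "good.bin")
    bad = os.path.join(workdir, "bad.bin")
    # Write one faithful copy and one copy with a single changed byte.
    for path, payload in ((src, b"hello"), (good, b"hello"), (bad, b"hellp")):
        with open(path, "wb") as handle:
            handle.write(payload)
    matches = verify_copy(src, good)
    mismatch = verify_copy(src, bad)

print(matches, mismatch)  # True False
```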

Data transfer scheduling

DataSync comes with a built-in scheduling mechanism that is designed to help you periodically run a data transfer task to detect and copy changes from your source storage system to the destination. You can schedule your tasks using the AWS DataSync Console or AWS Command Line Interface (CLI), without needing to write scripts to manage repeated transfers. Task scheduling is designed to run tasks on the schedule you configure, with hourly, daily, or weekly options provided directly in the Console.
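
Schedules can also be set programmatically. The sketch below maps the hourly, daily, and weekly choices to AWS cron expressions and passes one to the boto3 `update_task` call through the `Schedule` parameter. The specific times (top of the hour, midnight, Sunday) are assumptions, as are the helper names `schedule_for` and `apply_schedule`. The boto3 call requires the boto3 package and AWS credentials and is not run here.

```python
# Assumed cron expressions, in the AWS six-field form:
# minutes hours day-of-month month day-of-week year
SCHEDULES = {
    "hourly": "cron(0 * * * ? *)",    # at minute 0 of every hour
    "daily": "cron(0 0 * * ? *)",     # every day at 00:00
    "weekly": "cron(0 0 ? * SUN *)",  # every Sunday at 00:00
}


def schedule_for(frequency):
    """Return the Schedule structure for a named frequency."""
    if frequency not in SCHEDULES:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return {"ScheduleExpression": SCHEDULES[frequency]}


def apply_schedule(task_arn, frequency):
    """Attach a schedule to an existing task (not called here)."""
    import boto3  # imported lazily; requires boto3 and AWS credentials

    client = boto3.client("datasync")
    client.update_task(TaskArn=task_arn, Schedule=schedule_for(frequency))


print(schedule_for("daily"))  # {'ScheduleExpression': 'cron(0 0 * * ? *)'}
```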

File system integration and metadata preservation

The DataSync agent is designed to connect to your existing storage systems using the industry-standard NFS and SMB protocols, or to your self-managed object storage, using the Amazon S3 API. The agent is designed to transfer data rapidly and write it into your designated Amazon S3 bucket, Amazon EFS file system, or Amazon FSx for Windows File Server file system.
 
When copying data between NFS shares and Amazon EFS, or between SMB shares and Amazon FSx for Windows File Server, file permissions and metadata are designed to be preserved, enabling an easy transition to the target file system. Similarly, when copying objects between self-managed object storage and Amazon S3, object metadata and tags are designed to be preserved.
 
When copying data to Amazon S3, DataSync is designed to convert each file to be a single S3 object in a 1:1 relationship, and preserve POSIX metadata from NFS shares as Amazon S3 object metadata. When you copy objects that contain file system metadata back to file formats, the original file metadata that DataSync copied to S3 is designed to be restored.
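
The round trip described above can be sketched with the Python standard library. The metadata key names and value formats below are illustrative assumptions and are not taken from this text, and `posix_to_object_metadata` and `restore_posix_metadata` are hypothetical helpers. The sketch assumes a POSIX file system and skips restoring ownership, which usually needs elevated privileges.

```python
import os
import stat
import tempfile


def posix_to_object_metadata(path):
    """Capture POSIX attributes as a string-to-string metadata mapping."""
    info = os.stat(path)
    return {
        "file-owner": str(info.st_uid),
        "file-group": str(info.st_gid),
        "file-permissions": format(stat.S_IMODE(info.st_mode), "04o"),
        "file-atime": str(info.st_atime_ns),
        "file-mtime": str(info.st_mtime_ns),
    }


def restore_posix_metadata(path, metadata):
    """Reapply permissions and timestamps captured earlier."""
    os.chmod(path, int(metadata["file-permissions"], 8))
    os.utime(
        path,
        ns=(int(metadata["file-atime"]), int(metadata["file-mtime"])),
    )


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.txt")
    restored = os.path.join(workdir, "restored.txt")
    for path in (original, restored):
        with open(path, "w") as handle:
            handle.write("data")
    # Give the original file known permissions and timestamps.
    os.chmod(original, 0o640)
    os.utime(original, ns=(1_600_000_000 * 10**9, 1_500_000_000 * 10**9))

    metadata = posix_to_object_metadata(original)
    restore_posix_metadata(restored, metadata)

    restored_info = os.stat(restored)
    restored_mode = stat.S_IMODE(restored_info.st_mode)
    restored_mtime_ns = restored_info.st_mtime_ns

print(metadata["file-permissions"], oct(restored_mode))
```

Because every value is stored as a string, the mapping fits in object metadata, and the restore step only has to parse it back.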

Integration with AWS infrastructure and management services

DataSync works natively with AWS security, monitoring, and audit services to make data movement simpler, and to provide a consistent management experience for your IT, storage, and DevOps teams. In addition to integrations with Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server, DataSync supports VPC endpoints (powered by AWS PrivateLink) in order to move files directly into your Amazon VPC. As with other AWS services, you can use AWS Identity and Access Management (IAM) to manage access for DataSync. Similarly, the service can access your Amazon S3 bucket using an IAM role you configure.
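
As a sketch of the IAM role arrangement described above, the following Python builds a trust policy that lets the DataSync service principal assume a role, and a request for the boto3 `create_location_s3` call that references that role. The helper names and the ARNs are placeholders, not real resources. The permissions policy granting the role access to the bucket is omitted. The boto3 call requires the boto3 package and AWS credentials and is not run here.

```python
import json


def datasync_trust_policy():
    """Trust policy that allows the DataSync service to assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "datasync.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_location_request(bucket_arn, role_arn, subdirectory="/"):
    """Build keyword arguments for an S3 location that uses the role."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": subdirectory,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def create_s3_location(bucket_arn, role_arn):
    """Create the S3 location and return its ARN (not called here)."""
    import boto3  # imported lazily; requires boto3 and AWS credentials

    client = boto3.client("datasync")
    response = client.create_location_s3(
        **s3_location_request(bucket_arn, role_arn)
    )
    return response["LocationArn"]


# Placeholder ARNs for illustration only.
request = s3_location_request(
    "arn:aws:s3:::example-bucket",
    "arn:aws:iam::111122223333:role/example-datasync-role",
)
print(json.dumps(datasync_trust_policy(), indent=2))
```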

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.