AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect. DataSync can copy data between Network File System (NFS) or Server Message Block (SMB) file servers, all Amazon Simple Storage Service (Amazon S3) storage classes, and Amazon Elastic File System (Amazon EFS) file systems.
Purpose-built network protocol
The service employs an AWS-designed transfer protocol, decoupled from the storage protocol, to speed data movement. The protocol optimizes how, when, and what data is sent over the network. These optimizations include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.
Connections between the local DataSync agent and the in-cloud service components are multi-threaded, maximizing performance over the Wide Area Network (WAN). A single DataSync agent is capable of saturating a 10 Gbps network link.
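The idea behind incremental transfers can be illustrated with a minimal Python sketch. It is not DataSync's actual protocol, which is not described here; it assumes a file counts as changed when its size or modification time differs, and the `FileInfo` record and `files_to_copy` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileInfo:
    """Minimal per-file record used for change detection (hypothetical)."""
    size: int
    mtime: float


def files_to_copy(source: Dict[str, FileInfo],
                  destination: Dict[str, FileInfo]) -> List[str]:
    """Return paths that are new or changed at the source, in sorted order."""
    changed = []
    for path, info in source.items():
        # A missing destination entry (None) or a differing record means "copy".
        if destination.get(path) != info:
            changed.append(path)
    return sorted(changed)


source = {
    "a.txt": FileInfo(10, 100.0),
    "b.txt": FileInfo(20, 200.0),
    "c.txt": FileInfo(5, 50.0),
}
destination = {
    "a.txt": FileInfo(10, 100.0),
    "b.txt": FileInfo(20, 150.0),  # older copy of b.txt
}
print(files_to_copy(source, destination))  # ['b.txt', 'c.txt']
```

Only `b.txt` (changed) and `c.txt` (new) would be sent, which is why a repeated transfer of a mostly unchanged dataset moves far less data than the first one.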
Automatic infrastructure management
DataSync automatically scales cloud resources to support higher-volume transfers and makes it easy to add on-premises agents when needed. This removes many of the infrastructure and management challenges you face when writing, optimizing, and managing your own copy scripts, or when deploying and tuning heavyweight commercial transfer tools.
Data encryption and validation
All of your data is encrypted in transit with Transport Layer Security (TLS). For data at rest, DataSync supports default encryption for S3 buckets with Amazon S3-Managed Encryption Keys (SSE-S3), as well as Amazon EFS file system encryption.
DataSync ensures that your data arrives intact. For each transfer, the service performs integrity checks both in transit and at rest. These checks ensure that the data written to your destination matches the data read from your source, validating consistency.
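As a rough illustration of what an end-to-end integrity check involves, the following Python sketch compares a digest of the source bytes with a digest of the destination bytes. The choice of SHA-256 and the function names `sha256_of` and `verify_copy` are assumptions made for this example; the text above does not say which algorithm DataSync uses.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a stream in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: BinaryIO, destination: BinaryIO) -> bool:
    """Return True when both streams contain exactly the same bytes."""
    return sha256_of(source) == sha256_of(destination)


original = b"example payload"
print(verify_copy(io.BytesIO(original), io.BytesIO(original)))      # True
print(verify_copy(io.BytesIO(original), io.BytesIO(b"corrupted")))  # False
```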
Task scheduling
Task scheduling lets you run a task periodically to detect changes in your source storage system and copy them to the destination. You can schedule tasks using the AWS DataSync Console or the AWS Command Line Interface (CLI), without writing and running scripts to manage repeated transfers. DataSync runs each task automatically on the schedule you configure, with hourly, daily, and weekly options available directly in the Console. This lets you use a single tool to manage and monitor your data transfers and ensures that changes to your dataset are regularly copied to your destination storage.
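For scheduling outside the Console, a hedged Python sketch follows. It only builds the keyword arguments that could be passed to the boto3 DataSync client's `create_task` call; the helper `scheduled_task_params`, the `SCHEDULES` table, and the ARNs are placeholders introduced for illustration, and the cron expressions are assumed equivalents of the hourly, daily, and weekly Console options.

```python
from typing import Any, Dict

# Assumed cron equivalents of the Console's hourly, daily, and weekly options.
SCHEDULES = {
    "hourly": "cron(0 * * * ? *)",
    "daily": "cron(0 0 * * ? *)",
    "weekly": "cron(0 0 ? * SUN *)",
}


def scheduled_task_params(name: str, source_arn: str, destination_arn: str,
                          frequency: str) -> Dict[str, Any]:
    """Build keyword arguments for a scheduled DataSync task (hypothetical helper)."""
    if frequency not in SCHEDULES:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return {
        "Name": name,
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Schedule": {"ScheduleExpression": SCHEDULES[frequency]},
    }


params = scheduled_task_params(
    name="nightly-sync",
    # Placeholder ARNs; substitute the locations created in your own account.
    source_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-source",
    destination_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-dest",
    frequency="daily",
)
print(params["Schedule"])

# With credentials configured, the parameters could then be submitted with:
#   import boto3
#   boto3.client("datasync").create_task(**params)
```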
File system integration and metadata preservation
The DataSync agent connects to your existing storage systems using the industry-standard NFS and SMB protocols. The agent transfers data rapidly and deposits it in your designated Amazon S3 bucket or Amazon EFS file system.
When copying data to Amazon S3, DataSync automatically converts each file to be a single S3 object in a 1:1 relationship, and preserves POSIX metadata as Amazon S3 object metadata. When you copy objects that contain file system metadata back to file formats, the original file metadata that DataSync copied to S3 is restored. Similarly, when Amazon EFS is the destination for your data, DataSync preserves your existing directory structures and file metadata.
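To make the idea of carrying POSIX metadata as object metadata concrete, here is a small Python sketch that reads a file's attributes and flattens them into string key-value pairs, the form that S3 user-defined metadata takes. The key names and the `posix_metadata` function are illustrative assumptions, not necessarily the keys DataSync writes.

```python
import os
import stat
import tempfile
from typing import Dict


def posix_metadata(path: str) -> Dict[str, str]:
    """Collect POSIX attributes as string pairs (key names are illustrative)."""
    st = os.stat(path)
    return {
        "file-owner": str(st.st_uid),
        "file-group": str(st.st_gid),
        # Keep only the permission bits, e.g. 0o640.
        "file-permissions": oct(stat.S_IMODE(st.st_mode)),
        "file-mtime-ns": str(st.st_mtime_ns),
        "file-atime-ns": str(st.st_atime_ns),
    }


with tempfile.NamedTemporaryFile() as handle:
    os.chmod(handle.name, 0o640)
    metadata = posix_metadata(handle.name)
    print(metadata)
```

Restoring a file from the object is the reverse step: parse the stored strings back into integers and apply them with calls such as `os.chmod` and `os.utime`.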
Integration with AWS infrastructure and management services
DataSync works natively with AWS infrastructure and management services to make data movement simpler and to provide a consistent management experience for your DevOps teams. In addition to integrations with Amazon S3, Amazon EFS, and AWS KMS, DataSync supports VPC endpoints (powered by AWS PrivateLink) to move files directly into your Amazon VPC. As with other AWS services, you use AWS Identity and Access Management (IAM) to securely manage access for DataSync. The service accesses your Amazon S3 bucket using an IAM role you configure.
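The IAM role mentioned above needs a trust policy that lets the DataSync service assume it, plus a permissions policy for the bucket. The Python sketch below builds both as plain dictionaries. The bucket name is a placeholder, and the list of S3 actions is an assumed minimal set for illustration; check it against the current DataSync documentation before use.

```python
import json
from typing import Any, Dict


def datasync_trust_policy() -> Dict[str, Any]:
    """Trust policy allowing the DataSync service principal to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def bucket_access_policy(bucket_name: str) -> Dict[str, Any]:
    """Permissions policy for one bucket (assumed minimal action set)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {   # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# "example-bucket" is a placeholder name.
print(json.dumps(datasync_trust_policy(), indent=2))
print(json.dumps(bucket_access_policy("example-bucket"), indent=2))
```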
Monitoring and auditing with Amazon CloudWatch and AWS CloudTrail
With Amazon CloudWatch, you can monitor the status of DataSync transfers currently in progress and check the history of previous transfers. CloudWatch Metrics show the number of files and the amount of data copied, and CloudWatch Logs provide more detail about previous tasks. In addition, CloudWatch Events can be triggered as your transfer tasks complete, enabling automation of dependent workflows. For audit purposes, you can consult AWS CloudTrail, which logs all actions performed by DataSync.
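A dependent workflow is typically triggered by an event rule whose pattern selects completed task executions. The Python sketch below defines such a pattern and a simplified local matcher for trying it out. The `source`, `detail-type`, and `State` values reflect my understanding of the events DataSync emits and should be verified against the service documentation; the sample event and the `matches` helper are hypothetical, and the matcher covers only exact-value matching, not the full pattern syntax.

```python
from typing import Any, Dict

# Assumed pattern for successful DataSync task executions.
SUCCESS_PATTERN: Dict[str, Any] = {
    "source": ["aws.datasync"],
    "detail-type": ["DataSync Task Execution State Change"],
    "detail": {"State": ["SUCCESS"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: every pattern key must be present with an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding nested event field.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical sample event, trimmed to the fields the pattern inspects.
sample_event = {
    "source": "aws.datasync",
    "detail-type": "DataSync Task Execution State Change",
    "detail": {"State": "SUCCESS"},
}
print(matches(SUCCESS_PATTERN, sample_event))  # True
```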
Pricing
You pay only for data copied by the service, at a flat, per-gigabyte rate, with no software licenses, contracts, maintenance fees, development cycles, or required hardware. This provides a lower total cost of ownership than manually building, operating, and optimizing your own high-performance scripted transfers. It also costs less overall than buying and running commercial transfer tools.
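Because the rate is flat, estimating a transfer's cost is a single multiplication. The Python sketch below shows the arithmetic; the rate used is a made-up placeholder, not a published price, and `transfer_cost` is a hypothetical helper.

```python
def transfer_cost(gigabytes: float, rate_per_gb: float) -> float:
    """Cost of a transfer under a flat per-gigabyte rate."""
    if gigabytes < 0 or rate_per_gb < 0:
        raise ValueError("gigabytes and rate_per_gb must be non-negative")
    return gigabytes * rate_per_gb


# Placeholder rate for illustration only; look up the current price for your Region.
HYPOTHETICAL_RATE_PER_GB = 0.01

estimate = transfer_cost(500, HYPOTHETICAL_RATE_PER_GB)
print(f"Estimated cost for 500 GB: ${estimate:.2f}")
```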