AWS DataSync is a managed data transfer service that simplifies and speeds up moving large amounts of data between on-premises storage systems and AWS storage services. You deploy a DataSync agent on premises as a virtual machine and connect it to your existing storage array or file system over the Network File System (NFS) protocol. The agent then sends and receives data to and from the fully managed DataSync service in AWS.
Parallel, scalable architecture
Connections between the local DataSync agent and the in-cloud service components are multi-threaded, so transfers of large files are parallelized to improve performance over the wide area network (WAN). DataSync also automatically scales cloud resources to support higher-volume transfers, and makes it easy to add on-premises agents if needed. The service can scale performance to fully utilize a 10 Gbps network link.
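To put the 10 Gbps figure in perspective, the arithmetic below gives a lower bound on transfer time for a given data set. It is a back-of-the-envelope sketch, not a DataSync performance model: the function name and the `utilization` parameter are hypothetical, and real throughput also depends on the source storage, file sizes, and network conditions.

```python
def transfer_time_seconds(size_bytes: float,
                          link_gbps: float = 10.0,
                          utilization: float = 1.0) -> float:
    """Return the minimum time needed to move size_bytes over the link.

    link_gbps is the link speed in gigabits per second (10**9 bits/s).
    utilization is the fraction of the link the transfer can actually use.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * utilization)


if __name__ == "__main__":
    one_terabyte = 10**12  # decimal terabyte, in bytes
    print(transfer_time_seconds(one_terabyte))  # 800.0 seconds at full line rate
```

At full line rate, one decimal terabyte takes at least 800 seconds, or a little over 13 minutes; at half utilization the bound doubles.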
Purpose-built network protocol
The service employs an AWS-designed transfer protocol, decoupled from the storage protocol, to speed data movement. The protocol optimizes how, when, and what data is sent over the network. Its optimizations include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.
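The idea behind incremental transfers can be illustrated with a small sketch. This is not the DataSync protocol, whose internals are not described here; it only shows the general technique of sending a file when its recorded size or modification time differs from the copy at the destination. `FileInfo` and `files_to_transfer` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    """Attributes compared to decide whether a file has changed."""
    size: int
    mtime_ns: int


def files_to_transfer(source: Dict[str, FileInfo],
                      destination: Dict[str, FileInfo]) -> List[str]:
    """Return the paths that are new or changed relative to the destination."""
    changed = []
    for path, info in sorted(source.items()):
        # A path missing from the destination yields None, which never
        # equals a FileInfo, so new files are selected as well.
        if destination.get(path) != info:
            changed.append(path)
    return changed


if __name__ == "__main__":
    src = {"a.txt": FileInfo(10, 100), "b.txt": FileInfo(20, 300), "c.txt": FileInfo(5, 50)}
    dst = {"a.txt": FileInfo(10, 100), "b.txt": FileInfo(20, 200)}
    print(files_to_transfer(src, dst))  # ['b.txt', 'c.txt']
```

Only the changed file and the new file are selected, so an unchanged data set costs no network transfer beyond the comparison itself.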
Automatic infrastructure management
After you deploy the DataSync agent, connect it to your on-premises storage, and select your Amazon S3 bucket or Amazon EFS file system as the destination or source for data transfers, the DataSync service manages the rest of the infrastructure, including automatically scaling in-cloud resources. It also makes it easy to add and manage more agents on premises. This removes many of the infrastructure and management challenges that you face when either writing, optimizing, and managing your own copy scripts, or deploying and tuning heavyweight commercial transfer tools.
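The setup steps above might look like the following sketch when scripted with the AWS SDK for Python. It assumes an agent that is already deployed and activated, and it takes the client as a parameter so the flow can be exercised without AWS access. The helper names and every ARN, hostname, and path are placeholders, not values from this document.

```python
from typing import Any, Dict


def nfs_location_params(server_hostname: str, subdirectory: str,
                        agent_arn: str) -> Dict[str, Any]:
    """Build the arguments for the on-premises NFS location."""
    return {
        "ServerHostname": server_hostname,
        "Subdirectory": subdirectory,
        "OnPremConfig": {"AgentArns": [agent_arn]},
    }


def s3_location_params(bucket_arn: str, access_role_arn: str,
                       subdirectory: str = "/") -> Dict[str, Any]:
    """Build the arguments for the Amazon S3 location."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": subdirectory,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


def create_transfer_task(client: Any, nfs_params: Dict[str, Any],
                         s3_params: Dict[str, Any], name: str) -> str:
    """Create both locations and a task copying from NFS to S3.

    client is expected to behave like boto3.client("datasync").
    """
    source = client.create_location_nfs(**nfs_params)
    destination = client.create_location_s3(**s3_params)
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name=name,
    )
    return task["TaskArn"]
```

With boto3 installed and credentials configured, `create_transfer_task(boto3.client("datasync"), ...)` followed by `start_task_execution(TaskArn=...)` on the same client would start the copy. Swapping the source and destination ARNs reverses the direction of the transfer.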
File system integration and metadata preservation
The DataSync agent connects to your existing storage systems using the industry-standard NFS protocol. The agent transfers data rapidly and deposits it in your designated Amazon S3 bucket or Amazon EFS file system. When copying data to Amazon S3, DataSync automatically converts each file to a single S3 object in a 1:1 relationship, and preserves file metadata as Amazon S3 object metadata. When you copy objects that contain file system metadata back to a file system, the original file metadata that DataSync copied to S3 is restored. Similarly, when Amazon EFS is the destination for your data, DataSync preserves your existing directory structures and file metadata.
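The 1:1 mapping from file to object can be pictured with the sketch below, which derives an object key from a file's relative path and collects its POSIX attributes as string metadata. This is only an illustration of the concept: the function name is hypothetical, and the metadata key names and value formats are assumptions that may not match what DataSync actually writes.

```python
import stat
from pathlib import Path
from typing import Dict, Tuple


def object_for_file(root: Path, file_path: Path) -> Tuple[str, Dict[str, str]]:
    """Map one file under root to an (object key, metadata) pair."""
    st = file_path.stat()
    # The key mirrors the path relative to the transfer root.
    key = file_path.relative_to(root).as_posix()
    # Key names below are illustrative assumptions.
    metadata = {
        "file-owner": str(st.st_uid),
        "file-group": str(st.st_gid),
        "file-permissions": oct(stat.S_IMODE(st.st_mode)),
        "file-mtime": f"{st.st_mtime_ns}ns",
    }
    return key, metadata
```

Restoring a file from such an object would reverse the mapping: write the object body to the path given by the key, then reapply the owner, group, permissions, and modification time from the metadata.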
Data encryption and validation
All of your data is encrypted in transit with Transport Layer Security (TLS). DataSync integrates with AWS Key Management Service (AWS KMS) so you can encrypt data at rest in AWS. For data in Amazon S3, you can also use Amazon S3-Managed Encryption Keys (SSE-S3). DataSync also makes sure that your data arrives intact. For each transfer, the service performs integrity checks both in transit and at rest. These checks ensure that the data written to your destination matches the data read from your source, validating consistency.
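An at-rest integrity check of the kind described above boils down to comparing a digest of the source with a digest of the destination. The sketch below shows that general technique for two local files. The choice of SHA-256 is an assumption made for illustration, since the checksum algorithm DataSync uses is not stated here, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sha256_of(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: PathLike, destination: PathLike) -> bool:
    """Return True when both files have identical contents."""
    return sha256_of(source) == sha256_of(destination)
```

With DataSync itself you do not write this check; the service performs its verification as part of each transfer.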
You pay only for data copied by the service, at a flat per-gigabyte rate, with no software licenses, contracts, maintenance fees, development cycles, or required hardware. This provides a lower total cost of ownership than manually building, operating, and optimizing your own high-performance scripted transfers. It also offers a lower total cost than buying and running commercial transfer tools.
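Because the charge is a flat per-gigabyte rate on data copied, a transfer cost estimate is a single multiplication. The sketch below leaves the rate as a required parameter because no rate is given here; take the current figure from the AWS pricing page. The function name is hypothetical, and treating a gigabyte as 10**9 bytes is an assumption.

```python
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def estimate_transfer_cost(bytes_copied: int, rate_per_gb: float) -> float:
    """Estimate the service charge for copying bytes_copied bytes.

    rate_per_gb is the per-gigabyte price in your billing currency.
    """
    if bytes_copied < 0 or rate_per_gb < 0:
        raise ValueError("bytes_copied and rate_per_gb must be non-negative")
    return (bytes_copied / BYTES_PER_GB) * rate_per_gb
```

The estimate covers only the per-gigabyte DataSync charge. Storage, request, and network charges from other services are outside its scope.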
Integration with AWS infrastructure and management services
DataSync works natively with AWS infrastructure and management services to make data movement simpler and to provide a consistent management experience for your DevOps teams. In addition to the integrations with Amazon EFS, Amazon S3, and AWS KMS noted above, DataSync connects with Amazon VPC to move files directly into your EFS file system inside your VPC. As with other AWS services, you use AWS Identity and Access Management (IAM) to securely manage access for DataSync. The service accesses your Amazon S3 bucket using an IAM role that you configure.
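The IAM role mentioned above needs a trust policy that lets the DataSync service assume it, plus a permissions policy scoped to the bucket. The sketch below builds both documents as Python dictionaries. The helper names are hypothetical, and the list of S3 actions is an assumption covering a typical read/write transfer; trim or extend it to fit your own security requirements.

```python
import json
from typing import Any, Dict


def datasync_trust_policy() -> Dict[str, Any]:
    """Trust policy allowing the DataSync service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def bucket_access_policy(bucket_arn: str) -> Dict[str, Any]:
    """Permissions policy for one bucket (action list is an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket",
                           "s3:ListBucketMultipartUploads"],
                "Resource": bucket_arn,
            },
            {   # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:AbortMultipartUpload", "s3:DeleteObject",
                           "s3:GetObject", "s3:ListMultipartUploadParts",
                           "s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(datasync_trust_policy(), indent=2))
```

The JSON produced by `json.dumps` can be pasted into the IAM console or passed to the IAM API when creating the role.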
Monitoring and auditing with Amazon CloudWatch and AWS CloudTrail
With Amazon CloudWatch, you can monitor the status of any DataSync transfers currently in progress and check the history of previous data transfers. With CloudWatch Metrics, you can see the number of files and the amount of data that have been copied. You can consult CloudWatch Logs for more information about previous tasks. In addition, CloudWatch Events can be triggered as your transfer tasks complete, enabling automation of dependent workflows. For audit purposes, you can consult AWS CloudTrail, which logs all actions performed by DataSync.
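Alongside CloudWatch, a script can track a transfer by polling the task execution directly. The sketch below is one minimal way to do that, assuming a client that behaves like `boto3.client("datasync")` and a response whose `Status` field ends in `SUCCESS` or `ERROR`. The function name, polling interval, and poll limit are hypothetical choices.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(client: Any, task_execution_arn: str,
                       poll_seconds: float = 30.0, max_polls: int = 120,
                       sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a task execution until it finishes, then return its description."""
    for _ in range(max_polls):
        response = client.describe_task_execution(
            TaskExecutionArn=task_execution_arn)
        if response["Status"] in TERMINAL_STATUSES:
            return response
        sleep(poll_seconds)
    raise TimeoutError(
        f"{task_execution_arn} did not finish after {max_polls} polls")
```

For event-driven workflows, a CloudWatch Events rule that fires on task completion avoids polling altogether; this loop is better suited to short scripts and tests.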