AWS DataSync is an online data movement and discovery service that simplifies and accelerates data migrations to AWS and helps you move data quickly and securely between on-premises storage, edge locations, other clouds, and AWS Storage.
AWS DataSync Discovery helps you simplify migration planning and accelerate data migration to AWS by giving you visibility into your on-premises storage performance and utilization, and by providing recommendations for migrating your data to AWS Storage services such as Amazon FSx for NetApp ONTAP, Amazon FSx for Windows File Server, and Amazon Elastic File System (Amazon EFS). Through automated data collection and analysis, DataSync Discovery helps you understand your on-premises storage performance and capacity usage, so you can quickly identify data to migrate and use the generated recommendations to select AWS Storage services that match your performance and capacity needs.
For online data transfers, AWS DataSync simplifies, automates, and accelerates copying large amounts of data between on-premises storage, edge locations, or other clouds, and AWS storage services, as well as between AWS storage services. DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, Google Cloud Storage, Azure Files, AWS Snowcone, Amazon S3 compatible storage on Snow, Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, and Amazon FSx for NetApp ONTAP file systems.
AWS DataSync provides the following features for data movement.
Purpose-Built Network Protocol
AWS DataSync employs an AWS-designed transfer protocol—decoupled from the storage protocol—to accelerate data movement. The protocol performs optimizations on how, when, and what data is sent over the network. Network optimizations performed by DataSync include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.
Connections between the local DataSync agent and the in-cloud service components are multi-threaded, maximizing performance over your Wide Area Network (WAN). A single DataSync task is capable of fully utilizing a 10 Gbps network link between your on-premises environment and AWS.
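As a rough illustration of the incremental-transfer idea described above, the following sketch compares a source listing with a destination listing and selects only the files that are new or changed. It is not the DataSync protocol: the FileInfo type, the plan_incremental function, and the size-plus-modification-time comparison are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical per-file record: size in bytes and modification time."""
    size: int
    mtime: float


def plan_incremental(source: Dict[str, FileInfo],
                     destination: Dict[str, FileInfo]) -> List[str]:
    """Return the paths that are missing from, or differ at, the destination."""
    to_copy = []
    for path, info in sorted(source.items()):
        # A missing path yields None, which never equals a FileInfo.
        if destination.get(path) != info:
            to_copy.append(path)
    return to_copy


source = {
    "a.txt": FileInfo(size=10, mtime=100.0),
    "b.txt": FileInfo(size=20, mtime=200.0),
    "c.txt": FileInfo(size=30, mtime=300.0),
}
destination = {
    "a.txt": FileInfo(size=10, mtime=100.0),  # unchanged, skipped
    "b.txt": FileInfo(size=20, mtime=150.0),  # older copy, sent again
}
print(plan_incremental(source, destination))  # ['b.txt', 'c.txt']
```

Only two of the three files are selected, which is the saving an incremental transfer aims for on repeated runs.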
Bandwidth Optimization and Control
Transferring hot or cold data should not impede your business. DataSync provides granular controls to optimize bandwidth consumption. Let transfers run at up to 10 Gbps during off hours, and set limits when network capacity is needed elsewhere.
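A bandwidth limit of this kind can be expressed through the BytesPerSecond field of the DataSync task options, where -1 means no limit. The sketch below only builds that options fragment; the bandwidth_option helper and the megabit-based input are assumptions made for this example.

```python
UNLIMITED = -1  # BytesPerSecond value meaning "no bandwidth limit"


def bandwidth_option(limit_mbps=None):
    """Build an Options fragment from a cap given in megabits per second."""
    if limit_mbps is None:
        return {"BytesPerSecond": UNLIMITED}
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive")
    # 1 megabit = 1,000,000 bits; 8 bits per byte.
    return {"BytesPerSecond": int(limit_mbps * 1_000_000 / 8)}


business_hours = bandwidth_option(100)  # cap at 100 Mbps
off_hours = bandwidth_option()          # no cap
print(business_hours, off_hours)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as OverrideOptions to start_task_execution or as Options to update_task_execution on the DataSync client; that call is left out here so the example runs without AWS access.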
Data Transfer Scheduling
DataSync comes with a built-in scheduling mechanism, allowing you to periodically run data transfer tasks to detect and copy changes from your source storage system to the destination. You can schedule your tasks using the AWS DataSync Console or AWS Command Line Interface (CLI) without writing scripts to manage repeated transfers. Task scheduling automatically runs tasks on your configured schedule with hourly, daily, or weekly options provided directly in the AWS Console.
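Outside the console, a schedule is supplied as a ScheduleExpression inside the Schedule parameter of a task. The sketch below builds such a request for hourly, daily, or weekly runs using the six-field AWS cron syntax. The helper names and the placeholder ARNs are hypothetical, and times are assumed to be in UTC.

```python
def schedule_expression(frequency, hour=0, minute=0, weekday="SUN"):
    """Return an AWS cron expression for an hourly, daily, or weekly run."""
    if frequency == "hourly":
        return f"cron({minute} * * * ? *)"
    if frequency == "daily":
        return f"cron({minute} {hour} * * ? *)"
    if frequency == "weekly":
        # Day-of-month must be "?" when a day-of-week is given.
        return f"cron({minute} {hour} ? * {weekday} *)"
    raise ValueError(f"unsupported frequency: {frequency}")


def task_request(source_arn, destination_arn, frequency, **when):
    """Assemble keyword arguments for a scheduled DataSync task."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Schedule": {
            "ScheduleExpression": schedule_expression(frequency, **when)
        },
    }


# Placeholder ARNs, not real resources.
request = task_request("arn:source-placeholder", "arn:destination-placeholder",
                       "daily", hour=2)
print(request["Schedule"]["ScheduleExpression"])  # cron(0 2 * * ? *)
```

A dictionary of this shape could then be passed to create_task on a boto3 DataSync client, for example client.create_task(**request).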
Data Encryption and Validation
All your data is encrypted in transit between the DataSync agent and the DataSync service using Transport Layer Security (TLS). DataSync supports using default at-rest encryption for Amazon S3 buckets. DataSync also supports encryption of data at rest and in transit for Amazon EFS and Amazon FSx.
DataSync ensures that your data arrives intact. For each transfer, the service performs integrity checks both in transit and at rest. These checks ensure that the data written to your destination matches the data read from your source, validating consistency.
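The idea of checking that the destination matches the source can be sketched with a checksum comparison. The example below uses SHA-256 from the Python standard library purely for illustration; the text above does not say which algorithm DataSync uses, and the function names are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, destination):
    """Return True when both files have identical content."""
    return file_digest(source) == file_digest(destination)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.bin"
    good = Path(tmp) / "good.bin"
    bad = Path(tmp) / "bad.bin"
    src.write_bytes(b"example payload")
    good.write_bytes(b"example payload")
    bad.write_bytes(b"example pay1oad")  # one byte differs
    intact = verify_copy(src, good)
    corrupted = verify_copy(src, bad)

print(intact, corrupted)  # True False
```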
File System Integration and Metadata Preservation
The DataSync agent connects to your existing storage systems using the industry-standard NFS and SMB protocols, to your Hadoop cluster as an HDFS client, to your self-managed object storage or Google Cloud Storage using the Amazon S3 application programming interface (API), or to Azure Blob Storage using the Blob API (Preview). The agent transfers data rapidly and writes it into your designated Amazon S3 bucket, Amazon EFS file system, Amazon FSx for Windows File Server file system, or Amazon FSx file system.
File permissions and metadata are preserved when copying objects or data between Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for OpenZFS, or Amazon FSx for NetApp ONTAP.
When copying data to Amazon S3, DataSync automatically converts each file to a single S3 object in a 1:1 relationship, and preserves POSIX metadata from NFS shares or HDFS as Amazon S3 object metadata. When you copy objects containing file system metadata back to file formats, the original file metadata (that DataSync copied to S3) is restored.
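The round trip described above can be sketched as follows: POSIX attributes are captured as string key-value pairs, the form that object metadata takes, and later applied to a restored file. The key names and both functions are hypothetical and are not claimed to match what DataSync writes. The example assumes a POSIX system and skips restoring ownership, which normally requires elevated privileges.

```python
import os
import stat
import tempfile


def posix_to_object_metadata(path):
    """Capture POSIX attributes as strings (hypothetical key names)."""
    info = os.stat(path)
    return {
        "file-owner": str(info.st_uid),
        "file-group": str(info.st_gid),
        "file-permissions": oct(stat.S_IMODE(info.st_mode)),
        "file-mtime-ns": str(info.st_mtime_ns),
    }


def restore_from_object_metadata(path, metadata):
    """Apply saved permissions and modification time to a file."""
    os.chmod(path, int(metadata["file-permissions"], 8))
    mtime_ns = int(metadata["file-mtime-ns"])
    os.utime(path, ns=(mtime_ns, mtime_ns))  # sets atime and mtime


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    restored = os.path.join(tmp, "restored.txt")
    for name in (original, restored):
        with open(name, "w") as handle:
            handle.write("data")
    os.chmod(original, 0o640)
    os.utime(original, ns=(10**18, 10**18))
    metadata = posix_to_object_metadata(original)
    restore_from_object_metadata(restored, metadata)
    roundtrip = posix_to_object_metadata(restored)

print(roundtrip["file-permissions"])  # 0o640
```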
Integration with AWS Infrastructure and Management Services
DataSync works natively with AWS security, monitoring, and audit services to simplify data movement and to provide a consistent management experience for your IT, storage, and DevOps teams. In addition to integrations with Amazon S3, Amazon EFS, and Amazon FSx, DataSync supports Amazon Virtual Private Cloud (Amazon VPC) endpoints (powered by AWS PrivateLink) to move files directly into your Amazon VPC. As with other AWS services, you can use AWS Identity and Access Management (IAM) to securely manage DataSync access. Similarly, you can configure an IAM role to control the services accessing your Amazon S3 bucket.
Monitoring and Auditing with Amazon CloudWatch and AWS CloudTrail
With Amazon CloudWatch, you can monitor the status of any DataSync transfers currently in progress and check previous data transfer history. With CloudWatch Metrics, you can see the number of files and amount of data copied. Consult CloudWatch Logs for information about individual files transferred at a given time, as well as the results of DataSync integrity verification. This simplifies monitoring, reporting, and troubleshooting, enabling you to provide timely updates to stakeholders. In addition, CloudWatch Events are triggered as your transfer tasks complete, enabling automation of dependent workflows. For audit purposes, you can consult AWS CloudTrail, which logs all actions performed by DataSync.
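To automate a dependent workflow, a rule typically filters events with a pattern and reacts only to completed transfers. The sketch below shows such a pattern and a simplified local matcher. The source, detail-type, and State values are assumptions that should be checked against the current event reference, the matcher handles only exact-value lists, and the sample events are invented.

```python
# Assumed field values; verify against the AWS event reference before use.
SUCCESS_PATTERN = {
    "source": ["aws.datasync"],
    "detail-type": ["DataSync Task Execution State Change"],
    "detail": {"State": ["SUCCESS"]},
}


def matches(pattern, event):
    """Simplified matcher: each pattern key must match one listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event section.
            if not isinstance(event[key], dict):
                return False
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


finished = {
    "source": "aws.datasync",
    "detail-type": "DataSync Task Execution State Change",
    "detail": {"State": "SUCCESS"},
}
running = dict(finished, detail={"State": "TRANSFERRING"})

print(matches(SUCCESS_PATTERN, finished), matches(SUCCESS_PATTERN, running))
```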
With AWS DataSync, you pay only for data copied by the service at a flat, per-gigabyte rate. No software licenses, contracts, maintenance fees, development cycles, or hardware are required. This provides a lower total cost of ownership (TCO) compared to manually building, operating, and optimizing your own high-performance scripted transfers, as well as lower total cost than buying and running commercial transfer tools.
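Because the charge is a flat per-gigabyte rate, estimating a transfer cost is a single multiplication. The sketch below uses a made-up placeholder rate; it is not an AWS price, and the transfer_cost function is an assumption made for this example.

```python
from decimal import Decimal


def transfer_cost(gigabytes_copied, rate_per_gb):
    """Flat per-gigabyte charge: data copied multiplied by the rate."""
    return Decimal(str(gigabytes_copied)) * Decimal(str(rate_per_gb))


PLACEHOLDER_RATE = "0.01"  # made-up figure for illustration, not an AWS price
estimate = transfer_cost(500, PLACEHOLDER_RATE)
print(f"500 GB at the placeholder rate: {estimate}")
```

Decimal is used instead of float so that currency arithmetic does not pick up binary rounding errors.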
Using AWS DataSync Discovery, you can run discovery jobs for up to 31 days and receive recommendations free of charge. DataSync Discovery keeps collected data and associated recommendations for 60 days following job completion.
DataSync pricing is simple and based on how much data you transfer.