Q: What is AWS DataSync?
A: AWS DataSync is a managed online data transfer service that simplifies, automates, and accelerates moving and replicating large amounts of data between on-premises storage systems and AWS storage services such as Amazon S3 and Amazon EFS, over the Internet or AWS Direct Connect.
Q: Why should I use AWS DataSync?
A: AWS DataSync allows you to move, copy, and synchronize large datasets with millions of files without having to build custom solutions with open-source tools or license and manage expensive commercial network acceleration software. You can use DataSync for one-time migration of active data, periodic distribution for data processing workflows, or ongoing replication for business continuity.
Q: What problem does DataSync solve for me?
A: DataSync reduces the complexity and cost of online data transfer, making it simple to transfer datasets between on-premises storage systems and Amazon S3 or Amazon Elastic File System (EFS). DataSync connects to existing storage systems and data sources with a standard storage protocol (NFS), and uses a purpose-built network protocol and scale-out architecture to accelerate transfers to and from AWS. DataSync automatically scales and handles all of the tasks involved in moving data: monitoring transfer progress, encrypting and validating transferred data, and notifying you of any failures. With DataSync you pay only for the amount of data copied, with no minimum commitments or upfront fees.
Q: Where can I transfer data to and from?
A: DataSync can copy data between on-premises storage systems and Amazon S3 or Amazon EFS. DataSync supports the NFS protocol to access on-premises storage.
Q: How do I get started with DataSync?
A: You can transfer data using DataSync with a few clicks in the AWS Management Console or through the AWS Command Line Interface (CLI). To get started, you deploy a DataSync agent into your IT environment, configure the source location and destination location, and initiate the copy.
Q: How do I deploy a DataSync agent?
A: You deploy a DataSync agent by downloading the OVA from the AWS Management Console and deploying it on your on-premises VMware ESXi hypervisor. The agent must be deployed in your IT environment so that it can access your on-premises NFS server.
Q: What are the resource requirements for the DataSync agent?
A: The minimum resources required per agent are listed in the AWS DataSync documentation.
Q: How do I use DataSync?
A: 1. Deploy an agent - Deploy a DataSync agent on-premises and associate it with your AWS account via the Management Console or API. The agent is used to access your local NFS server to read data from it or write data to it. There is no need to deploy agents on EC2 or manage any in-cloud resources.
2. Create a data transfer task - Create a task by specifying the location of your data source and destination, and any options you want to use to configure the transfer, such as copying file metadata.
3. Start the transfer - Start the task and monitor data movement in the console or with Amazon CloudWatch.
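The three steps can also be scripted. The sketch below is a minimal illustration, not an official recipe: it assumes the agent from step 1 is already activated, that you already have its ARN and the ARN of a destination location (for example an S3 or EFS location), and it uses the DataSync method names from the AWS SDK for Python (boto3): `create_location_nfs`, `create_task`, and `start_task_execution`. The function name `start_nfs_transfer` and the default task name are hypothetical. The client is passed in as a parameter; in practice it would be `boto3.client("datasync")`.

```python
from typing import Any


def start_nfs_transfer(
    client: Any,
    agent_arn: str,
    nfs_hostname: str,
    nfs_export_path: str,
    destination_location_arn: str,
    task_name: str = "nfs-to-aws",  # hypothetical task name
) -> str:
    """Create an NFS source location and a task, start it, and return the execution ARN."""
    # Step 1 (agent already deployed): describe the on-premises NFS export
    # that the agent will read from.
    source = client.create_location_nfs(
        ServerHostname=nfs_hostname,
        Subdirectory=nfs_export_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Step 2: create a task that pairs the source with the destination and
    # sets transfer options, here preserving POSIX permissions.
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination_location_arn,
        Name=task_name,
        Options={"PosixPermissions": "PRESERVE"},
    )
    # Step 3: start the transfer; the returned ARN identifies this run.
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]
```

For example, `start_nfs_transfer(boto3.client("datasync"), agent_arn, "nas.example.internal", "/exports/projects", destination_arn)`, where the hostname and export path are placeholders. Passing the client in, rather than creating it inside the function, leaves credentials and Region selection to the caller and makes the function easy to test.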
Q: How can I start a DataSync task?
A: DataSync copies data when you start a task via the AWS Management Console or CLI. Each time a task runs, it scans the source for changes and copies any differences from the source to the destination. You can configure which characteristics of the source are used to determine what changed, and whether files or objects in the destination should be deleted if they are not found in the source.
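To make the "copy only the differences" idea concrete, the following sketch plans a single incremental run by comparing two directory listings. It illustrates the concept only and is not DataSync's internal algorithm: the listing format, the choice of size and modification time as the compared characteristics, and the function name `plan_incremental_copy` are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical listing format: relative path -> (size in bytes, mtime in seconds).
Listing = Dict[str, Tuple[int, float]]


def plan_incremental_copy(
    source: Listing,
    destination: Listing,
    compare_mtime: bool = True,
    remove_deleted: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (paths to copy, paths to delete) for one incremental run."""
    to_copy = []
    for path, (size, mtime) in source.items():
        dest = destination.get(path)
        if dest is None:
            to_copy.append(path)  # new on the source
        elif dest[0] != size or (compare_mtime and dest[1] != mtime):
            to_copy.append(path)  # changed on the source
    # Destination-only files are deleted only when the caller asks for it.
    to_delete = [p for p in destination if p not in source] if remove_deleted else []
    return sorted(to_copy), sorted(to_delete)
```

In DataSync itself, the corresponding behavior is set through task options: for example, `TransferMode` (`CHANGED` or `ALL`) controls whether only changed data is copied, and `PreserveDeletedFiles` (`PRESERVE` or `REMOVE`) controls whether destination files that no longer exist on the source are deleted.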
Q: How does DataSync perform data integrity validation?
A: DataSync performs data integrity verification both during the transfer and at the end of the transfer (checksum comparison between source and destination files, as well as file metadata comparison).
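As an illustration of what checksum comparison plus metadata comparison means, the sketch below verifies one copied file on a local file system. DataSync does not state which checksum algorithm it uses; SHA-256, the 1 MiB chunk size, and the choice of size and permission bits as the compared metadata are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(source: Path, destination: Path) -> bool:
    """Return True if size, permission bits, and content checksum all match."""
    s, d = source.stat(), destination.stat()
    # Cheap metadata checks first; only hash the contents if they agree.
    if s.st_size != d.st_size or (s.st_mode & 0o777) != (d.st_mode & 0o777):
        return False
    return file_digest(source) == file_digest(destination)
```

Checking metadata before hashing is a design choice: a size mismatch proves the copy differs without reading either file.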
Q: How can I monitor the status of data being transferred by DataSync?
A: You can use the AWS Management Console or CLI to monitor the status of data being transferred. Using Amazon CloudWatch Metrics, you can see the number of files and amount of data that have been copied. Amazon CloudWatch Logs are available for detailed error information. In addition, CloudWatch Events are triggered as your tasks transition between states, enabling automation of dependent workflows. You can find additional information, such as transfer progress, in the AWS Management Console or CLI.
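Besides the console and CloudWatch, a script can check progress by calling `describe_task_execution` from the AWS SDK for Python (boto3). The sketch below polls an execution until it reaches `SUCCESS` or `ERROR` and prints the `FilesTransferred` and `BytesTransferred` counters from each response. The polling interval and the function name `wait_for_execution` are assumptions; for automated workflows, reacting to the state-change events mentioned above avoids polling altogether.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"SUCCESS", "ERROR"}


def wait_for_execution(
    client: Any,
    execution_arn: str,
    poll_seconds: float = 30.0,  # assumed interval
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task execution until it finishes; return the final description."""
    while True:
        desc = client.describe_task_execution(TaskExecutionArn=execution_arn)
        print(f"{desc['Status']}: {desc.get('FilesTransferred', 0)} files, "
              f"{desc.get('BytesTransferred', 0)} bytes")
        if desc["Status"] in TERMINAL_STATES:
            return desc
        sleep(poll_seconds)
```

The `sleep` parameter defaults to `time.sleep`; it is injectable only so the loop can be exercised without waiting.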
Q: How does DataSync access my on-premises file system?
A: DataSync uses agents that you deploy into your IT environment to access your on-premises file systems through the NFS protocol. These agents connect to AWS using the Internet or your AWS Direct Connect link, and are securely managed from the AWS Management Console or CLI. There is no need to set up a VPN or tunnel or to allow inbound connections, and the agents can be configured to route through a firewall using standard network ports.
Q: How does DataSync access my Amazon S3 bucket?
A: DataSync accesses your Amazon S3 bucket using the IAM role you configure.
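A minimal sketch of registering a bucket together with that role, using the boto3 method `create_location_s3`. The bucket name, prefix, role ARN, and the function name `create_s3_location` are placeholders, and the `arn:aws:s3:::` prefix assumes the standard AWS partition. The role's trust policy must allow the `datasync.amazonaws.com` service principal to assume it, and its permissions policy must grant access to the bucket.

```python
from typing import Any


def create_s3_location(client: Any, bucket_name: str, role_arn: str, prefix: str = "/") -> str:
    """Register an S3 bucket as a DataSync location and return the location ARN."""
    response = client.create_location_s3(
        # Assumes the standard "aws" partition for the bucket ARN.
        S3BucketArn=f"arn:aws:s3:::{bucket_name}",
        Subdirectory=prefix,
        # DataSync assumes this IAM role to read and write objects in the bucket.
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    return response["LocationArn"]
```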
Q: How does DataSync access my Amazon EFS file system?
A: DataSync accesses your Amazon EFS file system over the NFS protocol. It does so by mounting your file system from within your VPC using Elastic Network Interfaces (ENIs) managed by DataSync. DataSync fully manages the creation, use, and deletion of these ENIs on your behalf.
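When you create an EFS location, you choose the subnet and security groups that DataSync uses for these ENIs. A minimal sketch with the boto3 method `create_location_efs`, where all ARNs are placeholders and the function name `create_efs_location` is hypothetical. The mount target's security group must accept NFS traffic (TCP port 2049) from the security groups you specify.

```python
from typing import Any, Iterable


def create_efs_location(
    client: Any,
    file_system_arn: str,
    subnet_arn: str,
    security_group_arns: Iterable[str],
    subdirectory: str = "/",
) -> str:
    """Register an EFS file system as a DataSync location and return the location ARN."""
    response = client.create_location_efs(
        EfsFilesystemArn=file_system_arn,
        Subdirectory=subdirectory,
        # DataSync places its managed ENIs in this subnet with these security groups.
        Ec2Config={"SubnetArn": subnet_arn, "SecurityGroupArns": list(security_group_arns)},
    )
    return response["LocationArn"]
```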
Q: When transferring files to or from Amazon S3, how does DataSync map between files and objects?
A: When files are copied to objects, there is a one-to-one relationship between a file and an object. File system metadata is stored as S3 user metadata, and it is restored when objects that contain file system metadata are copied back to files.
Q: What metadata does DataSync preserve when transferring data?
A: DataSync preserves POSIX metadata such as user ID, group ID, and permissions, so files can be restored to their original location without losing this metadata. Additionally, the POSIX metadata that DataSync stores with objects in S3 is interoperable with File Gateway.
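The round trip from file to object and back can be pictured as encoding POSIX attributes as string-valued S3 user metadata and decoding them on restore. The sketch below illustrates that idea only; the key names such as `file-owner` and both function names are hypothetical and are not claimed to be the exact keys DataSync or File Gateway write.

```python
import os
import stat
from typing import Dict


def posix_to_user_metadata(st: os.stat_result) -> Dict[str, str]:
    """Encode ownership, permissions, and mtime as string user metadata (hypothetical keys)."""
    return {
        "file-owner": str(st.st_uid),
        "file-group": str(st.st_gid),
        "file-permissions": oct(stat.S_IMODE(st.st_mode)),
        "file-mtime": str(int(st.st_mtime)),
    }


def user_metadata_to_posix(meta: Dict[str, str]) -> Dict[str, int]:
    """Decode the strings into values usable with os.chown, os.chmod, and os.utime."""
    return {
        "uid": int(meta["file-owner"]),
        "gid": int(meta["file-group"]),
        "mode": int(meta["file-permissions"], 8),  # accepts the "0o" prefix
        "mtime": int(meta["file-mtime"]),
    }
```

Storing every value as a string mirrors the fact that S3 user metadata values are strings.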
Q: Can I use versioning, lifecycle, cross-region replication, and S3 event notification with DataSync?
A: Yes. Your bucket configurations for versioning, lifecycle management, cross-region replication, and S3 event notification apply directly to objects transferred to your bucket through DataSync.
When using versioning, note that changes to object metadata will create a new version of the object.
You can use S3 lifecycle policies to change an object's storage class or to delete old objects or object versions.
Q: Is my data encrypted while being transferred and stored?
A: Yes. All data transferred between the source and destination is encrypted via Transport Layer Security (TLS, which replaced Secure Sockets Layer, SSL). Data is never persisted in DataSync itself. The service supports using default encryption for S3 buckets and Amazon EFS file system encryption of data at rest.
Q: What happens if a DataSync task is interrupted?
A: If a task is interrupted (for instance, if the network connection goes down or the DataSync agent is restarted), the next run of the task will transfer missing files, and the data will be complete and consistent at the end of this run. Each time a task is started it performs an incremental copy, transferring only the changes from the source to the destination.
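Because each run is incremental, recovering from an interrupted execution can be as simple as starting the task again. The sketch below automates that with the boto3 methods `start_task_execution` and `describe_task_execution`: it starts the task, waits for a terminal status, and starts a new execution if the previous one ended in `ERROR`. The attempt limit, polling interval, and the function name `run_until_complete` are assumptions, and the client is passed in (for example `boto3.client("datasync")`).

```python
import time
from typing import Any, Callable


def run_until_complete(
    client: Any,
    task_arn: str,
    max_attempts: int = 3,  # assumed retry budget
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start the task, re-running it after failed executions; return the successful execution ARN."""
    for _ in range(max_attempts):
        arn = client.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
        status = client.describe_task_execution(TaskExecutionArn=arn)["Status"]
        # Wait for this execution to finish before deciding whether to retry.
        while status not in ("SUCCESS", "ERROR"):
            sleep(poll_seconds)
            status = client.describe_task_execution(TaskExecutionArn=arn)["Status"]
        if status == "SUCCESS":
            return arn
        # On ERROR, loop and start again: the next run copies only what is missing.
    raise RuntimeError(f"task did not succeed after {max_attempts} attempts")
```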
Q: Can I use DataSync with AWS Direct Connect?
A: Yes. You can use DataSync with your Direct Connect link to access public internet endpoints without any special configuration.
Q: How fast can DataSync copy my file system to AWS?
A: The rate at which DataSync can copy a given dataset is a function of the amount of data, the I/O bandwidth achievable from the source and destination storage, the available network bandwidth, and network conditions. A single DataSync agent is capable of saturating a 10 Gbps network link.
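A quick way to sanity-check expectations is to compute the network-bound lower limit: transfer time equals the data size in bits divided by the usable link rate. For example, 10 TB (10 × 10^12 bytes, or 8 × 10^13 bits) over a fully saturated 10 Gbps link takes 8,000 seconds, about 2.2 hours. The sketch below assumes the network is the only bottleneck; storage I/O and network conditions, as listed above, can make real transfers slower. The function name and the `utilization` parameter are illustrative.

```python
def estimate_transfer_hours(data_bytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Lower-bound transfer time in hours, assuming the network link is the only bottleneck."""
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be positive and utilization must be in (0, 1]")
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * utilization)  # decimal units: 1 Gbps = 1e9 bits/s
    return seconds / 3600
```

For instance, `estimate_transfer_hours(10e12, 10)` gives about 2.22, and halving `utilization` to 0.5 doubles it.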
Q: Can I control the amount of network bandwidth that a DataSync task uses?
A: Yes, you can control the amount of network bandwidth that DataSync uses by configuring the built-in bandwidth throttle. This can help minimize the impact on other users or applications that rely on the same network connection.
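The throttle is exposed as the task option `BytesPerSecond`, which is measured in bytes per second, with `-1` meaning no limit. The sketch below converts a limit given in megabits per second and applies it with the boto3 method `update_task`; the function name `set_bandwidth_limit` and the choice of megabits as the input unit are assumptions. A running or queued execution can also be throttled with `update_task_execution`, which accepts the same option.

```python
from typing import Any, Optional


def set_bandwidth_limit(client: Any, task_arn: str, mbps: Optional[float]) -> int:
    """Cap a task's bandwidth in megabits per second; None removes the cap."""
    # BytesPerSecond is in bytes; -1 tells DataSync to use all available bandwidth.
    bytes_per_second = -1 if mbps is None else int(mbps * 1_000_000 / 8)
    client.update_task(TaskArn=task_arn, Options={"BytesPerSecond": bytes_per_second})
    return bytes_per_second
```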
Q: Will DataSync affect the performance of my source file system?
A: Depending on the capacity of your on-premises file store, and the quantity and size of files to be transferred, DataSync may affect the response time of other clients when accessing the same source data store, because the agent reads or writes data from that storage system. Configuring a bandwidth limit for a task will reduce this impact by limiting the I/O against your storage system.
Q: Which compliance programs does DataSync support?
A: AWS has the longest-running compliance program in the cloud and is committed to helping customers navigate their requirements. DataSync has been assessed to meet global and industry security standards. It complies with PCI DSS and ISO 9001, 27001, 27017, and 27018, in addition to being HIPAA eligible. That makes it easier for you to verify our security and meet your own obligations. For more information and resources, visit our compliance pages. You can also go to the Services in Scope by Compliance Program page to see a full list of services and certifications.
Q: Is DataSync PCI compliant?
A: Yes. DataSync is PCI DSS compliant, which means you can use it to transfer payment information. You can download the PCI Compliance Package in AWS Artifact to learn more about how to achieve PCI compliance on AWS.
Q: Is DataSync HIPAA eligible?
A: Yes. DataSync is HIPAA eligible, which means that if you have a HIPAA BAA in place with AWS, you can use DataSync to transfer protected health information (PHI).
When to choose AWS DataSync
Q: How do I choose between AWS DataSync and the AWS Snowball Edge family?
A: AWS Snowball Edge is suited to customers who don't need their data in AWS immediately, are bandwidth constrained, or are transferring data from remote, disconnected, or austere environments. DataSync is ideal for customers who need online migrations for active data sets, timely transfers for continuously generated data, or replication for business continuity.
Q: How do I choose between AWS DataSync and AWS Storage Gateway?
A: AWS Storage Gateway provides hybrid cloud storage capabilities, connecting on-premises applications to AWS storage services with low-latency access, and is used for backup, tiering, and local access to objects stored in S3. DataSync is used to rapidly copy data into or out of AWS storage services. You can use DataSync for fast transfer of existing data to Amazon S3, and the File Gateway configuration of Storage Gateway for subsequent low-latency access to that data from on-premises.
Q: How do I choose between AWS DataSync and S3 Transfer Acceleration?
A: If your applications are already integrated with the Amazon S3 API, and you want higher throughput for transferring large files to S3, you can use S3 Transfer Acceleration. If you want to transfer data from existing storage systems (e.g. Network Attached Storage) or from instruments that can't be changed (e.g. DNA sequencers, video cameras), or if you want multiple destinations, use DataSync.
AWS DataSync has simple, predictable, usage-based pricing; you pay only for the amount of data that you copy.