AWS DataSync FAQs

General
Data movement
Usage
Moving to and from AWS Storage
Performance
Security and compliance
When to choose AWS DataSync

General

AWS DataSync is an online data movement service that simplifies and accelerates data migrations to AWS as well as moving data to and from on-premises storage, other cloud providers, and AWS Storage services.

DataSync can copy data to and from Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, object storage in other clouds such as Google Cloud Storage and Wasabi Cloud Storage (see the full list of supported clouds), Azure Files, Azure Blob Storage (including Azure Data Lake Storage Gen2), Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, and Amazon FSx for NetApp ONTAP file systems.

AWS DataSync enables you to move your data, securely and quickly. You can use DataSync to copy large datasets with virtually unlimited numbers of files, without having to build custom solutions with open-source tools, or license and manage expensive commercial network acceleration software. You can use DataSync to migrate data to AWS, archive data to free up on-premises storage capacity, replicate data to AWS for business continuity, or transfer data to the cloud for analysis and processing.

AWS DataSync reduces the complexity and cost of online data transfer, making it simple to transfer datasets to and from on-premises storage, other cloud providers and AWS Storage services. DataSync connects to existing storage systems and data sources with standard storage protocols (NFS, SMB), as an HDFS client, using the Amazon S3 API, or using other cloud storage APIs. It uses a purpose-built network protocol and scale-out architecture to accelerate data transfer between storage systems and AWS services. DataSync handles moving files and objects, scheduling data transfers, monitoring the progress of transfers, encryption, verification of data transfers, and notifying you of any issues.

Data movement

DataSync supports the following storage location types: Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, object storage in other clouds such as Google Cloud Storage and Wasabi Cloud Storage (see the full list of supported clouds), Azure Files, Azure Blob Storage (including Azure Data Lake Storage Gen2), Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, and Amazon FSx for NetApp ONTAP file systems.

You can use AWS DataSync to migrate data located on premises or in other clouds to Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, or Amazon FSx. Configure DataSync to make an initial copy of your entire dataset, and schedule subsequent incremental transfers of changing data until the final cut-over from on premises to AWS. DataSync includes encryption and integrity validation to help make sure your data arrives securely, intact, and ready to use. To minimize impact on workloads that rely on your network connection, you can schedule your migration to run during off-hours, or limit the amount of network bandwidth that DataSync uses by configuring the built-in bandwidth throttle. DataSync preserves metadata between storage systems that have similar metadata structures, enabling a smooth transition of end users and applications to using your target AWS Storage service.

Read the storage blog, "Migrating storage with AWS DataSync," to learn more about migration best practices and tips.

You can use AWS DataSync to move cold data from on-premises storage systems directly to durable and secure long-term storage, such as Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier) or Amazon S3 Glacier Deep Archive. Use DataSync’s exclude filters to exclude copying temporary files and folders or use include filters or manifests to copy only a subset of files from your source location. You can select the most cost-effective storage service for your needs: transfer data to an S3 storage class, or use DataSync with EFS Lifecycle Management to store data in Amazon EFS Infrequent Access storage class (EFS IA). Use the built-in task scheduling functionality to regularly archive data that should be retained for compliance or auditing purposes, such as logs, raw footage, or electronic medical records.

With AWS DataSync, you can periodically replicate files into Amazon S3, or into Amazon EFS or Amazon FSx to maintain a standby file system. Use the built-in task scheduling functionality to ensure that changes to your dataset are regularly copied to your destination storage. Read this AWS Storage blog to learn more about data protection using AWS DataSync.

You can use AWS DataSync for ongoing transfers from on-premises systems into or out of AWS for processing. DataSync can help speed up your critical hybrid cloud storage workflows in industries that need to move active files into AWS quickly. This includes machine learning in life sciences, video production in media and entertainment, big data analytics in financial services, and seismic research in oil and gas. DataSync provides timely delivery to ensure dependent processes are not delayed. You can specify include and exclude filters or manifests to specify which files or objects should be transferred each time your task runs.

Yes. Using AWS DataSync, you can copy data from Google Cloud Storage using the Amazon S3 API, from Azure Files using the SMB protocol, or from Azure Blob Storage (including Azure Data Lake Storage Gen2). You can also move data from other cloud storage such as Wasabi Cloud Storage, Oracle Cloud Storage, Cloudflare R2 Storage, DigitalOcean Spaces, and Backblaze B2 Cloud Storage (see the full list of supported clouds). When using Enhanced mode tasks, no agent is required to connect to your cloud storage. Otherwise, if using Basic mode, deploy the DataSync agent in your cloud environment or on Amazon EC2. Then, create your source and destination locations and start your task to begin copying data. Learn more about AWS solutions for hybrid and multicloud environments.

Yes. With AWS DataSync, you can easily build your data lake, by automating the transfer of on-premises datasets or data in other clouds to Amazon S3. DataSync enables a simple and fast transfer of your entire data set using standard storage protocols (NFS, SMB), as an HDFS client, using the Amazon S3 API, or using other cloud storage APIs. After transferring your initial dataset, you can schedule subsequent transfers of new data to AWS. DataSync includes encryption and integrity validation to help make sure your data arrives securely, intact, and ready to use. To minimize impact on workloads that rely on your network connection, you can schedule transfer tasks to run during off-hours, or limit the amount of network bandwidth that DataSync uses by configuring the built-in bandwidth throttle. Once your data lands in Amazon S3, you can use native AWS services to run big data analytics, artificial intelligence (AI), machine learning (ML), high-performance computing (HPC) and media data processing applications to gain insights from your unstructured data sets. Read the AWS data lake storage web page to learn more about building and leveraging your data lake.

You can use DataSync to transfer files or objects between Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for OpenZFS, or Amazon FSx for NetApp ONTAP within the same AWS account. You can transfer data between AWS services in the same AWS Region, between services in different Commercial AWS Regions except for China, or between AWS GovCloud (US-East and US-West) Regions. This does not require deploying a DataSync agent, and can be configured end to end using the AWS DataSync console, AWS Command Line Interface (CLI), or AWS Software Development Kit (SDK).

Usage

You can transfer data using AWS DataSync with a few clicks in the AWS Management Console or through the AWS Command Line Interface (CLI). To get started, follow these 3 steps:

1. Deploy an agent - To transfer data between on-premises and AWS Storage services, deploy an agent and associate it with your AWS account via the Management Console or API. The agent will be used to access your NFS server, SMB file share, Hadoop cluster, or self-managed object storage to read data from it or write data to it. Deploying an agent is not required to transfer data between other clouds and AWS, or between AWS Storage services within the same AWS account.

2. Create a data transfer task - Create a task by specifying the location of your data source and destination, and any options you want to use to configure the transfer, such as scheduling the task and enabling task reports.

3. Start the transfer - Start the task, monitor data movement in the console or with Amazon CloudWatch, and audit transfer tasks using task reports.

AWS DataSync supports two types of agents that correspond to different task modes: Basic and Enhanced. When copying data between your on-premises NFS or SMB file server and Amazon S3 using Enhanced mode, you need to use the DataSync Enhanced mode agent. For all other use cases, use the DataSync Basic mode agent.

You deploy an AWS DataSync agent to your on-premises hypervisor or in Amazon EC2. To copy data to or from your on-premises storage, you download the agent virtual machine image from the AWS Console and deploy it to your on-premises VMware ESXi, Linux Kernel-based Virtual Machine (KVM), Nutanix AHV (using the KVM agent image), or Microsoft Hyper-V hypervisor. The agent must be deployed so that it can access your file server using the NFS or SMB protocol, access NameNodes and DataNodes in your Hadoop cluster, or access your object storage using the Amazon S3 API. To set up transfers between your S3 on AWS Outposts buckets and S3 buckets in AWS Regions, deploy the agent on your Outpost.

When copying data between your public cloud environment and AWS Storage using Basic mode, you can either deploy a DataSync agent in your cloud environment or on Amazon EC2. Because AWS DataSync compresses data in flight between the AWS DataSync agent and AWS Storage services, you may be able to reduce egress fees by deploying the AWS DataSync agent in your public cloud environment. When using Enhanced mode tasks, no agent is required to connect to your cloud storage.

Deploying an agent is not required to transfer data between AWS Storage services within the same AWS account. To copy data to or from a self-managed in-cloud file server, or between AWS Storage services in different AWS accounts, you launch an Amazon EC2 instance using a DataSync agent AMI.

You can find the minimum required resources to run the agent here.

AWS DataSync copies data when you initiate a task via the AWS Management Console or AWS Command Line Interface (CLI). Each time a task runs, it scans the source and destination for changes, and copies any data and metadata differences between the source and the destination. You can configure which characteristics of the source are used to determine what changed, define include and exclude filters or manifests to transfer specific file and object data, and control whether files or objects in the destination should be overwritten when changed in the source or deleted when not found in the source.
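As a rough illustration of these controls, the sketch below assembles an options mapping shaped like the `Options` structure of the DataSync `CreateTask` API. The helper name `build_task_options` and its arguments are hypothetical; only the key names and values follow the API, and you should confirm them against the API reference before relying on them.

```python
# A minimal sketch, assuming the Options structure of the DataSync CreateTask API.
# build_task_options is a hypothetical helper, not part of any AWS SDK.

ALLOWED_VERIFY_MODES = {"POINT_IN_TIME_CONSISTENT", "ONLY_FILES_TRANSFERRED", "NONE"}


def build_task_options(overwrite=True, delete_removed=False,
                       verify="ONLY_FILES_TRANSFERRED", transfer_all=False):
    """Return a dict that could be passed as Options when creating a task."""
    if verify not in ALLOWED_VERIFY_MODES:
        raise ValueError(f"unknown verify mode: {verify}")
    return {
        # CHANGED copies only differences; ALL copies everything without comparing.
        "TransferMode": "ALL" if transfer_all else "CHANGED",
        # Whether destination data is overwritten when the source has changed.
        "OverwriteMode": "ALWAYS" if overwrite else "NEVER",
        # Whether destination data missing from the source is kept or removed.
        "PreserveDeletedFiles": "REMOVE" if delete_removed else "PRESERVE",
        "VerifyMode": verify,
    }


options = build_task_options(overwrite=True, delete_removed=False)
print(options)
```

With an SDK such as boto3, a mapping like this would typically be supplied as the `Options` argument when creating or starting a task.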

A Basic mode task is subject to quotas on the number of files and objects in a dataset. Basic mode sequentially prepares, transfers, and verifies files and objects in a dataset, making it slower than Enhanced mode for most workloads. With Enhanced mode, you can transfer datasets with virtually unlimited numbers of objects at higher levels of performance than Basic mode. Enhanced mode tasks optimize and streamline the data transfer process by listing, preparing, transferring, and verifying data in parallel. You also get enhanced metrics and reporting capabilities, making it easier to track and manage large data transfers. Enhanced mode is currently available for transfers between Amazon S3 locations, between other clouds and Amazon S3, and between on-premises NFS or SMB file servers and Amazon S3. Basic mode supports all DataSync location types available today. See the DataSync documentation for a detailed list of differences between task modes. See the DataSync pricing page for differences in pricing between task modes.
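The availability rules above can be written down as a small lookup. This is only a sketch of the rules as stated in this FAQ: the location labels (`s3`, `other_cloud`, `nfs`, `smb`, and so on) and the function name are hypothetical, and the DataSync documentation remains the source of truth for which pairs each mode supports.

```python
# A minimal sketch of the Enhanced mode availability rules described above.
# Location labels are hypothetical shorthand, not DataSync API values.

ENHANCED_MODE_PAIRS = {
    frozenset({"s3"}),                 # between Amazon S3 locations
    frozenset({"s3", "other_cloud"}),  # between other clouds and Amazon S3
    frozenset({"s3", "nfs"}),          # between on-premises NFS and Amazon S3
    frozenset({"s3", "smb"}),          # between on-premises SMB and Amazon S3
}


def supports_enhanced_mode(source, destination):
    """Return True if the pair is listed above as available in Enhanced mode."""
    return frozenset({source, destination}) in ENHANCED_MODE_PAIRS


def pick_task_mode(source, destination):
    """Prefer Enhanced mode where listed, otherwise fall back to Basic mode."""
    return "ENHANCED" if supports_enhanced_mode(source, destination) else "BASIC"


print(pick_task_mode("nfs", "s3"))
print(pick_task_mode("nfs", "efs"))
```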

As AWS DataSync transfers and stores data, it performs integrity checks to ensure the data written to the destination matches the data read from the source. Additionally, an optional verification check can be performed to compare source and destination at the end of the transfer. DataSync will calculate and compare full-file checksums of the data stored in the source and in the destination. You can check either the entire dataset or just the files or objects that DataSync transferred.
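To make the idea of a full-file checksum comparison concrete, here is a generic sketch in Python. It is not DataSync's implementation: the choice of SHA-256, the chunk size, and the function names are all assumptions made for illustration.

```python
import hashlib
import io

# A generic sketch of full-file checksum verification. The algorithm DataSync
# uses internally is not stated here; SHA-256 is an assumption for illustration.


def stream_checksum(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_stream, destination_stream):
    """Return True when both streams hash to the same value."""
    return stream_checksum(source_stream) == stream_checksum(destination_stream)


# In-memory streams stand in for a source file and its copy.
print(verify_copy(io.BytesIO(b"example payload"), io.BytesIO(b"example payload")))
print(verify_copy(io.BytesIO(b"example payload"), io.BytesIO(b"example pay1oad")))
```

The same functions work with files opened in binary mode, for example `open(path, "rb")`.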

You can use task reports to audit your data transfer processes by verifying the transfer operations across all of your task executions. Using task reports, you can get a summary report along with detailed reports for all files transferred, skipped, verified, and deleted, for each task execution. Task reports give you the total number of files and bytes transferred, and include file attributes such as size, path, timestamps, file checksums, and object version IDs where applicable. You can also leverage AWS Glue, Amazon Athena, and Amazon QuickSight to automatically catalog, query, and visualize task reports to gain critical insights into your data transfer processes.

You can use the AWS Management Console or CLI to monitor the status and progress of data being transferred. Using Amazon CloudWatch Metrics, you can see the number of files and amount of data which has been copied. You can also enable logging of individual files to CloudWatch Logs, to identify what was transferred at a given time, as well as the results of the content integrity verification performed by DataSync.

These solutions together simplify auditing, monitoring, reporting, and troubleshooting, and enable you to provide timely updates to stakeholders.

Yes. You can specify an exclude filter, an include filter, or both to limit which files, folders, or objects are transferred each time a task runs. Alternatively, you can use manifests to specify a subset of files or objects that should be transferred from your source location.

Include filters specify the file and folder paths or object keys that should be included when the task runs, and limit the scope of what DataSync scans on the source and destination. Exclude filters specify the file and folder paths or object keys that should be excluded from being copied. When creating or updating a task, you can configure both exclude and include filters. When starting a task, you can override and update the filters configured on the task. Read this AWS storage blog to learn more about using common filters with DataSync.
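As an illustration, the sketch below builds filter rules shaped like the `FilterRule` structure of the DataSync API, where several patterns are joined with a pipe character in a single value. The helper name `build_filter` and the example patterns are hypothetical.

```python
# A minimal sketch, assuming the FilterRule structure of the DataSync API
# (FilterType "SIMPLE_PATTERN" with pipe-delimited patterns in Value).
# build_filter and the example paths are hypothetical.


def build_filter(patterns):
    """Join patterns into a single filter rule list."""
    cleaned = [pattern.strip() for pattern in patterns if pattern.strip()]
    if not cleaned:
        raise ValueError("at least one pattern is required")
    for pattern in cleaned:
        if "|" in pattern:
            raise ValueError(f"pattern must not contain '|': {pattern}")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(cleaned)}]


excludes = build_filter(["*.tmp", "/scratch"])
includes = build_filter(["/projects/2024*"])
print(excludes)
print(includes)
```

Lists like these would typically be passed as the `Excludes` and `Includes` arguments when creating a task or starting a task execution.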

A manifest is a CSV-formatted file that lists the file paths or object keys that should be included when the task runs and limits the scope of what is scanned by DataSync on the source and destination. When creating or updating a task, you can provide a manifest file with millions of source files or objects, and DataSync will only compare and transfer the files listed in the manifest. When starting a task, you can override and update the manifest file. When copying data from Amazon S3, you can also specify an optional S3 version ID of each object to transfer. Read this blog for more details.

Note that filters and manifests cannot be used together.
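The following sketch writes a manifest of the kind described above: one file path or object key per line, with an optional S3 version ID as a second comma-separated column. The helper name, the keys, and the version ID are hypothetical placeholders; check the DataSync documentation for the exact manifest rules before using a generated file.

```python
import csv
import io

# A minimal sketch of generating a CSV manifest. The keys and the version ID
# below are hypothetical placeholders, not real objects.


def build_manifest(entries):
    """entries: iterable of (key, version_id) pairs; version_id may be None."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key, version_id in entries:
        # Only add the second column when a specific object version is wanted.
        writer.writerow([key] if version_id is None else [key, version_id])
    return buffer.getvalue()


manifest_text = build_manifest([
    ("photos/picture1.png", None),
    ("photos/picture2.png", "example-version-id"),
])
print(manifest_text)
```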

Whereas a manifest is an explicit list of files or objects to be transferred from the source location, an include filter is a string specifying patterns of files and folders to be transferred from the source. Only files and folders that match the patterns in the filter are copied. A pattern can be an entire file or folder path, or a prefix ending with a wildcard (*) character, indicating that all files or objects that match the prefix should be copied. Include filters are ideal for customers that only want to copy a small set of files or objects, or a few specific folders. Customers with well-known datasets, such as those moved as part of an automated workflow, can use manifests to avoid scanning their entire file or object storage systems to determine changes. Using a manifest file, customers can specify millions of source files or objects to be transferred, and DataSync will only compare the files listed in the manifest. Customers can also use manifests to copy specific versions of objects from their Amazon S3 bucket.

Yes. You can schedule your tasks using the AWS DataSync Console or AWS Command Line Interface (CLI), without needing to write and run scripts to manage repeated transfers. Task scheduling automatically runs tasks on the schedule you configure, with hourly, daily, or weekly options provided directly in the Console. This enables you to ensure that changes to your dataset are automatically detected and copied to your destination storage.
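Outside the Console, a schedule is typically expressed as a cron-style string. The sketch below produces expressions in the six-field AWS cron format for the hourly, daily, and weekly cases mentioned above; the helper name and its arguments are hypothetical, and the result is shaped like the `Schedule` structure of the DataSync `CreateTask` API.

```python
# A minimal sketch, assuming the six-field AWS cron format:
# cron(minutes hours day-of-month month day-of-week year).
# schedule_expression is a hypothetical helper.


def schedule_expression(frequency, hour_utc=0, minute=0, weekday="SUN"):
    """Return a Schedule mapping for an hourly, daily, or weekly run (UTC)."""
    if not 0 <= hour_utc <= 23 or not 0 <= minute <= 59:
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    if frequency == "hourly":
        expression = f"cron({minute} * * * ? *)"
    elif frequency == "daily":
        expression = f"cron({minute} {hour_utc} * * ? *)"
    elif frequency == "weekly":
        expression = f"cron({minute} {hour_utc} ? * {weekday} *)"
    else:
        raise ValueError(f"unsupported frequency: {frequency}")
    return {"ScheduleExpression": expression}


print(schedule_expression("daily", hour_utc=2))
print(schedule_expression("weekly", hour_utc=2, weekday="SAT"))
```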

Yes. When transferring files, AWS DataSync creates the same directory structure on the destination as exists on the source location.

If a task is interrupted, for instance, if the network connection goes down or the AWS DataSync agent is restarted, the next run of the task will transfer missing files, and the data will be complete and consistent at the end of this run. Each time a task is started it performs an incremental copy, transferring only the changes from the source to the destination.
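The effect of an incremental copy after an interruption can be modeled in a few lines. This is a simplified simulation, not DataSync's algorithm: it assumes each location is described by a mapping from path to a `(size, modified_time)` pair, and treats any difference in that pair as a change.

```python
# A simplified model of an incremental copy. The (size, mtime) comparison and
# the function name are assumptions made for illustration only.


def plan_incremental_copy(source, destination):
    """Return (paths to transfer, paths present only on the destination)."""
    to_transfer = sorted(
        path for path, meta in source.items() if destination.get(path) != meta
    )
    only_on_destination = sorted(path for path in destination if path not in source)
    return to_transfer, only_on_destination


# State after an interrupted run: a.txt arrived, b.txt is stale, c.txt is missing.
source = {"a.txt": (10, 100), "b.txt": (20, 100), "c.txt": (30, 100)}
destination = {"a.txt": (10, 100), "b.txt": (5, 90), "old.txt": (1, 50)}

to_transfer, only_on_destination = plan_incremental_copy(source, destination)
print(to_transfer)
print(only_on_destination)
```

In this model the next run copies only `b.txt` and `c.txt`. Whether `old.txt` is kept or removed depends on how the task is configured to handle data that is not found in the source.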

You can use AWS DataSync with your Direct Connect link to access public service endpoints or private VPC endpoints. When using VPC endpoints, data transferred between the DataSync agent and AWS services does not traverse the public internet or need public IP addresses, increasing the security of data as it is copied over the network.

Yes, VPC endpoints are supported for data movement use cases. You can use VPC endpoints to ensure data transferred between your AWS DataSync agent, either deployed on-premises or in-cloud, doesn't traverse the public internet or need public IP addresses. Using VPC endpoints increases the security of your data by keeping network traffic within your Amazon Virtual Private Cloud (Amazon VPC). VPC endpoints for DataSync are powered by AWS PrivateLink, a highly available, scalable technology that enables you to privately connect your VPC to supported AWS services.

To use VPC endpoints with AWS DataSync, you create a VPC endpoint for the DataSync service in your chosen VPC, and specify the endpoint when creating your DataSync agent. Your agent will connect to the endpoint to activate, and subsequently all data transferred by the agent will remain within your VPC. You can use the AWS DataSync Console, AWS Command Line Interface (CLI), or AWS SDK to configure VPC endpoints. To learn more, see Using AWS DataSync in a Virtual Private Cloud.

Moving to and from AWS Storage

AWS DataSync supports moving data to, from, or between Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for OpenZFS, and Amazon FSx for NetApp ONTAP.

Yes. When configuring an S3 bucket for use with AWS DataSync, you can select the S3 storage class that DataSync uses to store objects. DataSync supports storing data directly into S3 Standard, S3 Intelligent-Tiering, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), Amazon S3 Glacier Instant Retrieval, Amazon S3 Glacier Flexible Retrieval, and Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive). More information on Amazon S3 storage classes can be found in the Amazon Simple Storage Service Developer Guide.

Objects smaller than the minimum charge capacity per object will be stored in S3 Standard. For example, folder objects, which are zero-bytes in size and hold only metadata, will be stored in S3 Standard. Read about considerations when working with Amazon S3 storage classes in our documentation and evaluating S3 request costs when using DataSync. For more information on minimum charge capacities see Amazon S3 Pricing.
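The placement rule above can be sketched as a simple threshold check. The thresholds in this example are illustrative assumptions (128 KiB for the two Infrequent Access classes); consult Amazon S3 Pricing and the DataSync documentation for the current minimums for each storage class. The function name is hypothetical, and the class names follow the values used by the S3 API.

```python
# A minimal sketch of the small-object rule. Thresholds are illustrative
# assumptions, not authoritative values; check Amazon S3 Pricing.

KIB = 1024
ASSUMED_MIN_BILLABLE_BYTES = {
    "STANDARD_IA": 128 * KIB,
    "ONEZONE_IA": 128 * KIB,
}


def effective_storage_class(requested, size_bytes, minimums=ASSUMED_MIN_BILLABLE_BYTES):
    """Return STANDARD for objects below the requested class's minimum size."""
    minimum = minimums.get(requested, 0)
    return "STANDARD" if size_bytes < minimum else requested


print(effective_storage_class("STANDARD_IA", 0))          # zero-byte folder object
print(effective_storage_class("STANDARD_IA", 256 * KIB))  # large enough to stay in IA
```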

Yes. When using S3 as the source location for an AWS DataSync task, the service will retrieve all objects from the bucket which need to be copied to the destination. Retrieving objects from S3 Standard-IA and S3 One Zone-IA storage will incur a retrieval fee based on the size of the objects. Read about considerations when working with Amazon S3 storage classes in our documentation.

When using S3 as the source location for an AWS DataSync task, the service will attempt to retrieve all objects from the bucket which need to be copied to the destination. Retrieving objects which are archived in the S3 Glacier Instant Retrieval storage class will incur higher retrieval fees based on the size of the objects. Retrieving objects which are archived in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage class results in an error. Any errors retrieving archived objects will be logged by DataSync and will result in a failed task completion status. Read about considerations when working with Amazon S3 storage classes and evaluating S3 request costs when using DataSync in our documentation.

AWS DataSync assumes an IAM role that you provide. The policy you attach to the role determines which actions the role can perform. DataSync can auto generate this role on your behalf or you can manually configure a role.

When files or folders are copied to Amazon S3, there is a one-to-one relationship between a file or folder and an object. File and folder timestamps and POSIX permissions, including user ID, group ID, and permissions, are stored in S3 user metadata. For NFS shares, file metadata stored in S3 user metadata is fully interoperable with File Gateway, providing on-premises file-based access to data stored in Amazon S3 by AWS DataSync.

When DataSync copies objects that contain this user metadata back to an NFS server, the file metadata is restored. Symbolic links and hard links are also restored when copying back from S3 to NFS.

When copying from an SMB file share, default POSIX permissions are stored in S3 user metadata. When copying back to an SMB file share, ownership is set based on the user that was configured in DataSync to access that file share, and default permissions are assigned.

When copying from HDFS, file and folder timestamps, user and group ownership, and POSIX permissions are stored in S3 user metadata. When copying from Amazon S3 back to HDFS, file and folder metadata are restored.

Learn more about how DataSync stores files and metadata in our documentation.

When transferring objects between self-managed object storage or Azure Blob Storage and Amazon S3, DataSync copies objects together with object metadata and tags.

When transferring objects between Amazon S3 buckets, DataSync copies objects together with object metadata and tags. DataSync does not copy other object information such as object ACLs or prior object versions.

Some S3 storage classes have behaviors that can affect your cost, such as data retrieval, minimum storage capacities, and minimum storage durations. DataSync automates management of data to address these factors, and provides settings to minimize data retrieval.

To avoid minimum capacity charge per object, AWS DataSync automatically stores small objects in S3 Standard. To minimize data retrieval fees, you can configure DataSync to verify only files that were transferred by a given task. To avoid minimum storage duration charges, DataSync has controls for overwriting and deleting objects. Read about cost considerations when working with Amazon S3 storage classes in our documentation and evaluating S3 request costs when using DataSync.

Yes. You can copy objects between Amazon S3 on AWS Outposts and Amazon S3 buckets in AWS Regions. AWS DataSync copies objects together with object metadata and object tags. For DataSync to access your Amazon S3 on Outposts buckets, deploy a DataSync EC2 agent on your Outpost.

When using DataSync with Amazon S3 on Outposts, you can only transfer data to and from Amazon S3 buckets in AWS Regions. You can learn more about supported sources and destinations for DataSync tasks in our documentation.

AWS DataSync accesses your Amazon EFS file system using the NFS protocol. The DataSync service mounts your file system from within your VPC from Elastic Network Interfaces (ENIs) managed by the DataSync service. DataSync fully manages the creation, use, and deletion of these ENIs on your behalf. You can choose to mount your EFS file system using a mount target or an EFS Access Point.

Yes. You can use AWS DataSync to copy files into Amazon EFS and configure EFS Lifecycle Management to migrate files that have not been accessed for a set period of time to the Infrequent Access (IA) storage class.

You can use both IAM identity policies and resource policies to control client access to Amazon EFS resources in a way that is scalable and optimized for cloud environments. When you create a DataSync location for your EFS file system, you can specify an IAM role that DataSync will assume when accessing EFS. You can then use EFS file system policies to configure access for the IAM role. Because DataSync mounts EFS file systems as the root user, your IAM policy must allow the following action: elasticfilesystem:ClientRootAccess.
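For illustration, the sketch below assembles an EFS file system policy that grants a role the client actions DataSync would need, including `elasticfilesystem:ClientRootAccess`. The role ARN, file system ARN, and function name are hypothetical placeholders, and whether you also need the mount and write actions depends on how the location is used.

```python
import json

# A minimal sketch of an EFS file system policy. The ARNs below are
# hypothetical placeholders; replace them with your own resources.


def efs_policy_for_datasync(role_arn, file_system_arn):
    """Build a policy document allowing the role to mount and write as root."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": [
                    "elasticfilesystem:ClientMount",
                    "elasticfilesystem:ClientWrite",
                    "elasticfilesystem:ClientRootAccess",
                ],
                "Resource": file_system_arn,
            }
        ],
    }


policy = efs_policy_for_datasync(
    "arn:aws:iam::111122223333:role/ExampleDataSyncRole",
    "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-12345678",
)
print(json.dumps(policy, indent=2))
```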

Yes. In addition to the built-in replication provided by Amazon EFS, you can also use AWS DataSync to schedule periodic replication of your Amazon EFS file system to a second Amazon EFS file system within the same AWS account. This capability is available for both same-region and cross-region deployments, and does not require using a DataSync agent.

AWS DataSync copies file and folder timestamps and POSIX permissions and applies default values for user ID and group ID. You can learn more and see the complete list of copied metadata in our documentation.

AWS DataSync accesses your Amazon FSx for Windows File Server file system using the SMB protocol, authenticating with the username and password you configure in the AWS Console or CLI. The DataSync service mounts your file system from within your VPC from Elastic Network Interfaces (ENIs) managed by the DataSync service. DataSync fully manages the creation, use, and deletion of these ENIs on your behalf.

Yes. You can use AWS DataSync to schedule periodic replication of your Amazon FSx for Windows File Server file system to a second file system within the same AWS account. This capability is available for both same-region and cross-region deployments, and does not require using a DataSync agent.

When you create a DataSync task to copy to or from your FSx for Lustre file system, the DataSync service will create Elastic Network Interfaces (ENIs) in the same VPC and subnet where your file system is located. DataSync uses these ENIs to access your FSx for Lustre file system using the Lustre protocol as the root user. When you create a DataSync location resource for your FSx for Lustre file system, you can specify up to five security groups to apply to the ENIs and configure outbound access from the DataSync service. The security groups must be configured to allow outbound traffic on the network ports required by FSx for Lustre. The security groups on your FSx for Lustre file system should be configured to allow inbound access from the security groups you assigned to the DataSync location resource for your FSx for Lustre file system.

Yes. You can use AWS DataSync to copy from your FSx for Lustre file system to a second file system within the same AWS account. This capability is available for both same-region and cross-region deployments, and does not require using a DataSync agent.

Yes. You can use AWS DataSync to schedule periodic replication of your Amazon FSx for Lustre file system to a second file system within the same AWS account. This capability is available for both same-region and cross-region deployments, and does not require using a DataSync agent.

No. Files are written using the file layout and striping configuration on the destination’s file system.

When you create a DataSync task to copy to or from your FSx for OpenZFS file system, the DataSync service will create Elastic Network Interfaces (ENIs) in the same VPC and subnet where your file system is located. DataSync uses these ENIs to access your FSx for OpenZFS file system using the NFS protocol as the root user. When you create a DataSync location resource for your FSx for OpenZFS file system, you can specify up to five security groups to apply to the ENIs and configure outbound access from the DataSync service. The security groups must be configured to allow outbound traffic on the network ports required by FSx for OpenZFS. The security groups on your FSx for OpenZFS file system should be configured to allow inbound access from the security groups you assigned to the DataSync location resource for your FSx for OpenZFS file system.

Yes. You can use AWS DataSync to copy from your FSx for OpenZFS file system to a second file system within the same AWS account. This capability is available for both same-region and cross-region deployments, and does not require using a DataSync agent.

Yes. You can use AWS DataSync to schedule periodic replication of your Amazon FSx for OpenZFS file system to a second file system within the same AWS account. This capability is available for both same-region and cross-region deployments, and does not require using a DataSync agent.

When you create a task, DataSync creates Elastic Network Interfaces (ENIs) in the Preferred Subnet of the same VPC where your Amazon FSx for NetApp ONTAP file system is located. The Preferred Subnet is configured when you create your FSx for ONTAP file system, and DataSync uses the ENIs it creates in that subnet to access your FSx for ONTAP file system. When you create a DataSync location resource for your FSx for ONTAP file system, you can specify up to five security groups to apply to the ENIs and configure outbound access from the DataSync service. You should configure the security groups on your FSx for ONTAP file system to allow inbound access from the security groups you assigned to the DataSync location resource for your FSx for ONTAP file system.

AWS DataSync supports using NFSv3, SMB 2.1, and SMB 3. DataSync does not currently support using NFSv4 or above with FSx for ONTAP.

Yes, AWS DataSync copies file and folder timestamps and POSIX permissions, including user ID, group ID, and permissions, when using the NFS protocol. When using the SMB protocol, DataSync copies file and folder timestamps, ownership, and ACLs. You can learn more and see the complete list of copied metadata in our documentation.

When migrating from Windows servers or NAS shares that serve users through the SMB protocol, use a DataSync SMB source location and the SMB protocol for your FSx for ONTAP location, ensuring that the security style for your FSx for ONTAP volume is configured for NTFS. When migrating from Unix or Linux servers or NAS shares that serve users through the NFS protocol, use a DataSync NFS source location and the NFS protocol for your FSx for ONTAP location, ensuring the security style for your FSx for ONTAP volume is configured for Unix. For multi-protocol migrations, you should review the best practices covered in the blog Enabling multiprotocol workloads with Amazon FSx for NetApp ONTAP, and use the SMB protocol to preserve file system metadata with the highest fidelity. For more information on configuring security styles for your FSx for ONTAP volumes, see the documentation on managing FSx for ONTAP volumes.

Yes. However, you will need to create a separate DataSync location and task resource for each protocol (NFS or SMB). To avoid issues with overwriting data and data verification, we do not recommend using multiple DataSync tasks to copy to the same volume path at the same time (whether using the same protocol or different protocols).

No, DataSync only supports copying file data to or from FSx for ONTAP volumes using NFS or SMB protocols.

Yes. You can use AWS DataSync to copy from your FSx for ONTAP file system to a second file system within the same AWS account. This capability is available for both same-Region and cross-Region deployments, and does not require using a DataSync agent.

While DataSync can be used to replicate data between your file systems, we recommend using NetApp SnapMirror to replicate between your FSx for ONTAP file systems. SnapMirror enables you to achieve low RPOs, regardless of the number or size of files in your file system.

DataSync will automatically exclude folders named “.snapshot”. You can also use exclude filters to avoid copying files and folders that match patterns you specify.
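For the exclude filters mentioned above, a minimal Python sketch of how a filter rule could be assembled is shown below. It assumes the DataSync API filter rule shape (a `FilterType` of `SIMPLE_PATTERN` and a `Value` in which multiple patterns are separated by `|`). The helper name and the example patterns are hypothetical.

```python
from typing import Dict, List


def exclude_filter(patterns: List[str]) -> List[Dict[str, str]]:
    """Return an Excludes list containing a single SIMPLE_PATTERN rule.

    Hypothetical helper: patterns are joined with "|", the delimiter
    DataSync uses between patterns in one filter string.
    """
    cleaned = [p.strip() for p in patterns if p.strip()]
    if not cleaned:
        return []  # no rule means nothing extra is excluded
    for pattern in cleaned:
        if "|" in pattern:
            raise ValueError(f"pattern must not contain '|': {pattern!r}")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(cleaned)}]


# Example patterns, for illustration only.
excludes = exclude_filter(["*.tmp", "/scratch"])
```

The resulting list is intended to be supplied as the `Excludes` parameter when creating a task or starting a task execution.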

Performance

The rate at which AWS DataSync can copy a given dataset is a function of the amount of data, the I/O bandwidth achievable from the source and destination storage, the network bandwidth available, and network conditions. For data transfer between on premises and AWS Storage services, a single DataSync task is capable of fully utilizing a 10 Gbps network link.
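To get a feel for what a given link speed means for a migration timeline, the following Python sketch computes a best-case transfer time from dataset size and network bandwidth alone. It is a back-of-the-envelope estimate: it ignores storage I/O limits, per-file overhead, and network conditions, all of which the answer above names as factors. The function name and the `utilization` parameter are illustrative assumptions.

```python
def estimate_transfer_hours(
    dataset_bytes: float,
    link_gbps: float,
    utilization: float = 1.0,
) -> float:
    """Best-case hours to move dataset_bytes over a link of link_gbps.

    utilization is the assumed fraction of the link that the transfer
    can actually use (1.0 means the link is fully utilized).
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must not be negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits_to_send = dataset_bytes * 8
    bits_per_second = link_gbps * 1e9 * utilization
    return bits_to_send / bits_per_second / 3600


# 100 TB (decimal) over a fully utilized 10 Gbps link: about 22.2 hours.
hours = estimate_transfer_hours(100e12, 10)
```

Real transfers will generally take longer than this figure, so it is best read as a lower bound when planning.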

Yes. You can control the amount of network bandwidth that AWS DataSync will use by configuring the built-in bandwidth throttle. You can increase or decrease this limit while your data transfer task is running. This enables you to minimize impact on other users or applications who rely on the same network connection.
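As a sketch of how the bandwidth throttle could be set programmatically, the Python snippet below converts a limit in megabits per second into the `BytesPerSecond` task option used by the DataSync API, where a value of -1 means no limit. The helper name is hypothetical, and the conversion assumes decimal megabits (1 Mbps = 1,000,000 bits per second).

```python
from typing import Dict, Optional

UNLIMITED = -1  # BytesPerSecond value meaning "use all available bandwidth"


def bandwidth_options(limit_mbps: Optional[float]) -> Dict[str, int]:
    """Build a task Options dict with a bandwidth limit.

    Hypothetical helper: pass None to remove the limit, or a positive
    number of megabits per second to throttle the task.
    """
    if limit_mbps is None:
        return {"BytesPerSecond": UNLIMITED}
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive, or None for no limit")
    # Convert megabits per second to bytes per second.
    return {"BytesPerSecond": int(limit_mbps * 1_000_000 / 8)}


# Throttle to 100 Mbps, which is 12,500,000 bytes per second.
options = bandwidth_options(100)
```

To adjust a running transfer, these options could be passed to the UpdateTaskExecution operation, for example `boto3.client("datasync").update_task_execution(TaskExecutionArn=..., Options=options)`, where the task execution ARN is your own.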

AWS DataSync generates Amazon CloudWatch metrics to provide granular visibility into the transfer process. Using these metrics, you can see the number of files and the amount of data that have been copied and verified. You can see CloudWatch graphs with these metrics directly in the DataSync console.

Depending on the capacity of your on-premises file store, and the quantity and size of files to be transferred, AWS DataSync may affect the response time of other clients when accessing the same source data store, because the agent reads data from or writes data to that storage system. Configuring a bandwidth limit for a task will reduce this impact by limiting the I/O against your storage system.

When performing a large migration, you can partition your dataset using multiple DataSync tasks. Partitioning your data across multiple tasks lets you run your data transfers in parallel and reduce your migration timelines. For best practices on scaling your DataSync transfers, see How to accelerate your data transfers with AWS DataSync scale out architectures.
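One possible way to plan such a partition is sketched below in Python: top-level folders are assigned to a fixed number of tasks so that each task receives a similar amount of data, and each group is then expressed as an include filter string. The greedy balancing approach, the function names, and the folder sizes are assumptions made for illustration; they are not taken from the referenced best-practices guide.

```python
import heapq
from typing import Dict, List


def partition_folders(folder_sizes: Dict[str, int], task_count: int) -> List[List[str]]:
    """Split folders into task_count groups of roughly equal total size.

    Greedy heuristic: take folders from largest to smallest and give
    each one to the group with the least data assigned so far.
    """
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    groups: List[List[str]] = [[] for _ in range(task_count)]
    # Heap of (bytes assigned so far, group index).
    heap = [(0, index) for index in range(task_count)]
    heapq.heapify(heap)
    ordered = sorted(folder_sizes.items(), key=lambda item: (-item[1], item[0]))
    for folder, size in ordered:
        load, index = heapq.heappop(heap)
        groups[index].append(folder)
        heapq.heappush(heap, (load + size, index))
    return groups


def include_value(folders: List[str]) -> str:
    """Join folder paths into one filter string ("|" separates patterns)."""
    return "|".join(folders)


# Hypothetical folder sizes in GB, split across two tasks.
plan = partition_folders({"/a": 50, "/b": 30, "/c": 20, "/d": 10}, task_count=2)
filters = [include_value(group) for group in plan]
```

Each resulting string could serve as the `Value` of a `SIMPLE_PATTERN` include filter on its own task, so the tasks copy disjoint parts of the dataset in parallel.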

Security and compliance

Yes. All data transferred between the source and destination is encrypted via Transport Layer Security (TLS), which replaced Secure Sockets Layer (SSL). Data is never persisted in AWS DataSync itself. The service supports using default encryption for S3 buckets, Amazon EFS file system encryption of data at rest, and Amazon FSx encryption at rest and in transit.

AWS DataSync uses an agent that you deploy into your IT environment or into Amazon EC2 to access your Hadoop cluster. The DataSync agent acts as an HDFS client and communicates with the NameNodes and DataNodes in your clusters. When you start a task, DataSync queries the primary NameNode to determine the locations of files and folders on the cluster. DataSync then communicates with the DataNodes in the cluster to copy files and folders to, or from, HDFS.

AWS DataSync uses the Amazon S3 API to access your S3-compatible object storage systems. To access your on-premises object storage, AWS DataSync uses an agent that you deploy into your data center. When using Basic mode tasks for cross-cloud transfers, DataSync uses an agent you deploy in your public cloud environment, or into Amazon EC2 to access your storage in other clouds. This agent connects to DataSync service endpoints within AWS, and is securely managed from the AWS Management Console or CLI. When using Enhanced mode tasks, no agent is required to connect to storage in other clouds.

When using Basic mode tasks, AWS DataSync uses an agent that you deploy into your Azure environment or into Amazon EC2 to access objects in your Azure Blob Storage containers. The agent connects to DataSync service endpoints within AWS, and is securely managed from the AWS Management Console or CLI. When using Enhanced mode tasks, no agent is required to connect to your Azure Blob Storage. DataSync authenticates to your Azure container using a SAS token that you specify when creating a DataSync Azure Blob location.

No. When copying data to or from your premises, there is no need to set up a VPN or tunnel, or to allow inbound connections. Your AWS DataSync agent can be configured to route through a firewall using standard network ports. You can also deploy DataSync within your Amazon Virtual Private Cloud (Amazon VPC) using VPC endpoints. When using VPC endpoints, data transferred between the DataSync agent and AWS services does not need to traverse the public internet or require public IP addresses.

Your AWS DataSync agent connects to DataSync service endpoints within your chosen AWS Region. You can choose to have the agent connect to public internet facing endpoints, Federal Information Processing Standards (FIPS) validated endpoints, or endpoints within one of your VPCs. Activating your agent securely associates it with your AWS account. To learn more, see Choose a Service Endpoint and Activate Your Agent.

Updates to the agent VM, including both the underlying operating system and the AWS DataSync software packages, are automatically applied by AWS once the agent is activated. Updates are applied non-disruptively when the agent is idle and not executing a data transfer task.

AWS has the longest-running compliance program in the cloud. AWS is committed to helping customers navigate their requirements. AWS DataSync has been assessed to meet global and industry security standards. DataSync complies with PCI DSS; ISO 9001, 27001, 27017, and 27018; and SOC 1, 2, and 3; in addition to being HIPAA eligible. DataSync is also authorized in the AWS US East/West Regions under FedRAMP Moderate and in the AWS GovCloud (US) Regions under FedRAMP High. That makes it easier for you to verify our security and meet your own obligations. For more information and resources, visit our compliance pages. You can also go to the Services in Scope by Compliance Program page to see a full list of services and certifications.

When to choose AWS DataSync

AWS DataSync fully automates and accelerates moving large active datasets to AWS. It is natively integrated with Amazon S3, Amazon EFS, Amazon FSx, Amazon CloudWatch, and AWS CloudTrail, which provides seamless and secure access to your storage services, as well as detailed monitoring of the transfer.

DataSync uses a purpose-built network protocol and scale-out architecture to transfer data. For data transfer between on premises and AWS Storage services, a single DataSync task is capable of fully utilizing a 10 Gbps network link.

DataSync fully automates the data transfer. It comes with retry and network resiliency mechanisms, network optimizations, built-in task scheduling, auditing via task reports, monitoring via the DataSync API and Console, and CloudWatch metrics, events and logs that provide granular visibility into the transfer process. DataSync performs data integrity verification both during the transfer and at the end of the transfer.

DataSync provides end-to-end security, and integrates directly with AWS storage services. All data transferred between the source and destination is encrypted via TLS, and access to your AWS storage is enabled via built-in AWS security mechanisms such as IAM roles. Using DataSync with VPC endpoints ensures that data transferred between your organization and AWS does not traverse the public internet, further increasing the security of data as it is copied over the network.

AWS provides multiple tools to copy objects between your buckets.

Use AWS DataSync for ongoing data distribution, data pipelines, and data lake ingest, as well as for consolidating or splitting data between multiple buckets.

Use S3 Replication for continuous replication of data to a specific destination bucket.

Use S3 Batch Operations for large-scale batch operations on S3 objects, such as to copy objects, set object tags or access control lists (ACLs), initiate object restores from Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier), invoke an AWS Lambda function to perform custom actions using your objects, manage S3 Object Lock legal hold, or manage S3 Object Lock retention dates.

Use AWS DataSync to migrate existing data to Amazon S3, and subsequently use the File Gateway configuration of AWS Storage Gateway to retain access to the migrated data and for ongoing updates from your on-premises file-based applications.

You can use a combination of DataSync and File Gateway to minimize your on-premises infrastructure while seamlessly connecting on-premises applications to your cloud storage. AWS DataSync enables you to automate and accelerate online data transfers to AWS Storage services. After the initial data transfer phase using AWS DataSync, File Gateway provides your on-premises applications with low latency access to the migrated data. When using DataSync with NFS shares, POSIX metadata from your source on-premises storage is preserved, and permissions from the source storage apply when accessing your files using File Gateway.

If your applications are already integrated with the Amazon S3 API, and you want higher throughput for transferring large files to S3, you can use S3 Transfer Acceleration. If you want to transfer data from existing storage systems (e.g., Network Attached Storage), or from instruments that cannot be changed (e.g., DNA sequencers, video cameras), or if you want multiple destinations, you can use AWS DataSync. DataSync also automates and simplifies the data transfer by providing additional functionality, such as built-in retry and network resiliency mechanisms, data integrity verification, and flexible configuration options such as bandwidth throttling.

If you currently use SFTP to exchange data with third parties, AWS Transfer Family provides a fully managed SFTP, FTPS, FTP, and AS2 transfer directly into and out of Amazon S3, while reducing your operational burden.

If you want an accelerated and automated data transfer between NFS servers, SMB file shares, Hadoop clusters, self-managed or cloud object storage, Amazon S3, Amazon EFS, and Amazon FSx, you can use AWS DataSync. DataSync is ideal for customers who need online migrations for active data sets, timely transfers for continuously generated data, or replication for business continuity.

AWS DataSync FAQs

General

What is AWS DataSync?

Why should I use AWS DataSync?

What problem does AWS DataSync solve for me?

Data movement

Where can I move data to and from?

How do I use AWS DataSync to migrate data to AWS?

How do I use AWS DataSync to archive cold data?

How do I use AWS DataSync to replicate data to AWS for business continuity?

How do I use AWS DataSync for recurring transfers between on-premises and AWS for ongoing workflows?

Can I use AWS DataSync to copy data from other clouds to AWS?

Can I use AWS DataSync to build my data lake?

How do I use AWS DataSync to transfer data between AWS Storage services?

Usage

How do I get started moving my data with AWS DataSync?

How do I deploy an AWS DataSync agent?

What are the resource requirements for the AWS DataSync agent?

How do I start an AWS DataSync data transfer task?

What is the difference between Basic mode and Enhanced mode tasks?

How does AWS DataSync ensure my data is copied correctly?

How can I audit and monitor the status of data being transferred by AWS DataSync?

Can I filter the files and folders that AWS DataSync transfers?

How is using a manifest file different from using include filters?

Can I configure AWS DataSync to transfer on a schedule?

Does AWS DataSync preserve the directory structure when copying files?

What happens if an AWS DataSync task is interrupted?

Can I use AWS DataSync with AWS Direct Connect?

Does AWS DataSync support VPC endpoints using AWS PrivateLink?

How do I configure AWS DataSync to use VPC endpoints?

Moving to and from AWS Storage

Which AWS Storage services are supported by AWS DataSync?

Can I copy my data into Amazon S3 Glacier Instant Retrieval, Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier), Amazon S3 Glacier Deep Archive, or other S3 storage classes?

Can I copy data out of S3 Standard-IA and S3 One Zone-IA storage classes?

Can I copy data out of Amazon S3 Glacier Instant Retrieval, Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier), and Amazon S3 Glacier Deep Archive?

How does AWS DataSync access my Amazon S3 bucket?

How does AWS DataSync convert files and folders to or from objects in Amazon S3?

What object metadata is preserved when transferring objects between self-managed object storage or Azure Blob Storage and Amazon S3?

What object metadata is preserved when transferring objects between Amazon S3 buckets?

Which Amazon S3 request and storage costs apply when using S3 storage classes with AWS DataSync?

Can I copy object data to and from Amazon S3 buckets on AWS Outposts?

How does AWS DataSync access my Amazon EFS file system?

Can I use AWS DataSync with all Amazon EFS storage classes?

How do I use AWS DataSync with Amazon EFS file system resource policies?

Can I use AWS DataSync to replicate my Amazon EFS file system to a different AWS Region?

What metadata is preserved when copying data between an NFS share and Amazon EFS, or between two Amazon EFS file systems?

What metadata is preserved when copying data between HDFS and Amazon EFS?

How does AWS DataSync access my Amazon FSx for Windows File Server file system?

What Windows metadata is transferred when copying between an SMB share and an Amazon FSx for Windows File Server file system, or between two Amazon FSx file systems?

Can I use AWS DataSync to replicate my Amazon FSx for Windows File Server file system to a different AWS Region?

How does AWS DataSync access my Amazon FSx for Lustre file system?

What metadata is preserved when either copying data between an NFS share or Amazon EFS file system and Amazon FSx for Lustre, or between two Amazon FSx for Lustre file systems?

Can I use AWS DataSync to migrate data from one FSx for Lustre file system to another?

Can I use AWS DataSync to replicate my Amazon FSx for Lustre file system to a different AWS Region?

Will DataSync copy the striping or layout settings when copying from one Amazon FSx for Lustre file system to another?

How does AWS DataSync access my Amazon FSx for OpenZFS file system?

What metadata is preserved when either copying data between an NFS share or Amazon EFS file system and Amazon FSx for OpenZFS, or between two Amazon FSx for OpenZFS file systems?

Can I use AWS DataSync to migrate data from one FSx for OpenZFS file system to another?

Can I use AWS DataSync to replicate my Amazon FSx for OpenZFS file system to a different AWS Region?

How does AWS DataSync access my Amazon FSx for NetApp ONTAP file system?

Which protocol versions can AWS DataSync use with Amazon FSx for NetApp ONTAP?

Does AWS DataSync preserve file system metadata when copying data to or from my Amazon FSx for NetApp ONTAP file system?

Which protocol should I use when migrating my data to Amazon FSx for NetApp ONTAP?

Can I use AWS DataSync to access the same Amazon FSx for NetApp ONTAP file system using different protocols?

Can I use AWS DataSync to transfer data to or from Amazon FSx for NetApp ONTAP iSCSI LUNs?

Can I use AWS DataSync to copy data from one Amazon FSx for NetApp ONTAP file system to another?

Can I use AWS DataSync to replicate my Amazon FSx for NetApp ONTAP file system to a different file system in another AWS Region?

How do I configure AWS DataSync to not copy snapshot directories?

Performance

How fast can AWS DataSync copy my file system to AWS?

Can I control the amount of network bandwidth that an AWS DataSync task uses?

How can I monitor the performance of AWS DataSync?

Will AWS DataSync affect the performance of my source file system?

How do I scale data transfers with AWS DataSync?

Security and compliance

Is my data encrypted while being transferred and stored?

How does AWS DataSync access my NFS server or SMB file share?

How does AWS DataSync access HDFS on my Hadoop cluster?

How does AWS DataSync access my self-managed or cloud object storage that supports the Amazon S3 protocol?

How does AWS DataSync access my Azure Blob Storage containers?