AWS Storage Blog

Simplify and accelerate your data migration using AWS DataSync Discovery

UPDATE (4/25/2023): DataSync Discovery is now generally available. For more information on additional capabilities and extended availability, view the What’s New post and visit the feature page.


Migrating your on-premises data to the cloud can be intimidating at first, particularly when you are working with large and complex storage systems. Estimating costs, understanding which data sets should be migrated for your applications, and knowing which cloud storage services to select can all take considerable time and effort to work out on your own.

On September 21, 2022, AWS DataSync announced the public preview of a new feature, AWS DataSync Discovery, to help customers plan and implement their data migrations to AWS. This new feature uses automated data collection and analysis to give customers greater insight into their on-premises storage performance and capacity usage. It helps them quickly identify the data to be migrated and recommends AWS Storage services that align with their performance and budget needs. Customers use the recommendations automatically generated by DataSync Discovery to help inform their migration planning.

In this blog post, I explain how to get started using AWS DataSync Discovery and discuss how you can use the insights and recommendations made available by this new feature to help you accelerate your data migration to AWS.

How it works

AWS DataSync Discovery How it Works Diagram

DataSync Discovery uses a DataSync agent to connect to your on-premises storage and automatically collect information such as performance metrics and capacity utilization over time. DataSync Discovery uses the information collected to generate recommendations and estimated costs for AWS Storage services such as Amazon FSx for NetApp ONTAP, Amazon FSx for Windows File Server, and Amazon Elastic File System (EFS). You can then use these recommendations while planning your migration to AWS.

Getting started

To get started using DataSync Discovery, you first need to deploy a DataSync agent in your on-premises environment. A DataSync agent is a virtual machine that runs in your VMware, Hyper-V, or KVM environment. You can also run your agent as an Amazon EC2 instance in AWS if you have sufficient network connectivity, such as an AWS Direct Connect link, between your Amazon VPC and your on-premises environment where your storage system is located. Once your agent is deployed, you then activate your agent with the DataSync service. For the Preview launch, DataSync Discovery supports agent activation using public DataSync endpoints in the US East (N. Virginia) Region.

DataSync Discovery collects information from your on-premises storage using the management APIs provided by your storage system. This includes volume configuration, the number of clients connected, capacity usage, and performance metrics. DataSync Discovery does not access data on your file systems.

For Preview, DataSync Discovery supports NetApp FAS and AFF storage systems running ONTAP 9.8 or later, and uses the ONTAP REST API to securely access your storage system using credentials you provide through AWS Secrets Manager. AWS Secrets Manager helps you manage, retrieve, and rotate database credentials, API keys, and other secrets throughout their lifecycles. Before you can use DataSync Discovery, you must first create a secret in Secrets Manager to store the credentials, and ensure that the DataSync service is allowed to access that secret.

With the DataSync agent deployed and activated, and a secret created to hold credentials for using the ONTAP REST API, you can now configure DataSync Discovery to access your storage system. You can do this through the DataSync console by clicking on “Discovery” and then adding a storage system.

Add storage system DataSync console screenshot

Here you provide information about your storage system, including the IP address or hostname of the management API interface, the secret to use for credentials, and the agent you created and activated earlier. You can also specify an Amazon CloudWatch log group and tags to associate with the storage system resource that is created.
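If you prefer to script this step rather than use the console, the same inputs can be gathered into a request for the AWS SDK. The sketch below only builds and validates the request dictionary; it does not call AWS. The field names follow my understanding of the DataSync `AddStorageSystem` operation and should be checked against the current API reference. The helper name, the default port of 443, and the example values are assumptions for illustration. Credentials are deliberately left out, because how they are supplied (a Secrets Manager secret in the Preview flow described here) should be confirmed for the API version you use.

```python
from typing import Dict, Optional


def build_add_storage_system_request(
    hostname: str,
    agent_arn: str,
    name: Optional[str] = None,
    port: int = 443,  # assumed HTTPS management port; adjust for your system
    log_group_arn: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Collect the storage system settings into a single request dictionary.

    Hypothetical helper: key names are assumed to match AddStorageSystem.
    """
    if not hostname.strip():
        raise ValueError("hostname must not be empty")
    if not agent_arn.startswith("arn:"):
        raise ValueError("agent_arn must be an ARN")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")

    request = {
        "ServerConfiguration": {
            "ServerHostname": hostname.strip(),
            "ServerPort": port,
        },
        "SystemType": "NetAppONTAP",
        "AgentArns": [agent_arn],
    }
    # Optional settings are only included when provided.
    if name:
        request["Name"] = name
    if log_group_arn:
        request["CloudWatchLogGroupArn"] = log_group_arn
    if tags:
        request["Tags"] = [
            {"Key": key, "Value": value} for key, value in sorted(tags.items())
        ]
    return request
```

Keeping the request construction separate from the SDK call makes it easy to review or log the settings before anything is sent to AWS.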

After you have added your storage system, the agent verifies that it can connect using the IP address or hostname and credentials that you provided. This verification step can take up to five minutes.

My Storage system connecting to DataSync - console screenshot

Once verification is complete, you can then start a discovery job. A discovery job runs for up to 31 days and collects information about your storage system, including storage resources such as volumes and NFS/CIFS client counts, as well as performance and capacity utilization metrics. This collected information is used to generate recommendations automatically upon completion of the job. For accurate recommendations, we suggest running your discovery job for at least 14 days to get a representative sample of your storage performance over time. However, you can run a job for a shorter period to validate that DataSync Discovery is working in your environment.
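When starting a job programmatically, the duration is typically expressed in minutes rather than days. The sketch below converts days to minutes and enforces the 31-day maximum and 14-day guidance mentioned above. The function names are hypothetical, and my assumption that the result maps to a `CollectionDurationMinutes` parameter on `StartDiscoveryJob` should be verified against the API reference.

```python
MINUTES_PER_DAY = 24 * 60
MAX_JOB_DAYS = 31          # longest discovery job described in this post
RECOMMENDED_MIN_DAYS = 14  # suggested minimum for accurate recommendations


def collection_duration_minutes(days: float) -> int:
    """Convert a job length in days to whole minutes, rejecting invalid lengths."""
    if days <= 0 or days > MAX_JOB_DAYS:
        raise ValueError(f"days must be greater than 0 and at most {MAX_JOB_DAYS}")
    return int(round(days * MINUTES_PER_DAY))


def is_representative(days: float) -> bool:
    """Return True when the job is long enough to meet the 14-day guidance."""
    return days >= RECOMMENDED_MIN_DAYS
```

For example, a 14-day job works out to 14 × 1,440 = 20,160 minutes, and the 31-day maximum to 44,640 minutes.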

Start discovery job console screen shot

As the discovery job runs, collected information is made available for viewing in the AWS DataSync console, and also through the AWS CLI or AWS SDKs. Amazon CloudWatch Logs provides details on the discovery job and enables you to find and address any errors if they occur.
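If you track a job from a script instead of the console, a simple polling loop is usually enough. The sketch below assumes a client object with a `describe_discovery_job(DiscoveryJobArn=...)` method that returns a dictionary with a `Status` key, which is how I understand the boto3 DataSync client to behave. The set of terminal status values is also an assumption to confirm against the API reference, and the function name and defaults are hypothetical.

```python
import time
from typing import Callable

# Assumed status values after which a discovery job no longer changes state.
TERMINAL_STATUSES = {
    "COMPLETED",
    "COMPLETED_WITH_ISSUES",
    "FAILED",
    "STOPPED",
    "TERMINATED",
}


def wait_for_discovery_job(
    client,
    job_arn: str,
    poll_seconds: float = 300.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the job until it reaches a terminal status and return that status."""
    for attempt in range(max_polls):
        status = client.describe_discovery_job(DiscoveryJobArn=job_arn)["Status"]
        if status in TERMINAL_STATUSES:
            return status
        # Pause between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"job did not finish after {max_polls} polls")
```

The `sleep` parameter is injectable so the loop can be exercised with a stub client without waiting. Because discovery jobs can run for days, a long poll interval keeps API calls to a minimum.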

Performance charts showing IOPS peaks and Throughput peaks

When the job is complete, recommendations are automatically generated. Recommendations include configurations and estimated monthly costs for applicable AWS Storage services. The recommended configurations are intended to provide the necessary performance for your storage resources at the lowest cost. For NetApp ONTAP systems, recommendations are provided on a per-volume basis.
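Because recommendations are per volume, it can help to reduce them to one lowest-cost option per volume when building a first-pass budget. The sketch below works on plain dictionaries. The key names (`VolumeName`, `Recommendations`, `StorageType`, `EstimatedMonthlyStorageCost`) reflect my understanding of the response shape for ONTAP volumes and are assumptions, as are the helper names. Lowest cost is only a starting point and does not replace reviewing each recommendation.

```python
from decimal import Decimal
from typing import Dict, List, Optional, Tuple


def cheapest_recommendation(recommendations: List[dict]) -> Optional[dict]:
    """Return the recommendation with the lowest estimated monthly cost, if any."""
    priced = [
        rec for rec in recommendations
        if rec.get("EstimatedMonthlyStorageCost") not in (None, "")
    ]
    if not priced:
        return None
    # Costs are assumed to arrive as strings; Decimal avoids float rounding.
    return min(priced, key=lambda rec: Decimal(rec["EstimatedMonthlyStorageCost"]))


def summarize_by_volume(volumes: List[dict]) -> Dict[str, Tuple[str, Decimal]]:
    """Map each volume name to (storage type, monthly cost) of its cheapest option."""
    summary = {}
    for volume in volumes:
        best = cheapest_recommendation(volume.get("Recommendations", []))
        if best is not None:
            summary[volume["VolumeName"]] = (
                best["StorageType"],
                Decimal(best["EstimatedMonthlyStorageCost"]),
            )
    return summary
```

Volumes with no priced recommendation are left out of the summary, so they are easy to spot and investigate separately.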

Example recommendation and estimates output from discovery job

You should review the provided recommendations carefully to ensure they meet your unique storage needs.

Planning your migration

You can use the recommendations provided by DataSync Discovery to plan the migration of your data from your on-premises storage to AWS. Early in your planning process, you can use the monthly cost estimates provided with the recommendations to inform your budget planning. As you get closer to migrating your data, you can re-run your discovery jobs to confirm that early estimates are still valid, and adjust accordingly.
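One way to check whether early estimates are still valid is to compare the monthly cost from an earlier discovery job with the cost from a more recent one and flag large changes. The sketch below is a minimal illustration; the function names and the 10 percent tolerance are hypothetical choices, not values suggested by DataSync Discovery.

```python
from decimal import Decimal


def estimate_drift_percent(early: Decimal, latest: Decimal) -> Decimal:
    """Return the percentage change from the early estimate to the latest one."""
    if early <= 0:
        raise ValueError("early estimate must be greater than zero")
    return (latest - early) / early * Decimal(100)


def needs_review(
    early: Decimal,
    latest: Decimal,
    tolerance_percent: Decimal = Decimal("10"),  # hypothetical threshold
) -> bool:
    """Return True when the estimate moved by more than the tolerance, up or down."""
    return abs(estimate_drift_percent(early, latest)) > tolerance_percent
```

For instance, an estimate that moves from 200 to 230 per month has drifted by 15 percent, which exceeds the 10 percent tolerance and would be flagged for a budget review.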

When you are ready to start your migration, you can use AWS DataSync to move your data online or AWS Snow Family devices for offline data movement.

Conclusion

Planning and migrating data to the cloud can be a significant undertaking, involving time and effort to collect information about your storage and then map that storage to services in AWS. In this blog post, I showed you how AWS DataSync Discovery helps you accelerate your data migration and simplify your migration planning. You learned how DataSync Discovery works and how to get started using it to automate data collection and better understand your on-premises storage. You also learned how to use the insights and recommendations from your discovery jobs to inform your migration planning and begin moving your data to AWS.

To learn more about AWS DataSync Discovery, check out the following links: