AWS Storage Blog
Migrating Wasabi Object Storage to Amazon S3 using AWS DataSync
Many organizations need to transfer large amounts of object data between cloud service providers. Common reasons include data consolidation, workload migration, acquisitions, and cost optimization.
A successful migration depends on several capabilities: encryption of data in transit, transferring only incremental changes, verifying the integrity of objects in transit, controlling network utilization, and cost-effective monitoring. However, building systems that meet all of these requirements can be resource-intensive and expensive, and such systems often do not scale well. Moving between public cloud providers can also impose constraints, such as limited access to the storage devices in the provider's data centers.
In this post, we walk you through transferring object data from Wasabi Cloud Storage, an Amazon S3-compatible object storage service, to Amazon Simple Storage Service (Amazon S3) using AWS DataSync. DataSync offers built-in features for transferring data between storage systems, whether on premises or in the cloud. Its network optimization, data validation, and scheduling capabilities help you perform transfers quickly, accurately, and efficiently. By moving data to AWS, you can take advantage of AWS’s unmatched experience, maturity, reliability, security, and performance for your most important applications.
AWS DataSync overview
AWS DataSync is a secure online service that automates and accelerates moving file and object data to and from on-premises storage, other clouds, and AWS Storage services. AWS DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File System (HDFS) clusters, self-managed object storage, AWS Snowcone, Amazon S3 buckets, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, Amazon FSx for NetApp ONTAP file systems, and storage in other clouds.
How to transfer data from Wasabi Cloud Storage to Amazon S3 with AWS DataSync
This guide walks you through configuring AWS DataSync to transfer objects from Wasabi to Amazon S3. The architecture for this tutorial, illustrated in Figure 1, uses a single DataSync agent. The agent acts as a client between your source data in Wasabi and your destination Amazon S3 bucket.
We recommend activating the DataSync agent with an interface VPC endpoint, which keeps traffic between the agent and the DataSync service on a private network. To reduce cross-AZ data transfer charges, deploy the agent and the interface endpoint in the same Availability Zone (AZ), for example by launching the agent in the subnet where you create the endpoint.
When you transfer data from Wasabi to AWS with DataSync, the data is encrypted in transit and its integrity is verified, which maintains the confidentiality and integrity of your objects. DataSync performs integrity checks during each transfer. For more details on how DataSync handles file and object transfers, see the official documentation.
Figure 1: AWS DataSync Architecture for Wasabi Cloud Storage migration
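To confirm that the agent and the interface endpoint actually share an AZ, you can compare the AZs of their subnets. The following minimal Python sketch assumes boto3 and default AWS credentials. The helper names are hypothetical, and only check_live calls AWS.

```python
"""Check that the DataSync agent subnet shares an AZ with the interface endpoint.

Hypothetical helpers; only check_live needs boto3 and AWS credentials.
"""
from typing import Dict, Iterable


def subnet_azs(describe_subnets_response: dict) -> Dict[str, str]:
    """Map subnet ID -> Availability Zone from an ec2.describe_subnets response."""
    return {
        s["SubnetId"]: s["AvailabilityZone"]
        for s in describe_subnets_response["Subnets"]
    }


def shares_az(azs: Dict[str, str], agent_subnet: str, endpoint_subnets: Iterable[str]) -> bool:
    """True if the endpoint has a subnet in the same AZ as the agent's subnet."""
    return azs[agent_subnet] in {azs[s] for s in endpoint_subnets}


def check_live(region: str, agent_subnet: str, endpoint_id: str) -> bool:
    """Look up the endpoint's subnets and compare AZs (makes AWS API calls)."""
    import boto3  # lazy import so the pure helpers above run without boto3

    ec2 = boto3.client("ec2", region_name=region)
    endpoint = ec2.describe_vpc_endpoints(VpcEndpointIds=[endpoint_id])["VpcEndpoints"][0]
    subnet_ids = sorted({agent_subnet, *endpoint["SubnetIds"]})
    azs = subnet_azs(ec2.describe_subnets(SubnetIds=subnet_ids))
    return shares_az(azs, agent_subnet, endpoint["SubnetIds"])
```

Keeping the comparison in pure functions lets you test the logic without an AWS account. check_live is a thin wrapper around two read-only EC2 calls.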
Tutorial prerequisites
For this tutorial, you should have the following prerequisites:
- An AWS account with permissions to create the Amazon S3, Amazon VPC, Amazon EC2, IAM, and AWS DataSync resources used in this tutorial.
- A Wasabi Cloud Storage account with a bucket and objects to transfer.
Wasabi source overview
We use a Wasabi bucket in the Amsterdam, Netherlands (eu-central-1) region that contains the objects to be transferred. Wasabi’s documentation lists its available regions. For this post, we use the folder structure shown in Figure 2: four folders (Testdata, Research, HR, and Finance), each containing six .txt files to be transferred to Amazon S3.
Figure 2: Wasabi Cloud Storage view of folders to be migrated to S3
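Before you start, it can help to record what the source bucket contains, so that you have a baseline to compare against after the transfer. The following minimal sketch has three assumptions. It assumes boto3 is installed. It assumes Wasabi’s S3-compatible API accepts list_objects_v2 calls at an endpoint of the form https://s3.<region>.wasabisys.com. Finally, the environment variable names WASABI_ACCESS_KEY and WASABI_SECRET_KEY are hypothetical.

```python
"""Summarize a Wasabi bucket by top-level folder before migrating it.

Assumptions: boto3 is installed for the live listing, the endpoint follows the
s3.<region>.wasabisys.com pattern, and keys come from hypothetical environment
variables WASABI_ACCESS_KEY and WASABI_SECRET_KEY.
"""
import os
from collections import Counter
from typing import Iterable, Iterator


def summarize_by_folder(keys: Iterable[str]) -> Counter:
    """Count objects per top-level prefix ("Research/a.txt" -> "Research")."""
    counts: Counter = Counter()
    for key in keys:
        folder, sep, _ = key.partition("/")
        # Keys without a "/" sit at the bucket root.
        counts[folder if sep else "(root)"] += 1
    return counts


def list_wasabi_keys(bucket: str, region: str = "eu-central-1") -> Iterator[str]:
    """Yield every object key in a Wasabi bucket through the S3-compatible API."""
    import boto3  # lazy import so summarize_by_folder works without boto3

    s3 = boto3.client(
        "s3",
        endpoint_url=f"https://s3.{region}.wasabisys.com",
        aws_access_key_id=os.environ["WASABI_ACCESS_KEY"],
        aws_secret_access_key=os.environ["WASABI_SECRET_KEY"],
    )
    # Paginate because list_objects_v2 returns at most 1,000 keys per call.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For a bucket like the one in Figure 2, summarize_by_folder(list_wasabi_keys("your-wasabi-bucket")) would be expected to report each of the four folders. If Wasabi stores zero-byte folder marker objects such as "HR/", those markers are counted as well.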
Solution overview
- Create access keys in Wasabi.
- Create an Amazon S3 bucket as the destination.
- Configure an Amazon Virtual Private Cloud (Amazon VPC) for the Amazon VPC endpoint.
- Deploy a DataSync agent as an Amazon EC2 instance.
- Create a DataSync object storage location for Wasabi.
- Create an Amazon S3 location.
- Create and configure the DataSync task settings.
- Start the task execution.
- Verify the data transferred.
Step 1: Create access keys in Wasabi
To generate access keys in Wasabi:
1. Navigate to the Access Keys section.
Figure 3: Navigate to the Access Keys section in Wasabi Cloud Storage
2. Select Create Access Keys.
3. Select Create.
4. Capture the Access Key and the Secret Key for later use.
Figure 4: Capture the Wasabi Cloud Storage access keys once generated
Review the Wasabi Cloud Storage and AWS DataSync documentation for specific access permission details.
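Rather than hard-coding the keys you just captured, you can keep them in environment variables and confirm that they can reach the bucket before you configure DataSync. The following is a minimal sketch. The variable names WASABI_ACCESS_KEY and WASABI_SECRET_KEY and the helper names are hypothetical. The live check assumes boto3 is installed and that Wasabi answers a standard HeadBucket request.

```python
"""Load Wasabi keys from the environment and optionally verify bucket access.

The environment variable names are hypothetical choices for this sketch;
boto3 is needed only for check_wasabi_access.
"""
import os
from typing import Mapping, Optional, Tuple

KEY_VARS = ("WASABI_ACCESS_KEY", "WASABI_SECRET_KEY")


def load_wasabi_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (access_key, secret_key), failing fast if either is missing."""
    env = os.environ if env is None else env
    missing = [name for name in KEY_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return env[KEY_VARS[0]], env[KEY_VARS[1]]


def check_wasabi_access(bucket: str, endpoint_url: str) -> None:
    """Raise botocore.exceptions.ClientError if the keys cannot access the bucket."""
    import boto3  # lazy import: only needed for the live check

    access_key, secret_key = load_wasabi_credentials()
    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    s3.head_bucket(Bucket=bucket)
```

A failed check at this stage usually points to a key or permission problem. It is easier to diagnose here than after you have created a DataSync location.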
Step 2: Create an Amazon S3 bucket as the destination
Create a bucket in Amazon S3, give it a name and copy the Amazon Resource Name (ARN) from the Properties tab.
Figure 5: Amazon S3 destination bucket
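If you prefer to script this step, the following minimal sketch creates the bucket and derives the same ARN that the Properties tab shows. It assumes boto3, default AWS credentials, and the commercial "aws" partition. The bucket name and Region are placeholders you supply.

```python
"""Create the destination S3 bucket and return its ARN (sketch).

Assumes boto3 with default credentials and the commercial "aws" partition.
"""


def s3_bucket_arn(bucket: str, partition: str = "aws") -> str:
    """S3 bucket ARNs omit the Region and account ID: arn:aws:s3:::name."""
    return f"arn:{partition}:s3:::{bucket}"


def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build create_bucket arguments; us-east-1 must not send a LocationConstraint."""
    kwargs: dict = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_destination_bucket(bucket: str, region: str) -> str:
    """Create the bucket (makes an AWS API call) and return its ARN."""
    import boto3  # lazy import so the pure helpers run without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket, region))
    return s3_bucket_arn(bucket)
```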
Step 3: Configure an Amazon Virtual Private Cloud (Amazon VPC) for the Amazon VPC endpoint
Set up the VPC, subnet, route table, and security group to meet the network requirements for using VPC endpoints with DataSync. Next, create a DataSync interface endpoint so that the connection between the DataSync agent and the DataSync service doesn’t traverse the public internet.
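The endpoint creation can also be scripted. The following is a minimal sketch, assuming boto3 with default credentials. The VPC, subnet, and security group IDs are placeholders. The security group rules the agent needs are not shown, so check the DataSync network requirements for the exact ports.

```python
"""Create a DataSync interface VPC endpoint in a single subnet (sketch).

Assumes boto3 with default credentials; all IDs are placeholders you supply.
"""


def datasync_endpoint_kwargs(region: str, vpc_id: str, subnet_id: str, sg_id: str) -> dict:
    """Arguments for ec2.create_vpc_endpoint targeting the DataSync service."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.datasync",
        # One subnet keeps the endpoint in the AZ where the agent will run.
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [sg_id],
    }


def create_datasync_endpoint(region: str, vpc_id: str, subnet_id: str, sg_id: str) -> str:
    """Create the endpoint (makes an AWS API call) and return its ID."""
    import boto3  # lazy import so the pure helper runs without boto3

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.create_vpc_endpoint(**datasync_endpoint_kwargs(region, vpc_id, subnet_id, sg_id))
    return resp["VpcEndpoint"]["VpcEndpointId"]
```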
Step 4: Deploy the DataSync agent on Amazon EC2
After configuring the VPC endpoint, deploy a DataSync agent as an Amazon EC2 instance. Launch the instance from the latest DataSync Amazon Machine Image (AMI) in the subnet where you created the VPC endpoint, and assign the agent’s security group. Finally, activate the agent using the VPC endpoint to associate it with your AWS account.
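The following sketch shows one way to script this step: look up the DataSync AMI, launch the instance, fetch an activation key from the agent, and register the agent. It rests on several assumptions. It assumes boto3 with default credentials. The SSM parameter name and the activation URL query parameters follow the DataSync documentation as we understand it, so confirm them there before relying on this sketch. The request to the agent must come from a host that can reach the agent’s private IP on port 80. The instance type and agent name are placeholders.

```python
"""Launch and activate a DataSync agent that uses an interface endpoint (sketch).

Assumptions to verify against the DataSync documentation: the SSM parameter
name and the activation URL query parameters. IDs and names are placeholders.
"""
import urllib.request
from urllib.parse import urlencode

DATASYNC_AMI_PARAMETER = "/aws/service/datasync/ami"


def activation_url(agent_ip: str, region: str, endpoint_ip: str) -> str:
    """URL at which the agent returns an activation key for a PrivateLink activation."""
    query = urlencode({
        "gatewayType": "SYNC",
        "activationRegion": region,
        "privateLinkEndpoint": endpoint_ip,
        "endpointType": "PRIVATE_LINK",
    })
    return f"http://{agent_ip}/?{query}&no_redirect"


def launch_and_activate(region: str, account_id: str, subnet_id: str, sg_id: str,
                        endpoint_id: str, endpoint_ip: str,
                        instance_type: str = "m5.2xlarge") -> str:
    """Launch the agent instance, activate it, and return the agent ARN."""
    import boto3  # lazy import so activation_url runs without boto3

    ssm = boto3.client("ssm", region_name=region)
    ami_id = ssm.get_parameter(Name=DATASYNC_AMI_PARAMETER)["Parameter"]["Value"]

    ec2 = boto3.client("ec2", region_name=region)
    instance = ec2.run_instances(
        ImageId=ami_id, InstanceType=instance_type, MinCount=1, MaxCount=1,
        SubnetId=subnet_id, SecurityGroupIds=[sg_id],
    )["Instances"][0]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance["InstanceId"]])

    # The agent can need a few minutes after "running" before it serves a key.
    url = activation_url(instance["PrivateIpAddress"], region, endpoint_ip)
    with urllib.request.urlopen(url, timeout=30) as resp:
        activation_key = resp.read().decode().strip()

    datasync = boto3.client("datasync", region_name=region)
    return datasync.create_agent(
        ActivationKey=activation_key,
        AgentName="wasabi-migration-agent",  # hypothetical name
        VpcEndpointId=endpoint_id,
        SubnetArns=[f"arn:aws:ec2:{region}:{account_id}:subnet/{subnet_id}"],
        SecurityGroupArns=[f"arn:aws:ec2:{region}:{account_id}:security-group/{sg_id}"],
    )["AgentArn"]
```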
Step 5: Create a DataSync object storage location for Wasabi
1. Open the DataSync console and choose Locations, then select Create location.
2. For location type, choose Object Storage.
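3. For Agents, select the agent you deployed in Step 4.
4. For Server, enter the Wasabi service URL for your bucket’s region, in the form “s3.<region>.wasabisys.com” (for the Amsterdam bucket in this post, “s3.eu-central-1.wasabisys.com”). Wasabi’s documentation lists its service URLs.
5. For Bucket name, enter the name of the Wasabi bucket.
6. Under Authentication, enter the Wasabi access key and secret key you created in Step 1.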
7. Select Create Location.
Figure 6: Wasabi Cloud Storage Location
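The same location can be created through the DataSync API. The following is a minimal sketch, assuming boto3 and the s3.<region>.wasabisys.com hostname pattern. The bucket name and agent ARN are placeholders, and the keys are read from hypothetical environment variables so that they stay out of source code.

```python
"""Create the DataSync object storage location for a Wasabi bucket (sketch).

Placeholders: bucket name, Regions, agent ARN. Keys come from hypothetical
environment variables WASABI_ACCESS_KEY and WASABI_SECRET_KEY.
"""
import os


def wasabi_location_kwargs(bucket: str, wasabi_region: str, agent_arn: str,
                           access_key: str, secret_key: str) -> dict:
    """Arguments for datasync.create_location_object_storage."""
    return {
        "ServerHostname": f"s3.{wasabi_region}.wasabisys.com",
        "ServerProtocol": "HTTPS",  # encrypt traffic between the agent and Wasabi
        "ServerPort": 443,
        "BucketName": bucket,
        "Subdirectory": "/",  # transfer the whole bucket
        "AccessKey": access_key,
        "SecretKey": secret_key,
        "AgentArns": [agent_arn],
    }


def create_wasabi_location(aws_region: str, bucket: str, wasabi_region: str,
                           agent_arn: str) -> str:
    """Create the location (makes an AWS API call) and return its ARN."""
    import boto3  # lazy import so the pure helper runs without boto3

    datasync = boto3.client("datasync", region_name=aws_region)
    kwargs = wasabi_location_kwargs(
        bucket, wasabi_region, agent_arn,
        os.environ["WASABI_ACCESS_KEY"], os.environ["WASABI_SECRET_KEY"],
    )
    return datasync.create_location_object_storage(**kwargs)["LocationArn"]
```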
Step 6: Create Amazon S3 Location
1. Open the DataSync console and choose Locations, then select Create location.
2. For Location type, select Amazon S3.
3. For S3 bucket, select the bucket you created in Step 2.
4. For IAM role, choose Autogenerate.
5. Select Create location.
Figure 7: S3 location with IAM role
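When you use the API instead of the console, there is no Autogenerate button. You pass the ARN of an existing IAM role that DataSync can assume to access the bucket. The following minimal sketch assumes boto3. The bucket name and role ARN are placeholders for resources you have already created.

```python
"""Create the DataSync Amazon S3 location (sketch).

Assumes boto3 and an existing IAM role that DataSync can assume; the role ARN
and bucket name are placeholders.
"""


def s3_location_kwargs(bucket: str, role_arn: str, subdirectory: str = "/") -> dict:
    """Arguments for datasync.create_location_s3."""
    return {
        "S3BucketArn": f"arn:aws:s3:::{bucket}",
        "Subdirectory": subdirectory,
        # DataSync assumes this role to read and write the bucket.
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def create_s3_location(region: str, bucket: str, role_arn: str) -> str:
    """Create the location (makes an AWS API call) and return its ARN."""
    import boto3  # lazy import so the pure helper runs without boto3

    datasync = boto3.client("datasync", region_name=region)
    return datasync.create_location_s3(**s3_location_kwargs(bucket, role_arn))["LocationArn"]
```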
Step 7: Create and run an AWS DataSync task
1. Open the AWS DataSync console and choose Tasks, then select Create task.
2. For source location options, select Choose an existing location.
3. For Existing location select the Wasabi object storage location previously created and select Next.
Figure 8: Configure Source Location and choose existing location
4. For destination location options, select Choose an existing location. Next, select the S3 bucket previously created.
Figure 9: Configure destination location and choose an existing location
5. In the Configure settings tab, start by giving your task a name.
6. Configure the task execution and data transfer settings to suit your requirements, or keep the default settings.
7. Enable task logging with an autogenerated CloudWatch log group, as shown in Figure 10. Select Next, then Create task. After the task is created, select Start to run it.
Figure 10: Task logging and CloudWatch log group
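For repeatable migrations, you could create and start the task through the API and poll until it finishes. The following minimal sketch assumes boto3. The option values shown are one reasonable choice rather than the console defaults: verify only transferred objects, copy only changed data, log each transfer, and apply no bandwidth limit. The location and log group ARNs are placeholders.

```python
"""Create, start, and poll a DataSync task (sketch).

Assumes boto3; location and CloudWatch log group ARNs are placeholders, and the
Options values are illustrative choices, not requirements.
"""
import time

# Status values after which describe_task_execution stops changing.
TERMINAL_STATUSES = frozenset({"SUCCESS", "ERROR"})


def task_kwargs(source_arn: str, dest_arn: str, log_group_arn: str,
                name: str = "wasabi-to-s3") -> dict:
    """Arguments for datasync.create_task."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Name": name,
        "CloudWatchLogGroupArn": log_group_arn,
        "Options": {
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # checksum what was copied
            "TransferMode": "CHANGED",  # copy only new or changed objects
            "LogLevel": "TRANSFER",  # log each transferred object
            "BytesPerSecond": -1,  # -1 means no bandwidth limit
        },
    }


def run_task(region: str, source_arn: str, dest_arn: str, log_group_arn: str,
             poll_seconds: int = 30) -> str:
    """Create the task, start one execution, and return its final status."""
    import boto3  # lazy import so the pure helpers run without boto3

    datasync = boto3.client("datasync", region_name=region)
    task_arn = datasync.create_task(**task_kwargs(source_arn, dest_arn, log_group_arn))["TaskArn"]
    # If the task is still being created, this call can fail; retry after a short wait.
    exec_arn = datasync.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
    while True:
        status = datasync.describe_task_execution(TaskExecutionArn=exec_arn)["Status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
```

Setting BytesPerSecond to a positive value is how you would throttle bandwidth if the transfer competes with other traffic.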
Verify the data transferred
After the DataSync task execution completes successfully, compare the source and destination to confirm that the data was transferred correctly, for example by checking the object count and size in each folder. When you create a task, you can also configure built-in DataSync features such as bandwidth throttling, data validation, and scheduling to control the transfer and verify that data is transferred accurately and efficiently.
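As an independent check, you can compare the key and size listings of both buckets. This is a minimal sketch. It assumes you pass in two boto3 S3 clients: one created with endpoint_url set to your Wasabi service URL, and one for Amazon S3. It compares sizes rather than ETags, because ETags for multipart uploads depend on part size and may differ between providers even when the content matches.

```python
"""Compare object keys and sizes between the source and destination buckets (sketch).

list_sizes accepts any boto3 S3 client, so the same code works for Wasabi
(client created with endpoint_url) and Amazon S3.
"""
from typing import Dict, List


def list_sizes(client, bucket: str) -> Dict[str, int]:
    """Return {key: size_in_bytes} for every object in the bucket."""
    sizes: Dict[str, int] = {}
    for page in client.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            sizes[obj["Key"]] = obj["Size"]
    return sizes


def diff_listings(source: Dict[str, int], dest: Dict[str, int]) -> Dict[str, List[str]]:
    """Report keys missing from dest, extra in dest, and keys whose sizes differ."""
    return {
        "missing": sorted(k for k in source if k not in dest),
        "extra": sorted(k for k in dest if k not in source),
        "size_mismatch": sorted(k for k in source if k in dest and source[k] != dest[k]),
    }
```

Empty lists in all three fields suggest that keys and sizes match. Zero-byte folder marker objects such as "HR/" may appear on only one side, so review those entries before you treat them as failures.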
If you encounter failures during the transfer process, you can navigate to the CloudWatch logs to investigate the root cause of the issue. The logs provide a detailed overview of the DataSync task, including errors or warnings that occurred during the transfer process. By examining the logs, you can troubleshoot the issue and make the necessary adjustments to make sure that future transfers run smoothly.
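To pull the relevant log lines without opening the console, you could filter the task’s log group. The following is a minimal sketch, assuming boto3. The log group name is the one configured on your task. The default filter pattern "ERROR" is an assumption about how DataSync formats error messages, so adjust it to match what you see in your log group.

```python
"""Fetch error messages from a DataSync CloudWatch log group (sketch).

Assumes boto3; the default filter pattern is an assumption about the log format.
"""
from typing import List


def messages_from(filter_log_events_response: dict) -> List[str]:
    """Extract the message text from one filter_log_events response page."""
    return [e["message"] for e in filter_log_events_response.get("events", [])]


def fetch_task_errors(region: str, log_group_name: str, pattern: str = "ERROR") -> List[str]:
    """Return all messages in the log group that match the filter pattern."""
    import boto3  # lazy import so messages_from runs without boto3

    logs = boto3.client("logs", region_name=region)
    messages: List[str] = []
    for page in logs.get_paginator("filter_log_events").paginate(
        logGroupName=log_group_name, filterPattern=pattern
    ):
        messages.extend(messages_from(page))
    return messages
```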
Cleaning up
To avoid incurring future charges, delete the resources used in this tutorial.
- Delete the access keys you created in Wasabi in Step 1.
- Delete the Wasabi bucket contents and the bucket, if you no longer need them.
- Delete the DataSync task and locations, then the agent.
- Terminate the EC2 instance that hosts the agent.
- Delete the DataSync VPC endpoint.
- Empty and delete the Amazon S3 bucket.
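The AWS side of this cleanup can be scripted in dependency order. The following is a minimal sketch, assuming boto3. The ARNs and IDs are placeholders. The Wasabi keys and both buckets are deliberately left out, because deleting data is a step you should take by hand once you have verified the migration.

```python
"""Clean up the DataSync, EC2, and VPC endpoint resources in order (sketch).

Assumes boto3; all ARNs and IDs are placeholders. Buckets and Wasabi keys are
intentionally excluded.
"""
from typing import List, Tuple

Step = Tuple[str, str, dict]  # (boto3 service name, client method, kwargs)


def cleanup_plan(task_arn: str, location_arns: List[str], agent_arn: str,
                 instance_id: str, endpoint_id: str) -> List[Step]:
    """Task first, then locations, then the agent, then the EC2 resources."""
    plan: List[Step] = [("datasync", "delete_task", {"TaskArn": task_arn})]
    plan += [("datasync", "delete_location", {"LocationArn": arn}) for arn in location_arns]
    plan += [
        ("datasync", "delete_agent", {"AgentArn": agent_arn}),
        ("ec2", "terminate_instances", {"InstanceIds": [instance_id]}),
        ("ec2", "delete_vpc_endpoints", {"VpcEndpointIds": [endpoint_id]}),
    ]
    return plan


def run_cleanup(region: str, plan: List[Step]) -> None:
    """Execute each step in order (makes AWS API calls)."""
    import boto3  # lazy import so cleanup_plan runs without boto3

    clients = {}
    for service, method, kwargs in plan:
        if service not in clients:
            clients[service] = boto3.client(service, region_name=region)
        getattr(clients[service], method)(**kwargs)
```

Building the plan as data first lets you review exactly what will be deleted before you run it.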
Conclusion
In this blog post, we discussed scenarios in which your organization needs to move data from other cloud providers into AWS. We showed how AWS DataSync can help with data movement workflows by simplifying scheduling, data encryption, and data validation. We also covered setting up the VPC endpoint, creating the DataSync agent, and configuring DataSync to migrate data from Wasabi Cloud Storage to Amazon S3.
With its network optimization, data validation, scheduling, encryption, and monitoring features, DataSync helps you transfer data quickly and accurately while maintaining its security and integrity. You can also accelerate migrations from Wasabi Cloud Storage by scaling the walkthrough in this post out to multiple DataSync agents and tasks.
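One way to split a migration across several tasks is to give each task an include filter covering a subset of the top-level folders. The following minimal sketch makes several assumptions. It uses the four folders from Figure 2. It relies on DataSync SIMPLE_PATTERN include filters with pipe-separated values, so check the DataSync filtering documentation for how patterns match folders and their contents. The round-robin grouping is an arbitrary illustrative choice.

```python
"""Partition top-level folders into DataSync include filters, one per task (sketch).

Assumes SIMPLE_PATTERN filters with "|"-separated values; round-robin grouping
is an illustrative choice, not a recommendation.
"""
from typing import Dict, Iterable, List


def include_filters(folders: Iterable[str], task_count: int) -> List[List[Dict[str, str]]]:
    """Return one Includes list per non-empty group, suitable for create_task."""
    groups: List[List[str]] = [[] for _ in range(task_count)]
    for i, folder in enumerate(sorted(folders)):
        groups[i % task_count].append(f"/{folder}")
    return [
        [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(group)}]
        for group in groups
        if group  # skip empty groups when there are more tasks than folders
    ]
```

With the folders in Figure 2 and two tasks, this yields one filter for /Finance and /Research and another for /HR and /Testdata. Each result could be passed as the Includes argument when you create a task, ideally with each task running on its own agent.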
Overall, AWS DataSync is a valuable tool for organizations looking to transfer data between different storage systems or replicate data for backup or disaster recovery purposes. Its ease of use, flexibility, and range of features make it a popular choice for businesses of all sizes and industries. With AWS DataSync, users can be confident that their data transfers are secure, reliable, and optimized for efficiency. To learn more, review AWS DataSync documentation on transferring data with other cloud object storage.