AWS Storage Blog

Migrating DigitalOcean Spaces to Amazon S3 using AWS DataSync

Organizations may at times have to transfer large amounts of object data from one cloud service provider to another. There can be several reasons for this, including data consolidation, moving workloads, disaster recovery planning, or cost optimization efforts. Completing a successful migration requires several key elements, including full encryption of the data being transferred, the ability to detect changes, validating the integrity of the objects being moved, controlling network speeds to optimize migration performance, and cost-effective monitoring. However, developing solutions that meet these requirements can be a time-consuming and expensive process, and difficult to scale. Furthermore, migrating between public cloud providers may limit certain options, such as direct access to the storage devices at the data center.

In this post, I will guide you through moving object data from DigitalOcean Spaces, an Amazon S3-compatible object storage service, to Amazon S3 using AWS DataSync. AWS DataSync is a powerful service that offers a range of built-in features to facilitate data transfers between different storage systems, whether on-premises or in the cloud. Its network optimization, data validation, and scheduling capabilities enable users to perform transfers quickly, accurately, and efficiently. In moving data to AWS, you can leverage AWS’s unmatched experience, maturity, reliability, security, and performance, which you can depend upon for your most important applications.

AWS DataSync overview

AWS DataSync is a secure online service that automates and accelerates moving data between on-premises storage and AWS Storage services. AWS DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, Amazon FSx for NetApp ONTAP file systems, and other clouds.

How to transfer data from DigitalOcean Spaces to Amazon S3 with AWS DataSync

This tutorial provides a step-by-step guide to configuring AWS DataSync for transferring objects from DigitalOcean Spaces to Amazon S3. The architecture for this post, shown in the following diagram, uses a single AWS DataSync agent, with the source objects in DigitalOcean Spaces and an Amazon S3 bucket as the destination. To prevent traffic between the agent and the AWS DataSync service from traversing the public internet, it is highly recommended to deploy an interface endpoint. To minimize costs, deploy the agent and the interface endpoint in the same Availability Zone (AZ).

Traffic from DigitalOcean Spaces to the agent travels encrypted over the public internet. AWS DataSync performs data-integrity checks during each transfer to verify the accuracy of the data. For more information on how AWS DataSync transfers files and objects, refer to the AWS DataSync documentation.

Figure 1: AWS DataSync Single agent architecture

Prerequisites

For this tutorial, you should have the following:

  • An AWS account
  • AWS Command Line Interface (CLI)
  • DigitalOcean Spaces with source objects to transfer

DigitalOcean Spaces source overview

In this walkthrough, I will be using a DigitalOcean Spaces bucket deployed in the AMS3 region, which houses the objects to be transferred. For other locations, refer to DigitalOcean's list of available regions. The folder structure includes three folders (HR, Finance, and Research), each containing six TXT files that will be transferred to Amazon S3.

Figure 2: DigitalOcean Spaces Objects to be transferred
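Before migrating, it can help to record how many objects sit in each folder so there is a baseline to compare against later. The following is a minimal sketch that counts objects per top-level folder from a list of keys. The function name and the sample file names are my own assumptions; only the three folder names and the six-files-per-folder layout come from this walkthrough.

```python
from collections import Counter


def count_by_folder(keys):
    """Count objects per top-level folder (the part of the key before the first '/')."""
    counts = Counter()
    for key in keys:
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        folder = key.split("/", 1)[0] if "/" in key else ""
        counts[folder] += 1
    return dict(counts)


# Hypothetical key names that mimic the layout described above.
sample_keys = [
    f"{folder}/file{i}.txt"
    for folder in ("HR", "Finance", "Research")
    for i in range(1, 7)
]
print(count_by_folder(sample_keys))
```

Because Spaces is S3-compatible, the keys could be listed with a boto3 S3 client created with endpoint_url="https://ams3.digitaloceanspaces.com" and the Spaces key and secret, using list_objects_v2.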

Solution overview

  1. Create Spaces key in DigitalOcean.
  2. Create an Amazon S3 bucket as the destination.
  3. Configure a network for the Amazon VPC endpoint.
  4. Deploy an Amazon EC2 DataSync agent.
  5. Create a DataSync location for DigitalOcean Spaces.
  6. Create Amazon S3 location.
    • Create and run an AWS DataSync task.
    • Configure Settings and Start the Task.
  7. Verify the data transferred.

Step 1: Create Spaces key in DigitalOcean

To generate a key in DigitalOcean:
1. Select your project.
2. In the left-hand menu, navigate to API, then Spaces Keys.
3. Click on Generate New Key.
4. Give the key a name.
5. Copy both the DigitalOcean key and secret.

Note: The DigitalOcean key and secret are sensitive and should be treated as such.
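One way to keep the key and secret out of scripts and shell history is to read them from environment variables at run time. The following is a minimal sketch of that idea. The variable names DO_SPACES_KEY and DO_SPACES_SECRET are my own choice for this example, not names defined by DigitalOcean or AWS.

```python
import os


def load_spaces_credentials(env=None):
    """Return (access_key, secret_key) read from environment variables.

    DO_SPACES_KEY and DO_SPACES_SECRET are hypothetical names chosen for
    this example.
    """
    env = os.environ if env is None else env
    required = ("DO_SPACES_KEY", "DO_SPACES_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with the variable names only, never the values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["DO_SPACES_KEY"], env["DO_SPACES_SECRET"]
```

A secrets manager would serve the same purpose; the point is simply that the values never appear in source code.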

Step 2: Create an Amazon S3 bucket as the destination

Create a bucket in Amazon S3, give it a name and copy the Amazon Resource Name (ARN) from the Properties tab.
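If you prefer to script this step, the sketch below builds the arguments for the boto3 S3 client's create_bucket call and derives the bucket ARN from the bucket name. The helper names are my own, and the ARN format assumes a general purpose bucket in the standard aws partition.

```python
def bucket_arn(bucket_name):
    """Build the ARN of a general purpose S3 bucket in the standard 'aws' partition."""
    return f"arn:aws:s3:::{bucket_name}"


def create_bucket_kwargs(bucket_name, region):
    """Keyword arguments for the boto3 S3 client's create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # us-east-1 is the default Region and must not be sent as a LocationConstraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With boto3 installed and credentials configured, this could be used as boto3.client("s3", region_name=region).create_bucket(**create_bucket_kwargs(name, region)).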

Step 3: Configure a network for the Amazon VPC endpoint

Set up the VPC, subnet, route table, and security group according to the network requirements for using VPC endpoints. Then create a DataSync interface endpoint. This minimizes the need for public IP addresses and ensures that the connection between the DataSync service and the agent doesn't traverse the public internet.
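As a rough illustration of the endpoint part of this step, the sketch below builds the arguments for the boto3 EC2 client's create_vpc_endpoint call. The helper name and the example IDs are placeholders of my own. Passing a single subnet keeps the endpoint in one Availability Zone, which matches the cost recommendation earlier in this post.

```python
def datasync_endpoint_kwargs(region, vpc_id, subnet_id, security_group_id):
    """Keyword arguments for the boto3 EC2 client's create_vpc_endpoint call."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Interface endpoint service name for DataSync in the given Region.
        "ServiceName": f"com.amazonaws.{region}.datasync",
        # One subnet only, so the endpoint lives in a single Availability Zone.
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
    }


# Placeholder IDs for illustration only.
params = datasync_endpoint_kwargs("eu-north-1", "vpc-0123", "subnet-0abc", "sg-0def")
print(params["ServiceName"])
```

The VPC, subnet, route table, and security group still have to exist beforehand; this only covers the endpoint itself.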

Step 4: Deploy an Amazon EC2 DataSync agent

After setting up the VPC endpoint, the next step is to deploy an agent as an Amazon EC2 instance. Launch the Amazon EC2 instance using the latest DataSync Amazon Machine Image (AMI) in the subnet from the previous step and assign the security group for agents. Finally, activate the agent to associate it with your AWS account.
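The sketch below shows how the launch and activation parameters for this step might be assembled for boto3. It assumes the latest agent AMI ID has been read from the public Systems Manager parameter /aws/service/datasync/ami, and that you already hold an activation key from the agent. The helper names, the m5.2xlarge default, and all IDs are assumptions for illustration.

```python
# Public SSM parameter that resolves to the latest DataSync agent AMI in a Region.
DATASYNC_AMI_PARAMETER = "/aws/service/datasync/ami"


def agent_instance_kwargs(ami_id, subnet_id, security_group_id, instance_type="m5.2xlarge"):
    """Keyword arguments for the boto3 EC2 client's run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # assumed default; size it for your workload
        "MinCount": 1,
        "MaxCount": 1,
        "SubnetId": subnet_id,
        "SecurityGroupIds": [security_group_id],
    }


def create_agent_kwargs(activation_key, agent_name, region, account_id,
                        vpc_endpoint_id, subnet_id, security_group_id):
    """Keyword arguments for the boto3 DataSync client's create_agent call."""
    prefix = f"arn:aws:ec2:{region}:{account_id}"
    return {
        "ActivationKey": activation_key,
        "AgentName": agent_name,
        "VpcEndpointId": vpc_endpoint_id,
        # create_agent expects ARNs, not bare IDs, for the subnet and security group.
        "SubnetArns": [f"{prefix}:subnet/{subnet_id}"],
        "SecurityGroupArns": [f"{prefix}:security-group/{security_group_id}"],
    }
```

The AMI ID itself could be fetched with boto3.client("ssm").get_parameter(Name=DATASYNC_AMI_PARAMETER)["Parameter"]["Value"].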

Step 5: Create a DataSync location for DigitalOcean Spaces

  1. Open the DataSync console and choose Locations, then select Create location.
  2. For location type, choose Object storage.
  3. Select the agent created in Step 4.
  4. For Server, enter “<do-region>.digitaloceanspaces.com”. Note: In my case, I used ams3 as the “do-region” because my Spaces bucket is in Amsterdam. Check the Origin endpoint in DigitalOcean Spaces to find the correct region.
  5. For Bucket name, enter the name of the DigitalOcean Spaces bucket, and autogenerate the IAM role.
  6. Under authentication add the access and secret key created in Step 1.
  7. Select Create location.

Figure 5: DigitalOcean Spaces Location
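For readers who script their setup, the console fields above map onto the DataSync CreateLocationObjectStorage API. The sketch below builds the arguments for boto3's create_location_object_storage call. The helper name and example values are my own, and HTTPS on port 443 is an assumption about how the Spaces endpoint is reached.

```python
def spaces_location_kwargs(do_region, space_name, access_key, secret_key, agent_arn):
    """Keyword arguments for the boto3 DataSync client's create_location_object_storage call."""
    return {
        "ServerHostname": f"{do_region}.digitaloceanspaces.com",
        "ServerProtocol": "HTTPS",  # assumed: Spaces is reached over TLS
        "ServerPort": 443,
        "BucketName": space_name,
        "AccessKey": access_key,    # Spaces key from Step 1
        "SecretKey": secret_key,    # Spaces secret from Step 1
        "AgentArns": [agent_arn],   # agent from Step 4
    }
```

The returned dictionary can be passed as boto3.client("datasync").create_location_object_storage(**kwargs), which returns the new location's ARN under LocationArn.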

Step 6: Create Amazon S3 location

  1. Open the DataSync console and choose Locations, then select Create location.
  2. Select Amazon S3 as the location type.
  3. For S3 bucket, select the bucket created in Step 2.
  4. For IAM role, select Autogenerate to create an IAM role that allows DataSync to access the S3 bucket.
  5. Select Create location.
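When this step is scripted instead of using the console's autogenerated role, you supply the role yourself. The sketch below shows the trust policy such a role needs so that the DataSync service can assume it, plus the arguments for boto3's create_location_s3 call. The helper names and the example role ARN are assumptions; the role's permissions policy for the bucket is not shown.

```python
def datasync_trust_policy():
    """Trust policy that lets the DataSync service assume the bucket access role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "datasync.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_location_kwargs(bucket_name, access_role_arn, subdirectory=None):
    """Keyword arguments for the boto3 DataSync client's create_location_s3 call."""
    kwargs = {
        "S3BucketArn": f"arn:aws:s3:::{bucket_name}",
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }
    if subdirectory is not None:
        kwargs["Subdirectory"] = subdirectory  # optional prefix inside the bucket
    return kwargs
```

No agent is passed here because the S3 location in this architecture is accessed directly by the DataSync service.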

Create and run an AWS DataSync task

  1. Open the AWS DataSync console and choose Task, then select Create task.
  2. For source location options, select Choose an existing location.
  3. For Existing location select the DigitalOcean Spaces location previously created as shown in Figure 7.

Figure 7: Configure source location

Configure destination location

  1. For destination location options, select Choose an existing location.
  2. For Existing location select the S3 bucket previously created.

Figure 8: Configure target location

Configure Settings and Start the Task

  1. Give your task a name. In my case I used: DOSpaces-S3.
  2. Set the Task execution configuration and Data transfer configuration according to your requirements, or keep the default settings.
  3. Enable Task Logging with an autogenerated CloudWatch log group as shown in Figure 10.
  4. Select Create task, and wait for the Task status to be Available.

Once completed, you can Start your task.

Figure 9: Task logging image

Next, you will see Task Logging with an autogenerated CloudWatch log group.

Figure 10: Example configuration for cloudwatch
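The same create, wait, and start sequence can be sketched with the DataSync API. The code below assumes a boto3 DataSync client is passed in as datasync, and the option values shown (verify only transferred files, copy only changed data, log each transfer, no bandwidth limit) are one possible choice rather than the settings used in this post. The helper names and polling defaults are my own.

```python
import time


def task_kwargs(name, source_location_arn, destination_location_arn,
                log_group_arn, bytes_per_second=-1):
    """Keyword arguments for the boto3 DataSync client's create_task call."""
    return {
        "Name": name,
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "CloudWatchLogGroupArn": log_group_arn,
        "Options": {
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            "TransferMode": "CHANGED",
            "LogLevel": "TRANSFER",
            "BytesPerSecond": bytes_per_second,  # -1 means no bandwidth limit
        },
    }


def create_and_start_task(datasync, params, poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Create a task, wait until its status is AVAILABLE, then start one execution."""
    task_arn = datasync.create_task(**params)["TaskArn"]
    for _ in range(max_polls):
        if datasync.describe_task(TaskArn=task_arn)["Status"] == "AVAILABLE":
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"task {task_arn} did not become AVAILABLE")
    execution_arn = datasync.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
    return task_arn, execution_arn
```

Setting bytes_per_second to a positive number is how the bandwidth throttling mentioned in this post would be expressed through the API.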

Step 7: Verify the data transferred

After the DataSync task execution successfully completes, you can compare the results to verify the data has been transferred correctly. AWS DataSync provides several built-in features that can be configured when creating the task, such as bandwidth throttling, data validation, and scheduling to customize the transfer process and verify that data is transferred accurately and efficiently.
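As an independent spot check on top of DataSync's own verification, you could list both sides and compare object keys and sizes. The sketch below assumes two boto3 S3 clients, one of them created with the Spaces endpoint as its endpoint_url. The function names are my own. It compares sizes only, not checksums, so it complements rather than replaces the built-in data validation.

```python
def list_sizes(s3_client, bucket):
    """Return {key: size_in_bytes} for every object in a bucket, following pagination."""
    sizes = {}
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            sizes[obj["Key"]] = obj["Size"]
    return sizes


def diff_listings(source, destination):
    """Report source keys that are missing from, or differ in size at, the destination."""
    missing = sorted(key for key in source if key not in destination)
    size_mismatch = sorted(
        key for key in source
        if key in destination and source[key] != destination[key]
    )
    return {"missing": missing, "size_mismatch": size_mismatch}
```

An empty result for both lists suggests that every source object arrived with the expected size.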

If you encounter failures during the transfer process, you can navigate to the CloudWatch logs to investigate the root cause of the issue. The logs provide a detailed overview of the DataSync task, including errors or warnings that occurred during the transfer process. By examining the logs, you can troubleshoot the issue and make the necessary adjustments to make sure that future transfers run smoothly.
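Alongside the logs, the task execution itself reports a status and, on failure, an error code and detail. The sketch below condenses the response of boto3's describe_task_execution call into a short summary. The function name and the shape of the summary are my own; the field names read from the response are those of the DataSync API.

```python
def execution_summary(description):
    """Condense a describe_task_execution response into the fields useful for triage."""
    result = description.get("Result", {})
    return {
        "status": description["Status"],
        "files_transferred": description.get("FilesTransferred", 0),
        "bytes_transferred": description.get("BytesTransferred", 0),
        # ErrorCode and ErrorDetail are only present when something went wrong.
        "error_code": result.get("ErrorCode"),
        "error_detail": result.get("ErrorDetail"),
    }
```

The input would come from boto3.client("datasync").describe_task_execution(TaskExecutionArn=...), and an error code here is a useful starting point before searching the CloudWatch log group.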

Cleaning up

To avoid incurring future charges, delete the resources used in this tutorial.

  1. Delete the API Key in DigitalOcean Spaces.
  2. Delete the DataSync task, the locations, and then the agent.
  3. Terminate the EC2 instance.
  4. Delete VPC endpoint.
  5. Delete Amazon S3 bucket.
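The AWS side of this cleanup can be sketched as an ordered sequence of API calls. The code below assumes boto3 DataSync and EC2 clients are passed in, and that you kept the ARNs and IDs created during the walkthrough; the function name is my own. It leaves out the DigitalOcean key, which is deleted in the DigitalOcean control panel, and the S3 bucket, which must be emptied before it can be deleted.

```python
def clean_up(datasync, ec2, task_arn, location_arns, agent_arn, instance_id, vpc_endpoint_id):
    """Delete the walkthrough's AWS resources in dependency order."""
    # The task references the locations, so it goes first.
    datasync.delete_task(TaskArn=task_arn)
    for location_arn in location_arns:
        datasync.delete_location(LocationArn=location_arn)
    # The object storage location references the agent, so the agent goes after it.
    datasync.delete_agent(AgentArn=agent_arn)
    # Terminating the instance stops compute charges for the agent.
    ec2.terminate_instances(InstanceIds=[instance_id])
    ec2.delete_vpc_endpoints(VpcEndpointIds=[vpc_endpoint_id])
```

Running the calls in this order avoids deleting a resource that another one still depends on.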

Conclusion

In this blog post, I covered how to transfer data from DigitalOcean Spaces to Amazon S3 using AWS DataSync. I walked through the detailed steps, from setting up a VPC endpoint and deploying a DataSync agent to transferring data across cloud providers. With AWS DataSync, you can achieve a seamless migration process that maintains the security of the data being transferred.

Overall, AWS DataSync is a valuable tool for organizations looking to transfer data between different storage systems or replicate data for backup or disaster recovery purposes. Its ease of use, flexibility, and range of features make it a popular choice for businesses of all sizes and industries. With AWS DataSync, users can be confident that their data transfers are secure, reliable, and optimized for efficiency. To learn more, see AWS DataSync, our What’s New post announcing expanded support for copying data to and from other clouds, a blog post about migrating DigitalOcean Spaces to Amazon S3 using AWS DataSync, and a video about migrating Azure Blob to and from AWS Storage.

Emil Richardsen Nedregård

Emil Richardsen Nedregård is a Solution Architect at AWS, focusing on networking, landing zones, and migrations to the cloud. He works with clients on their migration and cloud adoption cases, leveraging AWS to meet their business needs. Emil works closely with organizations in Norway to effectively adopt and utilize cloud services.