AWS Storage Blog
Migrate data from Dropbox to Amazon S3 using Rclone
Whether you choose to operate entirely on AWS or in multicloud and hybrid environments, one of the primary reasons to adopt AWS is the broad choice of services we offer, enabling you to explore, build, deploy, and monitor your workloads.
Amazon S3 is a great option for Dropbox users seeking a comprehensive storage solution. Amazon S3 offers high data durability and availability, practically unlimited scalability, and performant, cost-effective storage options. With its pay-as-you-go pricing model and storage classes optimized for various access patterns and cost requirements, Amazon S3 caters to diverse needs, from managing mission-critical data to storing data for backup and archive. To migrate data from Dropbox to Amazon S3, you can use rclone, an open-source command-line tool that provides a streamlined solution for transferring data.
In this post, we demonstrate how you can use rclone to move data from Dropbox to Amazon S3. We walk you through the process of setting up an Amazon Elastic Compute Cloud (Amazon EC2) instance with rclone installed and configured to transfer data from Dropbox to Amazon S3. The majority of this setup process is automated through AWS CloudFormation. We also explore different rclone flags that reduce data transfer time while addressing service quotas. Rclone streamlines the migration process with its native support for both storage systems, enabling synchronization and efficient data transfer. Its customizable configuration options, such as controlling concurrent transfers and setting transaction rate limits, optimize the transfer process.
Solution overview
Figure 1 shows the architecture for transferring data from Dropbox to Amazon S3 using an EC2 instance with rclone installed. The EC2 instance, provisioned through CloudFormation, acts as an intermediary to facilitate the data transfer process. Rclone running on the EC2 instance transfers data from Dropbox to Amazon S3.
Figure 1: Architecture diagram to transfer data from Dropbox to Amazon S3 using rclone
By using CloudFormation, the deployment and configuration of the EC2 instance and rclone setup are automated. This makes sure of a consistent and reproducible environment while minimizing manual errors, saving time, and reducing effort. Rclone, configured on the EC2 instance, connects to both Dropbox and Amazon S3, thus enabling efficient data transfer from Dropbox to Amazon S3.
Prerequisites
You must have the following prerequisites to implement this solution:
- Make sure you have the necessary permissions to access AWS Identity and Access Management (IAM), AWS Secrets Manager, and Amazon EC2.
- You should have an Amazon Virtual Private Cloud (Amazon VPC), a subnet in your VPC, and an EC2 key pair. These are required as user inputs when you set up the CloudFormation stack.
- Decide between x86 and AWS Graviton-based (Arm) instances, and choose an EC2 instance type that’s offered in your subnet’s Availability Zone (AZ). Refer to the user guide for finding an EC2 instance type to filter by architecture and AZ. For the best price-performance, we recommend an Arm-based instance unless you have a specific requirement for x86.
- Rclone suggests creating your own Dropbox App ID instead of using the default one. Refer to the rclone documentation for creating your own Dropbox client ID and to learn why it’s recommended.
- You need an S3 bucket as a destination data store.
Review the CloudFormation template to understand IAM user permissions and adjust as necessary. Refer to IAM policies best practices for more details. Similarly, check and update security groups for the EC2 instance if needed.
The CloudFormation template automates the rclone setup for Amazon S3, but Dropbox requires manual token authorization after you connect to the EC2 instance. We cover this later in the post.
Walkthrough
Transferring data from Dropbox to Amazon S3 in this post involves:
1. Deploying the CloudFormation template, which provisions an EC2 instance, installs rclone, and configures rclone remote connections.
2. Completing Dropbox token authorization after connecting to the EC2 instance.
3. Using rclone commands to transfer data from Dropbox to Amazon S3.
In the following sections we go through these steps in more detail.
1. CloudFormation stack deployment
This section walks you through deploying the CloudFormation template to create necessary resources for data transfer.
1.1. Create stack
1.1.1. Download the CloudFormation template, CFT-Dropbox-to-S3.yml, designed for this solution here, and then visit the CloudFormation console.
1.1.2. On the Stacks page, choose Create stack in the top right, and then choose With new resources (standard).
1.1.3. On the Create stack page, choose Choose an existing template in the Prepare template section. In the Specify template section, choose Upload a template file > Choose file and select the CloudFormation template, CFT-Dropbox-to-S3.yml, that you downloaded earlier.
1.1.4. Choose Next.
1.2. Specify stack details
Figure 2. CloudFormation stack configuration
1.2.1. Provide a unique Stack name, as shown in the preceding figure. For example, “Dropbox-to-S3.”
1.2.2. Select the VPC and Subnet in which to create your EC2 instance.
1.2.3. Enter your preferred Instance Type. Follow the instructions given in the “Prerequisites” section for details on selecting the compatible instance type.
1.2.4. Select the EC2 key pair.
1.2.5. In Your IP Address, enter the IP address range from which you want to allow inbound traffic to your instance.
1.2.6. Enter the Client ID and Client secret that you created in the Dropbox API Console as part of the prerequisites.
1.3. Configure stack options
1.3.1. Leave the options as default and choose Next.
1.4. Review and create
1.4.1. Select the check box at the bottom of the Capabilities section to acknowledge that CloudFormation might create IAM resources, and then choose Submit.
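If you prefer the command line, you can deploy the same template with the AWS CLI. The following is a minimal sketch: the parameter keys (VpcId, SubnetId, and so on) are assumptions that must match the parameter names defined in the template, and the values are placeholders.
aws cloudformation create-stack \
  --stack-name Dropbox-to-S3 \
  --template-body file://CFT-Dropbox-to-S3.yml \
  --capabilities CAPABILITY_IAM \
  --parameters ParameterKey=VpcId,ParameterValue=vpc-0123456789abcdef0 \
               ParameterKey=SubnetId,ParameterValue=subnet-0123456789abcdef0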
2. Dropbox authorization token
This section guides you through the process of authorizing rclone to access your Dropbox account.
2.1. Connecting to Dropbox Remote
After completing the CloudFormation deployment, connect to your EC2 instance “Rclone Instance – Dropbox to S3.”
The first step after connecting to the EC2 instance is to authorize rclone for Dropbox access. This is an additional security step alongside providing the Dropbox client_id and client_secret for rclone access.
Enter the following command and enter n because you are working on a remote or headless machine. We select this option because we don’t have a browser in our Ubuntu-based EC2 instance.
rclone config reconnect dropbox-remote:
Figure 3: Configuring rclone to connect to Dropbox remote
After entering n, rclone gives us a command to get the token from Dropbox for rclone. This command must be run on a machine that has access to a browser. Follow the documentation on configuring rclone on a remote or headless machine to authorize Dropbox through your machine with a browser installed. Make sure you are logged in to the Dropbox account that you want to access.
Figure 4: Accessing the link given by rclone to authenticate with Dropbox
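As a sketch of what rclone typically prints in this situation, the command takes a form similar to the following, where the client ID and client secret are placeholders for the values you created in the prerequisites:
rclone authorize "dropbox" "YOUR_CLIENT_ID" "YOUR_CLIENT_SECRET"
Run it on the machine with a browser and follow the prompts that rclone and Dropbox display.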
2.2. Authorization with Dropbox
After following a series of prompts from Dropbox (Choose an account > rclone wants to access your Dropbox Account > Authorization code), you should see a screen that looks like Figure 5, which contains the authorization code.
Figure 5: Authorization code from Dropbox
Copy the authorization code, go back to the rclone EC2 instance, and paste it into the Enter verification code> field.
Figure 6: Providing the authorization code to rclone from Dropbox
2.3. Rclone remote connections configuration
When provisioning the EC2 instance with the CloudFormation template, an initial rclone config file was created for the Amazon S3 and Dropbox configurations. Entering the following command opens this initial configuration file, as shown in Figure 7.
nano /home/ubuntu/.config/rclone/rclone.conf
Figure 7: Rclone configuration file
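As a rough sketch, the generated file defines two remotes similar to the following; the exact keys depend on the template, and the client_id, client_secret, and region values here are placeholders:
[dropbox-remote]
type = dropbox
client_id = EXAMPLE_CLIENT_ID
client_secret = EXAMPLE_CLIENT_SECRET

[s3-remote]
type = s3
provider = AWS
env_auth = true
region = us-east-1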
If you’d like to make or update configurations according to your requirements, you can directly edit this configuration file at its file path. Alternatively, you can enter an interactive configuration session after connecting to your instance by entering the rclone config command, and then edit or update from there, as shown in Figure 8.
Figure 8: Editing rclone configuration
Refer to Amazon S3 rclone Configuration and Dropbox rclone Configuration for detailed instructions on customizing remotes to meet your needs.
3. Transferring and managing files between Dropbox and Amazon S3
In this section, we use rclone to transfer data from Dropbox to Amazon S3 and perform various file management operations.
For this post, the Dropbox and Amazon S3 remote connections are named dropbox-remote and s3-remote. You can enter the following commands to list the objects in a specified path.
rclone ls <remote>:<folder_name>/<subfolder_name>
rclone ls <remote>:<bucket_name>/<folder_name>
Figure 9: Listing objects within specified paths from remotes using rclone list command
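For example, assuming a hypothetical Dropbox folder named photos and the destination bucket used later in this post, the commands look like the following:
rclone ls dropbox-remote:photos
rclone ls s3-remote:EXAMPLE-DESTINATION-BUCKET/photos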
Start by copying files from one remote to another. The rclone copy command copies files from the source to the destination. For test purposes, consider transferring approximately 1 TB of data from Dropbox to an S3 bucket.
The direct command to copy would be as follows:
rclone copy <source>:<sourcepath> <dest>:<destpath>
However, you need certain conditions in place to address the quotas from the storage service providers while also optimizing transfer speed. Rclone flags add extra functionality to rclone commands, enabling you to manage data across remotes more efficiently. By using the appropriate flags, you can strike a balance between adhering to the quotas and minimizing the time needed for data transfers.
Start with the following command, understand how its flags work, and then run it to transfer data.
rclone copy \
  --tpslimit 200 \
  --transfers 200 \
  --buffer-size 200M \
  --checkers 400 \
  --s3-upload-cutoff 100M \
  --s3-chunk-size 100M \
  --s3-upload-concurrency 50 \
  dropbox-remote: \
  s3-remote:EXAMPLE-DESTINATION-BUCKET \
  -P
The -P/--progress flag helps you view real-time transfer statistics during file operations. The other flags do the following:
- --tpslimit specifies or limits the number of transactions per second (TPS). A transaction, or query, in this case can be a PUT, GET, or POST request if it’s an HTTP backend.
- --transfers controls the number of simultaneous file transfers. By default, rclone performs four parallel transfers.
- --buffer-size=SIZE specifies the buffer size for each transfer to improve transfer speed. Each transfer uses the specified amount of memory for buffering.
- --checkers=N controls the number of parallel file checkers during operations, such as copying files. Checkers verify file integrity and make sure of correct transfers.
Rclone supports Amazon S3 multipart upload. Multipart upload enables the uploading of a single object as multiple parts, each representing a contiguous portion of the object’s data. We recommend using multipart uploads instead of single operations when the object size exceeds 100 MB. Refer to the multipart uploads with rclone documentation for more information.
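As a rough illustration of how these flags interact (the object size here is hypothetical): with --s3-upload-cutoff 100M, any object larger than 100 MiB is uploaded in parts, and with --s3-chunk-size 100M, a 50 GiB object is split into 512 parts (50 × 1024 / 100), well under the 10,000-part maximum for an S3 multipart upload.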
Dropbox quotas
Because the Dropbox API is a shared service with multiple users, Dropbox sets certain quotas, such as the number of transactions or queries per defined time period, which depend on the Dropbox tier that you have.
Before setting your flags, look into the features available with each Dropbox plan.
Amazon S3 quotas
Amazon S3, on the other hand, supports significantly higher request rates: at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned S3 prefix, with no limit on the number of prefixes in a bucket.
During scaling, temporary 503 (Slow Down) errors may occur, but they subside when scaling is complete. Refer to the best practice design patterns to optimize Amazon S3 performance.
To make sure that you stay within the quota limits, consider the values from both remotes and set --tpslimit to the lower of the two.
Figure 10.1: Rclone copy command with the flags to copy the data from Dropbox to the S3 bucket
Now you can run the preceding command. The following image shows that transferring 1001.06 GB of data from Dropbox to Amazon S3 took approximately 7 minutes 33.8 seconds.
Figure 10.2: Output of the copy command showing the details about the transfer
Refer to the rclone commands documentation for a complete list of available commands.
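After the copy completes, you can optionally verify that the source and destination match. The following is a sketch using rclone’s check command against the same hypothetical bucket name used earlier; the --one-way flag only verifies that files on the source exist and match on the destination:
rclone check dropbox-remote: s3-remote:EXAMPLE-DESTINATION-BUCKET --one-way -P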
Monitoring EC2 instance
Refer to the rclone documentation to know more about these flags and when to adjust values. The different flags discussed in this post primarily use the network bandwidth and memory of the EC2 instance. We recommend experimenting with different flag values while monitoring your instance’s performance to achieve optimal results without exceeding the throttle limits of your EC2 instance.
For example, suppose you enter the preceding command to test transferring 1 TB of data, and you notice that your EC2 instance’s CPU usage is 30%, memory usage is 60%, and network usage is 70%. Ideally, you want to stay under 100%, for example at a safer value of around 90%. These usage metrics indicate that there is room to increase the values of flags such as --transfers to enhance the transfer process.
You can monitor your EC2 instance by using Amazon CloudWatch, an application performance monitoring service.
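As a minimal sketch, assuming a hypothetical instance ID and time window, you can retrieve an instance’s average CPU utilization with the AWS CLI. Note that memory metrics aren’t published by default and require the CloudWatch agent:
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 300 \
  --statistics Average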
Other methods to optimize the data transfer process
The following methods can help you further improve the data transfer process:
1. Higher performance compute: A direct option is to select an EC2 instance type with higher performance. Try to balance Amazon EC2 usage time, instance cost, and data transfer completion time.
2. Other relevant rclone flags: You can look into other flags and evaluate whether they are relevant to your needs. Note that a few of these flags may at times exceed quota values, such as TPS limits, so use them with caution.
Cleaning up
You may want to delete the resources created in this post to avoid unwanted future charges. To delete the stack resources, you can delete the CloudFormation stack. In addition, visit the Dropbox application settings and delete the application that you created earlier.
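For example, assuming the stack name from the walkthrough, you can delete the stack with the AWS CLI:
aws cloudformation delete-stack --stack-name Dropbox-to-S3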
Conclusion
In this post, we explored how to efficiently transfer large amounts of data from Dropbox to Amazon S3 using rclone. You automated the setup process of EC2 instance creation, rclone installation, and remote connections configuration through AWS CloudFormation, reducing manual effort and potential errors. You explored various rclone flags that optimize transfer times while staying within service quotas, helping you avoid throttling and delays. This approach allows you to customize the data transfer process to match your specific requirements, thus making sure of efficient and reliable migrations even for large datasets.
By moving your data to Amazon S3, you can benefit from cost savings, and its scalability and performance make it a great choice for your data lakes, analytics, and artificial intelligence/machine learning (AI/ML) applications. Thank you for reading this post. Leave any comments or ideas in the Comments section.
Feel free to check out other posts that might be helpful in migrating data and monitoring: