How can I improve the transfer speeds for copying data between my S3 bucket and EC2 instance?
Last updated: 2022-11-08
I want to transfer data from my Amazon Elastic Compute Cloud (Amazon EC2) instance to my Amazon Simple Storage Service (Amazon S3) bucket. How can I improve the transfer speeds?
The transfer speeds for copying, moving, or syncing data from Amazon EC2 to Amazon S3 depend on several factors. To improve the transfer speeds when you copy, move, or sync data between an EC2 instance and an S3 bucket, use the following methods:
- Use enhanced networking on the EC2 instance.
- Use parallel workloads for the data transfer.
- Customize the upload configurations on the AWS Command Line Interface (AWS CLI).
- Use an Amazon Virtual Private Cloud (Amazon VPC) endpoint for Amazon S3.
- Use S3 Transfer Acceleration between geographically distant AWS Regions.
- Upgrade your EC2 instance type.
- Use chunked transfers.
Use enhanced networking on the EC2 instance
Enhanced networking provides higher bandwidth, higher packet-per-second (PPS) performance, and lower inter-instance latency. You can turn on enhanced networking at no additional charge.
If your EC2 instance's PPS rate seems to have reached its ceiling, the instance has likely reached the upper thresholds of the network interface driver. When this happens, consider turning on enhanced networking.
Note: Be sure to review the instance requirements for enhanced networking.
Use parallel workloads for the data transfer
To potentially reduce the overall time that it takes to complete the data transfer, consider splitting the transfer into multiple mutually exclusive operations. For example, if you're using the AWS CLI, you can run concurrent instances of aws s3 cp, aws s3 mv, or aws s3 sync. If you have data spread across multiple prefixes, you can run multiple instances of the AWS CLI to perform separate sync operations in parallel.
For example, you can run parallel sync operations for the following different prefixes:
- aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder1 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder1
- aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder2 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder2
Note: If you receive errors when running AWS CLI commands, confirm that you're running a recent version of the AWS CLI.
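The parallel sync operations described above could be scripted roughly as follows. This is a minimal sketch, not an official AWS script: the function name sync_prefixes and the AWS_CLI, SRC, and DST variables are hypothetical names chosen for illustration, and the bucket names are the placeholders used in this article. Each prefix is synced in its own background process, and the function reports failure if any of the syncs fails.

```shell
#!/bin/sh
# Hypothetical helper: run one "aws s3 sync" per prefix, all in parallel.
# AWS_CLI can be overridden (for example AWS_CLI=echo) to preview the commands.
AWS_CLI="${AWS_CLI:-aws}"
SRC="s3://source-AWSDOC-EXAMPLE-BUCKET"
DST="s3://destination-AWSDOC-EXAMPLE-BUCKET"

sync_prefixes() {
  pids=""
  for prefix in "$@"; do
    # Start each sync in the background and remember its process ID.
    "$AWS_CLI" s3 sync "$SRC/$prefix" "$DST/$prefix" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    # Wait for every sync; record a failure but keep waiting for the rest.
    wait "$pid" || failed=1
  done
  return "$failed"
}

# Example call (not run here): sync_prefixes folder1 folder2
```

The prefixes passed to the function should not overlap, so that each operation stays mutually exclusive.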
Customize the upload configurations on the AWS CLI
To speed up the data transfer, customize the following AWS CLI configuration values for Amazon S3:
- multipart_chunksize: This value sets the size of each part that the AWS CLI uploads in a multipart upload for an individual file. This setting lets you break a larger file (for example, 300 MB) into smaller parts that can be uploaded faster.
Note: A multipart upload requires that a single file is uploaded in no more than 10,000 distinct parts. Make sure that the chunk size that you set balances the size of each part against the number of parts.
- max_concurrent_requests: By default, the AWS CLI supports multithreading. You can change the max_concurrent_requests value to increase the number of requests that can be sent to Amazon S3 at a time. The default value is 10. Increasing this value alone might not produce a noticeable improvement. However, when you combine a higher max_concurrent_requests value with parallel workloads, you can achieve better transfer speeds overall.
Note: More resources are consumed on your machine when you run more threads. Make sure that your machine has enough resources to support the maximum number of concurrent requests.
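To illustrate the balance between part size and part count, the following sketch computes the smallest whole-MiB chunk size that keeps a file within the 10,000-part limit. The function name min_chunk_mib is hypothetical. The 5 MiB floor reflects the Amazon S3 minimum size for every part except the last; treat the result as a lower bound for multipart_chunksize, not a tuned recommendation.

```shell
#!/bin/sh
# Hypothetical helper: min_chunk_mib FILE_SIZE_BYTES
# Prints the smallest chunk size (in MiB) that fits the file into 10,000 parts.
min_chunk_mib() {
  size_bytes=$1
  max_parts=10000
  mib=1048576
  # Ceiling division: bytes needed per part, then rounded up to whole MiB.
  per_part=$(( (size_bytes + max_parts - 1) / max_parts ))
  chunk=$(( (per_part + mib - 1) / mib ))
  # Amazon S3 parts (other than the last) must be at least 5 MiB.
  if [ "$chunk" -lt 5 ]; then
    chunk=5
  fi
  echo "$chunk"
}

# Example (not run here): apply the values to the default AWS CLI profile.
#   aws configure set default.s3.multipart_chunksize "$(min_chunk_mib 1099511627776)MB"
#   aws configure set default.s3.max_concurrent_requests 20
```

The value 20 for max_concurrent_requests is only an illustrative assumption; choose a value that your instance's CPU and memory can support.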
Use a VPC endpoint for Amazon S3
If your EC2 instance is in the same Region as the S3 bucket, then consider using a VPC endpoint for Amazon S3. VPC endpoints can help improve overall performance and reduce the load on your network address translation (NAT).
Another benefit of using a VPC endpoint is that you can privately connect your VPC to Amazon S3 without an internet gateway, NAT device, or VPN connection. Instances in the VPC don't require public IP addresses to communicate with resources like an Amazon S3 bucket. When you use a VPC endpoint, the data traffic between the VPC and Amazon S3 is routed on the AWS network.
Use S3 Transfer Acceleration between geographically distant AWS Regions
The data transfer speed can be higher if the EC2 instance and the S3 bucket are geographically closer to each other. If the instance and the bucket are in geographically distant AWS Regions, consider turning on Amazon S3 Transfer Acceleration. Transfer Acceleration provides fast and secure transfers over long distances using Amazon CloudFront's globally distributed edge locations.
Transfer Acceleration incurs additional charges, so be sure to review Amazon S3 pricing. To determine if Transfer Acceleration will improve the transfer speeds for your use case, review the Amazon S3 Transfer Acceleration speed comparison tool.
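If you decide to try Transfer Acceleration, turning it on from the AWS CLI might look like the following sketch. The function name enable_acceleration and the AWS_CLI variable are hypothetical; the two AWS CLI commands it wraps turn on acceleration for the bucket and then tell the default CLI profile to use the accelerate endpoint. Note that the second command changes the configuration for all S3 commands run under that profile.

```shell
#!/bin/sh
# Hypothetical helper: enable_acceleration BUCKET_NAME
# AWS_CLI can be overridden (for example AWS_CLI=echo) to preview the commands.
AWS_CLI="${AWS_CLI:-aws}"

enable_acceleration() {
  bucket=$1
  # Turn on Transfer Acceleration for the bucket, then point the default
  # profile's S3 commands at the accelerate endpoint.
  "$AWS_CLI" s3api put-bucket-accelerate-configuration \
    --bucket "$bucket" \
    --accelerate-configuration Status=Enabled &&
  "$AWS_CLI" configure set default.s3.use_accelerate_endpoint true
}

# Example call (not run here): enable_acceleration AWSDOC-EXAMPLE-BUCKET
```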
Upgrade your EC2 instance type
High CPU utilization on the EC2 instance can contribute to slow transfer speeds. You can upgrade your instance to another instance type that provides more memory and higher network performance. Larger instance sizes for an instance type typically provide better network performance than smaller instance sizes of the same type.
Note: For a reliable network connection between the EC2 instance and Amazon S3, choose an instance type with at least 10 Gigabit (10 Gbps) network connectivity.
Use chunked transfers
If you're transferring large files, multipart uploads and ranged GETs can help improve overall transfer performance.
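As an illustration of ranged GETs, the following sketch splits an object of a known size into byte ranges that could each be downloaded in a separate request. The function name byte_ranges is hypothetical; each printed value is in the form accepted by the --range option of aws s3api get-object.

```shell
#!/bin/sh
# Hypothetical helper: byte_ranges OBJECT_SIZE_BYTES PART_SIZE_BYTES
# Prints one "bytes=START-END" range per line, covering the whole object.
byte_ranges() {
  size=$1
  part=$2
  start=0
  while [ "$start" -lt "$size" ]; do
    end=$(( start + part - 1 ))
    # The last range is shorter when the size is not a multiple of the part size.
    if [ "$end" -ge "$size" ]; then
      end=$(( size - 1 ))
    fi
    echo "bytes=${start}-${end}"
    start=$(( end + 1 ))
  done
}

# Example (not run here): fetch the first range of a hypothetical object.
#   aws s3api get-object --bucket AWSDOC-EXAMPLE-BUCKET --key large-file.bin \
#     --range "bytes=0-8388607" part-0
```

The downloaded parts must be concatenated in range order to reconstruct the original file.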