Using the Amazon S3 console to upload or transfer data to Amazon S3 works well for small amounts of data. Is there a way to transfer larger amounts of data more efficiently?

The Amazon S3 console provides a convenient interface for uploading and copying relatively small amounts of data to Amazon S3 buckets. If you need to work with large volumes of data, you can improve performance by using other methods.

Try these methods for transferring large amounts of data to or from S3 buckets.

If the S3 buckets are in the same region, you can use the AWS Command Line Interface (CLI) to run multiple instances of the aws s3 cp (copy), mv (move), or sync (synchronize) commands at the same time, which increases performance through multithreading. As described in Use of Exclude and Include Filters, give each instance mutually exclusive --exclude filters so that it operates only on those objects that are not explicitly excluded. For example, suppose a source S3 bucket named s3://srcbucket/ contains a large number of files that begin with a lowercase letter, and you want to copy them to a destination S3 bucket named s3://destbucket/. You could open two instances of the AWS CLI and run one of the following commands in each to perform a multithreaded copy:

aws s3 cp s3://srcbucket/ s3://destbucket/ \
    --recursive --exclude "a*" --exclude "b*" --exclude "c*" --exclude "d*" --exclude "e*" \
    --exclude "f*" --exclude "g*" --exclude "h*" --exclude "i*" --exclude "j*" --exclude "k*" \
    --exclude "l*" --exclude "m*" --exclude "n*"

aws s3 cp s3://srcbucket/ s3://destbucket/ \
    --recursive --exclude "o*" --exclude "p*" --exclude "q*" --exclude "r*" --exclude "s*" \
    --exclude "t*" --exclude "u*" --exclude "v*" --exclude "w*" --exclude "x*" \
    --exclude "y*" --exclude "z*"

This is a very simple example that does not account for the possible existence of folders, uppercase characters, or numerals; objects whose names begin with such characters match neither set of filters and would be copied by both instances. The first instance of the command copies all files that do not begin with the lowercase letters a–n, and the second instance copies all files that do not begin with the lowercase letters o–z. When using this method, double-check your exclude parameters to avoid conflicts caused by both instances performing operations on the same files.
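The two commands above can also be generated rather than typed by hand, which makes it easier to check that the filters do not overlap. The following is a minimal sketch, not a definitive implementation: the helper names exclude_args and copy_command are hypothetical, and the script only prints the commands so that they can be reviewed before anything is copied.

```shell
#!/bin/sh
# Sketch: print one `aws s3 cp` command per CLI instance.
# Bucket names come from the example above; helper names are hypothetical.

SRC="s3://srcbucket/"
DEST="s3://destbucket/"

# Print an --exclude "<letter>*" filter for each letter passed as an argument.
exclude_args() {
  out=""
  for letter in "$@"; do
    out="$out --exclude \"$letter*\""
  done
  printf '%s\n' "${out# }"
}

# Print the full copy command for one instance; arguments are the letters to exclude.
copy_command() {
  printf 'aws s3 cp %s %s --recursive %s\n' "$SRC" "$DEST" "$(exclude_args "$@")"
}

# Instance 1 skips a-n, instance 2 skips o-z.
copy_command a b c d e f g h i j k l m n
copy_command o p q r s t u v w x y z
```

Each printed line can be pasted into its own terminal window. Alternatively, each line can be piped to sh in the background, followed by wait, so that both copies run at the same time from a single script.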

To take advantage of additional threads, run more instances of the AWS CLI and adjust the --exclude filters so that each instance handles a smaller, non-overlapping share of the objects. The same technique can be used to move or synchronize files between source and destination buckets. The use of multiple threads can significantly increase data throughput. Additionally, these commands support a local directory as a source or destination, so you can copy, move, or synchronize files between your local computer and an Amazon S3 bucket.
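As an illustration of splitting the work across more than two instances, the sketch below prints three aws s3 sync commands that upload from a local directory. It is written under stated assumptions: the directory /data/export and the helper excludes_for_owner are hypothetical, and file names are assumed to begin with a lowercase letter. Each instance "owns" a group of letters and excludes every other letter.

```shell
#!/bin/sh
# Sketch: three sync instances, each owning a disjoint group of first letters.
# LOCAL_DIR and the helper name are hypothetical; DEST is from the example above.

LOCAL_DIR="/data/export"
DEST="s3://destbucket/"
ALPHABET="a b c d e f g h i j k l m n o p q r s t u v w x y z"

# Print --exclude filters for every letter NOT owned by this instance.
# Arguments: the letters this instance is responsible for.
excludes_for_owner() {
  owned=" $* "
  out=""
  for letter in $ALPHABET; do
    case "$owned" in
      *" $letter "*) ;;                        # owned: leave it unexcluded
      *) out="$out --exclude \"$letter*\"" ;;  # not owned: exclude it
    esac
  done
  printf '%s\n' "${out# }"
}

# Print one command per group. Names that start with an uppercase letter or a
# numeral match no filter here, so every instance would still process them.
for group in "a b c d e f g h i" "j k l m n o p q r" "s t u v w x y z"; do
  printf 'aws s3 sync %s %s %s\n' "$LOCAL_DIR" "$DEST" "$(excludes_for_owner $group)"
done
```

Generating the filters from a single alphabet list means that adding a fourth or fifth instance only requires regrouping the letters, which reduces the chance of two instances owning the same files.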

One drawback to this approach is the possibility of slowdowns when many simultaneous requests to S3 target sequential key (file) names. For more information about this potential pitfall, see Request Rate and Performance Considerations.

Consider using AWS Import/Export Snowball for S3 uploads or downloads that exceed 1 TB. Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Using Snowball addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with Snowball is simple, fast, secure, and can be as little as one-fifth the cost of high-speed Internet.

For more information, see AWS Import/Export Snowball.

When time is of primary importance, consider using S3DistCp with Amazon Elastic MapReduce (Amazon EMR). S3DistCp works in conjunction with an EMR cluster to quickly relocate objects from one S3 bucket to another. The EMR cluster accrues additional cost but provides excellent fault tolerance and speed. For more information about this option, see Distributed Copy Using S3DistCp.
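For orientation, a bucket-to-bucket S3DistCp copy might look like the sketch below. It assumes an EMR release that installs the s3-dist-cp command on the master node; the helper name distcp_command is hypothetical, and the bucket names are reused from the earlier example. The script only prints the command so that it can be reviewed first.

```shell
#!/bin/sh
# Sketch: print an s3-dist-cp invocation for a bucket-to-bucket copy.
# Assumes the s3-dist-cp command is available on the EMR master node.

SRC="s3://srcbucket/"
DEST="s3://destbucket/"

# Print the command for the given source and destination (hypothetical helper).
distcp_command() {
  printf 's3-dist-cp --src %s --dest %s\n' "$1" "$2"
}

distcp_command "$SRC" "$DEST"
```

The printed line is intended to be run in a shell session on the cluster's master node. See Distributed Copy Using S3DistCp for the options that apply to your EMR release.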




Published: 2016-01-07