AWS Developer Tools Blog
Parallelizing Large Downloads for Optimal Speed
TransferManager now supports a feature that parallelizes large downloads from Amazon S3. You do not need to change your code to use this feature; you only need to upgrade the AWS SDK for Java to version 1.11.0 or later. When you download a file using TransferManager, the utility automatically determines whether the object was uploaded in multiple parts. If so, TransferManager downloads the parts in parallel.
I saw an improvement of around 23% in download time for multipart objects larger than 300 MB. I ran my tests on a MacBook Pro with the following specifications:
- Processor: 3.1 GHz Intel i7
- Memory: 16 GB 1867 MHz DDR3
- Hard drive: 512 GB SSD
- Logical Cores: 4
Performance varies with the hardware and internet speed. By default, TransferManager creates a pool of 10 threads, but you can set a custom pool size. For optimal performance, tune the executor pool size to the hardware on which your application runs.
import java.io.File;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;

// Initialize TransferManager.
TransferManager tx = new TransferManager();
// Download the Amazon S3 object to a file.
Download myDownload = tx.download(myBucket, myKey, new File("myFile"));
// Block until the download finishes.
myDownload.waitForCompletion();
// Shut down TransferManager when it is no longer needed.
tx.shutdownNow();
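To make the pool-size tuning mentioned above more concrete, here is a minimal sketch that derives a thread count from the number of logical cores and builds a fixed-size pool using only the JDK. The class `TransferPoolSizing`, its `poolSize` helper, and the multiplier and cap passed to it are hypothetical choices for illustration, not SDK defaults or recommendations; the right values depend on your hardware and network, so measure before settling on them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TransferPoolSizing {

    // Hypothetical heuristic: a few threads per logical core, bounded to [1, maxThreads].
    // Downloads are mostly I/O-bound, so more threads than cores can be reasonable.
    static int poolSize(int logicalCores, int threadsPerCore, int maxThreads) {
        int wanted = Math.max(1, logicalCores) * Math.max(1, threadsPerCore);
        return Math.max(1, Math.min(wanted, maxThreads));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = poolSize(cores, 4, 32); // 4 and 32 are illustrative values only
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("Using " + threads + " transfer threads for " + cores + " logical cores");
        // With the SDK on the classpath, this pool would be handed to TransferManager
        // instead of relying on the default pool.
        pool.shutdown();
    }
}
```

In version 1.x of the SDK, `TransferManager` has a constructor that accepts an `AmazonS3` client and an `ExecutorService`, which is one place such a pool could be supplied; check the API reference for the SDK version you use.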
Pause and resume are supported for parallel downloads. When you pause the download, TransferManager tries to capture the information required to resume the transfer after the pause. You can use that information to resume the download from where you paused. To protect your download from a JVM crash, serialize PersistableDownload to disk as soon as possible. You can do this by passing an instance of S3SyncProgressListener to TransferManager#download. For more information about pause and resume, see this post.
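The reason for writing resume state to disk early can be illustrated with a small JDK-only simulation. This is not the SDK's mechanism or the format used by PersistableDownload: the class `ResumeStateSketch`, the state file holding a single part number, and the simulated interruption are all hypothetical. The sketch only shows the idea that progress saved immediately after each step lets a later run continue from where the earlier one stopped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class ResumeStateSketch {

    // Persist the last completed part number. Writing to a sibling file first and then
    // moving it into place reduces the chance of leaving a half-written state file.
    static void save(Path stateFile, int lastCompletedPart) throws IOException {
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        Files.write(tmp, Integer.toString(lastCompletedPart).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, stateFile, StandardCopyOption.REPLACE_EXISTING);
    }

    // Returns 0 when no state has been saved yet.
    static int load(Path stateFile) throws IOException {
        if (!Files.exists(stateFile)) {
            return 0;
        }
        String text = new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8).trim();
        return Integer.parseInt(text);
    }

    // "Downloads" parts starting after the saved position; stops early after stopAfterPart.
    static List<Integer> run(Path stateFile, int partCount, int stopAfterPart) throws IOException {
        List<Integer> fetched = new ArrayList<>();
        for (int part = load(stateFile) + 1; part <= partCount; part++) {
            fetched.add(part);     // stand-in for downloading this part
            save(stateFile, part); // persist progress right away
            if (part == stopAfterPart) {
                break;             // simulated pause or crash
            }
        }
        return fetched;
    }

    static String demo() {
        try {
            Path dir = Files.createTempDirectory("resume-sketch");
            Path stateFile = dir.resolve("download.state");
            try {
                List<Integer> firstRun = run(stateFile, 5, 3);   // interrupted after part 3
                List<Integer> secondRun = run(stateFile, 5, -1); // resumes and finishes
                return firstRun + " then " + secondRun;
            } finally {
                Files.deleteIfExists(stateFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[1, 2, 3] then [4, 5]"
    }
}
```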
Parallel downloads are not supported in some cases. The file is downloaded serially if the client is an instance of AmazonS3EncryptionClient, if the download request is a ranged request, or if the object was originally uploaded to Amazon S3 as a single part.
Low-Level Implementation:
An object can be uploaded to S3 in multiple parts. You can retrieve a single part of an object from S3 by specifying the part number in GetObjectRequest. TransferManager uses this capability to download all parts of an object asynchronously and write them to individual, temporary files. The temporary files are then merged into the destination file provided by the user. For more information, see the implementation here.
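The flow described above (fetch each part asynchronously into its own temporary file, then merge the files in part order) can be sketched with the JDK alone. This is not the SDK's actual implementation: the class `ParallelPartDownloadSketch` and the `PartFetcher` interface are hypothetical, and `PartFetcher` stands in for a GetObjectRequest that specifies a part number. The demo uses three in-memory strings in place of S3 parts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPartDownloadSketch {

    /** Hypothetical stand-in for a part-number GET; part numbers start at 1. */
    interface PartFetcher {
        InputStream openPart(int partNumber) throws IOException;
    }

    static void download(PartFetcher fetcher, int partCount, Path destination, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Path> tempFiles = new ArrayList<>();
        try {
            // Start one task per part; each task writes to its own temporary file.
            List<Future<Void>> pending = new ArrayList<>();
            for (int part = 1; part <= partCount; part++) {
                final int partNumber = part;
                final Path temp = Files.createTempFile("part-" + partNumber + "-", ".tmp");
                tempFiles.add(temp);
                Callable<Void> task = () -> {
                    try (InputStream in = fetcher.openPart(partNumber)) {
                        Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
                    }
                    return null;
                };
                pending.add(pool.submit(task));
            }
            // Wait for every part to finish before merging.
            for (Future<Void> result : pending) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    throw new IOException("A part failed to download", e.getCause());
                }
            }
            // Merge the temporary files into the destination in part order.
            try (OutputStream out = Files.newOutputStream(destination)) {
                for (Path temp : tempFiles) {
                    Files.copy(temp, out);
                }
            }
        } finally {
            pool.shutdownNow();
            for (Path temp : tempFiles) {
                Files.deleteIfExists(temp);
            }
        }
    }

    /** Runs the sketch against three in-memory "parts" and returns the merged content. */
    static String demo() {
        String[] parts = {"alpha-", "beta-", "gamma"};
        PartFetcher fetcher =
                n -> new ByteArrayInputStream(parts[n - 1].getBytes(StandardCharsets.UTF_8));
        try {
            Path destination = Files.createTempFile("merged-", ".bin");
            try {
                download(fetcher, parts.length, destination, 3);
                return new String(Files.readAllBytes(destination), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(destination);
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "alpha-beta-gamma"
    }
}
```

Creating the temporary files before submitting the tasks keeps them all tracked for cleanup even if one part fails, and merging from the ordered list preserves part order regardless of which task finishes first.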
We hope you’ll try the new parallel download feature supported by TransferManager. Feel free to leave your feedback in the comments.