AWS Developer Tools Blog
Downloading Objects from Amazon S3 using the AWS SDK for Ruby
The AWS SDK for Ruby provides a few methods for getting objects out of Amazon S3. This blog post focuses on using the v2 Ruby SDK (the aws-sdk-core gem) to download objects from Amazon S3.
Downloading Objects into Memory
For small objects, it can be useful to get an object and have it available in your Ruby process. If you do not specify a :target for the download, the entire object is loaded into memory as a StringIO object.
s3 = Aws::S3::Client.new
resp = s3.get_object(bucket:'bucket-name', key:'object-key')
resp.body #=> #<StringIO ...>
resp.body.read #=> '...'
Call #read or #string on the StringIO to get the body as a String object.
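The two calls differ if you touch the body more than once. The sketch below uses a plain StringIO as a stand-in for resp.body (no S3 request is made, and inspect_body is a hypothetical helper name) to show that #read consumes the stream from the current position, while #string returns the whole buffer regardless of position.

```ruby
require 'stringio'

# Hypothetical helper for illustration; `body` stands in for resp.body.
def inspect_body(body)
  first  = body.read    # reads from the current position to the end
  second = body.read    # already at the end, so this returns ""
  whole  = body.string  # full buffer, independent of the read position
  { first: first, second: second, whole: whole }
end

result = inspect_body(StringIO.new('object data'))
puts result.inspect
```

If you need to call #read again, call #rewind on the StringIO first.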
Downloading to a File or IO Object
When downloading large objects from Amazon S3, you typically want to stream the object directly to a file on disk. This avoids loading the entire object into memory. You can specify the :target for any AWS operation as an IO object.
File.open('filename', 'wb') do |file|
  resp = s3.get_object({ bucket:'bucket-name', key:'object-key' }, target: file)
end
The #get_object method still returns a response object, but the #body member of the response will be the file object given as the :target instead of a StringIO object.
You can also specify the target as a String or Pathname, and the Ruby SDK will create the file for you.
resp = s3.get_object({ bucket:'bucket-name', key:'object-key' }, target: '/path/to/file')
Using Blocks
You can also use a block for downloading objects. When you pass a block to #get_object, chunks of data are yielded as they are read off the socket.
File.open('filename', 'wb') do |file|
  s3.get_object(bucket: 'bucket-name', key:'object-key') do |chunk|
    file.write(chunk)
  end
end
Please note, when using blocks to download objects, the Ruby SDK will NOT retry failed requests after the first chunk of data has been yielded. Doing so could cause file corruption on the client end by starting over mid-stream. For this reason, I recommend using one of the preceding methods for specifying the target file path or IO object.
Retries
The Ruby SDK retries failed requests up to 3 times by default. You can override the default using :retry_limit. Setting this value to 0 disables all retries.
If the Ruby SDK encounters a network error after the download has started, it attempts to retry the request. It first checks to see if the IO target responds to #truncate. If it does not, the SDK disables retries.
If you prefer to disable this default behavior, you can either use the block mode or set :retry_limit to 0 for your S3 client.
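To make the #truncate check more concrete, here is a minimal sketch of how a download target could be reset before a retry. This is not the SDK's actual implementation; reset_target_for_retry is a hypothetical helper, and the interrupted download is simulated with plain writes to a StringIO.

```ruby
require 'stringio'

# Hypothetical helper, not part of the SDK: resets a download target so a
# retry can start from scratch. Returns false if the target cannot be reset.
def reset_target_for_retry(io)
  return false unless io.respond_to?(:truncate)

  io.truncate(0)  # discard bytes written by the failed attempt
  io.rewind       # move the write position back to the start
  true
end

target = StringIO.new
target.write('partial da')                # simulated interrupted download
if reset_target_for_retry(target)
  target.write('complete data')           # simulated successful retry
end
puts target.string
```

A target that cannot be truncated would keep the partial bytes from the failed attempt, which is one plausible reason for disabling retries in that case.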
Range GETs
For very large objects, consider using the :range option to download the object in parts. Currently there are no helper methods for this in the Ruby SDK, but if you are interested in submitting something, we accept pull requests!
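As a starting point, a ranged download could be sketched as follows. Both byte_ranges and download_in_parts are hypothetical helpers, not SDK methods. The sketch assumes you already know the object size (for example from the content_length of a #head_object response) and downloads the parts sequentially; the client argument is anything that responds to #get_object the way Aws::S3::Client does.

```ruby
require 'stringio'

# Splits an object of `size` bytes into inclusive HTTP byte-range strings.
def byte_ranges(size, part_size)
  (0...size).step(part_size).map do |first|
    last = [first + part_size, size].min - 1
    "bytes=#{first}-#{last}"
  end
end

# Hypothetical helper: fetches each range in order and appends it to `io`.
def download_in_parts(client, bucket:, key:, size:, part_size:, io:)
  byte_ranges(size, part_size).each do |range|
    resp = client.get_object(bucket: bucket, key: key, range: range)
    io.write(resp.body.read)
  end
  io
end

puts byte_ranges(10, 4).inspect

# Usage with a real client might look like this (not executed here):
#   size = s3.head_object(bucket: 'bucket-name', key: 'object-key').content_length
#   File.open('filename', 'wb') do |file|
#     download_in_parts(s3, bucket: 'bucket-name', key: 'object-key',
#                       size: size, part_size: 5 * 1024 * 1024, io: file)
#   end
```

Each part is held in memory only until it is written, so memory use is bounded by part_size. A more complete helper would also need to decide how to handle a part that fails after earlier parts have been written.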
Happy downloading.