AWS Developer Blog

Downloading Objects from Amazon S3 using the AWS SDK for Ruby

by Trevor Rowe | in Ruby

The AWS SDK for Ruby provides a few methods for getting objects out of Amazon S3. This blog post focuses on using the v2 Ruby SDK (the aws-sdk-core gem) to download objects from Amazon S3.

Downloading Objects into Memory

For small objects, it can be useful to get an object and have it available in your Ruby process. If you do not specify a :target for the download, the entire object is loaded into memory as a StringIO object.

s3 = Aws::S3::Client.new
resp = s3.get_object(bucket: 'bucket-name', key: 'object-key')

resp.body
#=> #<StringIO ...> 

resp.body.read
#=> '...'

Call #read or #string on the StringIO to get the body as a String object.
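The two calls behave slightly differently, which matters if you touch the body more than once. The following is a minimal sketch using only Ruby's standard library; the StringIO built here is a stand-in for resp.body, not a real S3 response.

```ruby
require 'stringio'

# Stand-in for resp.body; a real body would come from s3.get_object.
body = StringIO.new('hello world')

first_read  = body.read    # reads from the current position to the end
second_read = body.read    # the position is now at the end, so this is empty
whole       = body.string  # returns the full underlying String, ignoring position

body.rewind                # move back to the start before reading again
reread = body.read
```

In short, #string is the safer choice when you only want the full contents, while #read suits code that treats the body as a stream.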

Downloading to a File or IO Object

When downloading large objects from Amazon S3, you typically want to stream the object directly to a file on disk. This avoids loading the entire object into memory. You can specify the :target for any AWS operation as an IO object.

File.open('filename', 'wb') do |file|
  resp = s3.get_object({ bucket: 'bucket-name', key: 'object-key' }, target: file)
end

The #get_object method still returns a response object, but the #body member of the response will be the file object given as the :target instead of a StringIO object.

You can also specify the target as a String or Pathname, and the Ruby SDK will create the file for you.

resp = s3.get_object({ bucket: 'bucket-name', key: 'object-key' }, target: '/path/to/file')

Using Blocks

You can also use a block for downloading objects. When you pass a block to #get_object, chunks of data are yielded as they are read off the socket.

File.open('filename', 'wb') do |file|
  s3.get_object(bucket: 'bucket-name', key: 'object-key') do |chunk|
    file.write(chunk)
  end
end

Please note that when you use a block to download an object, the Ruby SDK will NOT retry failed requests after the first chunk of data has been yielded. Doing so could cause file corruption on the client end by starting over mid-stream. For this reason, I recommend using one of the preceding methods for specifying the target file path or IO object.

Retries

The Ruby SDK retries failed requests up to 3 times by default. You can override the default using :retry_limit. Setting this value to 0 disables all retries.

If the Ruby SDK encounters a network error after the download has started, it attempts to retry the request. It first checks whether the IO target responds to #truncate. If it does not, the SDK disables retries.

If you prefer to disable this default behavior, you can either use the block mode or set :retry_limit to 0 for your S3 client.
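To make the role of #truncate concrete, here is a rough sketch of a truncate-then-retry loop. It is an illustration of the idea only, not the SDK's actual implementation: download_with_retries is a hypothetical helper, IOError stands in for a network error, and a StringIO stands in for the target file.

```ruby
require 'stringio'

# Hypothetical helper, not part of the AWS SDK: runs the download block and,
# on a simulated network error, discards partial data before trying again.
def download_with_retries(target, retry_limit: 3)
  attempts = 0
  begin
    attempts += 1
    yield target
  rescue IOError
    # Without #truncate the partial data cannot be discarded, so give up.
    raise unless target.respond_to?(:truncate) && attempts <= retry_limit
    target.rewind       # #truncate does not move the write position
    target.truncate(0)  # drop whatever the failed attempt wrote
    retry
  end
  target
end

# Simulated download that fails once after writing its first chunk.
calls = 0
out = download_with_retries(StringIO.new) do |io|
  calls += 1
  io.write('chunk-1')
  raise IOError, 'connection reset' if calls == 1
  io.write('chunk-2')
end

out.string
#=> "chunk-1chunk-2"
```

Without the truncate step, the retried attempt would append to the partial data and the result would contain chunk-1 twice, which is the kind of corruption block mode avoids by not retrying at all.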

Range GETs

For very large objects, consider using the :range option to download the object in parts. Currently there are no helper methods for this in the Ruby SDK, but if you are interested in submitting something, we accept pull requests!
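As a starting point, the part boundaries can be computed up front. The sketch below is one possible approach, not an SDK feature: byte_ranges is a hypothetical helper that produces HTTP Range header values, and the commented usage assumes the object size is fetched with #head_object and has not been run against a real bucket.

```ruby
# Hypothetical helper, not part of the AWS SDK: splits an object of
# `size` bytes into Range header values covering at most `part_size` bytes each.
def byte_ranges(size, part_size)
  (0...size).step(part_size).map do |first|
    last = [first + part_size, size].min - 1  # Range headers are inclusive
    "bytes=#{first}-#{last}"
  end
end

byte_ranges(10, 4)
#=> ["bytes=0-3", "bytes=4-7", "bytes=8-9"]

# Possible use with the client from this post (needs the SDK and credentials):
#
#   size = s3.head_object(bucket: 'bucket-name', key: 'object-key').content_length
#   File.open('filename', 'wb') do |file|
#     byte_ranges(size, 5 * 1024 * 1024).each do |range|
#       resp = s3.get_object(bucket: 'bucket-name', key: 'object-key', range: range)
#       file.write(resp.body.read)
#     end
#   end
```

Each part is held in memory only until it is written, so the part size bounds memory use, and a failed part can be retried without restarting the whole download.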

Happy downloading.
