Category: Ruby


Using Resources

by Trevor Rowe | in Ruby

With the recent 2.0 stable release of the aws-sdk-core gem, we started publishing preview releases of aws-sdk-resources. Until the preview status is removed, you will need to use the --pre flag to install this gem:

gem install aws-sdk-resources --pre

In Bundler, you should give the full version:

# update the version as needed
gem 'aws-sdk-resources', '2.0.1.pre'

Usage

Each service module has a Client class that provides a 1-to-1 mapping of the service API. Each service module now also has a Resource class that provides an object-oriented interface to work with.

Each resource object wraps a service client.

s3 = Aws::S3::Resource.new
s3.client
#=> #<Aws::S3::Client>

Given a service resource object, you can start exploring related resources. Let's start with buckets in Amazon S3:

# enumerate all of my buckets
s3.buckets.map(&:name)
#=> ['aws-sdk', ...]

# get one bucket
bucket = s3.buckets.first
#=> #<Aws::S3::Bucket name="aws-sdk">

If you know the name of a bucket, you can construct a bucket resource without making an API request.

bucket = s3.bucket('aws-sdk')

# constructors are also available
bucket = Aws::S3::Bucket.new('aws-sdk')
bucket = Aws::S3::Bucket.new(name: 'aws-sdk')

In each of the three previous examples, an instance of Aws::S3::Bucket is returned. This is a lightweight reference to an actual bucket that might exist in Amazon S3. When you reference a resource, no API calls are made until you operate on the resource.

Here I will use the bucket reference to delete the bucket.

bucket.delete

You can use a resource to reference other resources. In the next example, I use the bucket object to reference an object in the bucket by its key. Again, no API calls are made until I invoke an operation such as #put or #delete.

obj = bucket.object('hello.txt')
obj.put(body:'Hello World!')
obj.delete

Resource Data

Resources have one or more identifiers, and data. To construct a resource, you only need the identifiers. A resource can load itself using its identifiers.

Constructing a resource object from its identifiers will never make an API call.

obj = s3.bucket('aws-sdk').object('key') # no API call made

# calling #data loads an object, returning a structure
obj.data.etag
#=> "ed076287532e86365e841e92bfc50d8c"

# same as obj.data.etag
obj.etag
#=> "ed076287532e86365e841e92bfc50d8c"

Resources will never update internal data until you call #reload. Use #reload if you need to poll a resource attribute for a change.

# force the resource to refresh data, returning self
obj.reload.last_modified
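
To make the load-once behavior concrete, here is a small stand-alone sketch. It is not how the SDK is implemented; `FakeClient`, `LazyResource`, and the `head` method are hypothetical names invented for this illustration, and no AWS calls are made.

```ruby
# A hypothetical stand-in for a service client; it counts the calls it receives.
class FakeClient
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Pretends to fetch metadata for a key over the network.
  def head(key)
    @calls += 1
    { key: key, etag: "etag-#{@calls}" }
  end
end

# A resource holds identifiers and loads its data only on first access.
class LazyResource
  def initialize(key, client)
    @key = key
    @client = client
    @data = nil
  end

  # Loads data on first use and keeps it until #reload is called.
  def data
    @data ||= @client.head(@key)
  end

  def etag
    data[:etag]
  end

  # Fetches the data again, replacing the cached copy, and returns self.
  def reload
    @data = @client.head(@key)
    self
  end
end

client = FakeClient.new
obj = LazyResource.new('key', client) # no call made yet
calls_before = client.calls
first = obj.etag                      # first access triggers one call
again = obj.etag                      # served from the cached data
fresh = obj.reload.etag               # explicit reload triggers a second call
```

Because #reload returns self, it can be chained the same way as in the example above.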

Resource Associations

Most resource types are associated with one or more different resources. For example, an Aws::S3::Bucket has many objects, a website configuration, an ACL, etc.

Each association is documented on the resource class. The API documentation will specify what API call is being made. If the association is plural, it will document when multiple calls are made.

When working with plural associations, such as a bucket that has many objects, resources are automatically paginated. This makes it simple to lazily enumerate all objects.

bucket = s3.bucket('aws-sdk')

# enumerate **all** objects in a bucket, objects are fetched
# in batches of 1K until every object has been yielded
bucket.objects.each do |obj|
  puts "#{obj.key} => #{obj.etag}"
end

# filter objects with a prefix
bucket.objects(prefix:'/tmp/').map(&:key)
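
The idea of fetching batches only as they are needed can be sketched without touching AWS. The `FakeListing` class below is a hypothetical in-memory stand-in, not part of the SDK; it counts how many "requests" it serves so you can see that enumeration stops early when the caller does.

```ruby
# Hypothetical in-memory listing that hands out keys one page at a time.
class FakeListing
  include Enumerable

  attr_reader :requests

  def initialize(keys, page_size)
    @keys = keys
    @page_size = page_size
    @requests = 0
  end

  # Yields every key, requesting the next page only when it is needed.
  def each
    return enum_for(:each) unless block_given?

    offset = 0
    while offset
      page = fetch_page(offset)
      page[:keys].each { |key| yield key }
      offset = page[:next_offset]
    end
  end

  private

  # Simulates one paged request; next_offset is nil on the final page.
  def fetch_page(offset)
    @requests += 1
    keys = @keys[offset, @page_size] || []
    next_offset = offset + keys.size
    { keys: keys, next_offset: (next_offset if next_offset < @keys.size) }
  end
end

listing = FakeListing.new(%w[a b c d e], 2)
first_three = listing.first(3)     # stops partway through the second page
requests_so_far = listing.requests # only two pages were requested
all_keys = listing.to_a            # a full pass requests all three pages
```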

Some APIs support operating on resources in batches. When possible, the SDK will provide batch actions.

# gets and deletes objects in batches of 1K, sweet!
bucket.objects(prefix:'/tmp/').delete

Resource Waiters

Some resources have associated waiters. These allow you to poll until the resource enters a desired state.

instance = Aws::EC2::Instance.new('i-12345678')
instance.stop
instance.wait_until_stopped
puts instance.id + ' is stopped'

What's Next?

The resource interface has a lot of unfinished features. Some of the things we are working on include:

  • Adding #exists? methods to all resource objects
  • Consistent tagging interfaces
  • Batch waiters
  • More service coverage with resource definitions

We would love to hear your feedback. Resources are available now in the preview release of the aws-sdk-resources gem and in the master branch on GitHub.

Happy coding!

Caching the Rails Asset Pipeline with Amazon CloudFront

by Alex Wood | in Ruby

Amazon CloudFront is a content delivery web service. It integrates with other Amazon Web Services to give developers and businesses an easy way to distribute content to end users with low latency, high data transfer speeds, and no minimum usage commitments.

Ruby on Rails introduced the asset pipeline in version 3.1. The Rails asset pipeline provides a framework to concatenate and minify or compress JavaScript and CSS assets. It also adds the ability to write these assets in other languages and pre-processors such as CoffeeScript, Sass and ERB.

With CloudFront’s support for custom origin servers, and features of the Rails asset pipeline, building a CDN for your static assets is simple. In this blog post, we will show you how to set this up for your environment.

Do You Have Geographically Diverse Users?

Amazon CloudFront provides 52 (as of when this was written*) edge locations around the world. Your static content can be cached by these edge locations to reduce the latency of your web application. Additionally, this can reduce the load on your app servers, as it limits the number of times your app server needs to serve large static files.

* See the current list of edge locations here.

Prerequisites

You should be able to deploy your Ruby on Rails application to the Internet, and you should know the hostname or IP address for where your application is hosted. If you have followed along with the series and deployed our sample application on AWS OpsWorks, you can complete this tutorial. If not, consider trying out a deployment first.

Creating a CloudFront Distribution

First, we will create a new CloudFront distribution that uses our app as the custom origin. From the Amazon CloudFront console, click Create Distribution. Under “Web”, click Get Started.

Within this form, call the Origin ID “Rails App Server”, and for the Origin Domain Name, we will point to the URL of our Rails application. Here is how:

  • If you have a domain name (e.g., “www.example.com”), then use that.
  • If not, you should use as stable of a hostname as possible. For example, the hostname of your ELB instance, or at least an Elastic IP. For demonstration purposes, the public host name of your app server instance will also work.

If you’re using something other than a domain name, don’t worry, you can change the origin address later if you need to. All other options can be left at their default values, though you can turn on logging if you want. We aren’t going to talk about using your own domain for the CDN just yet. Once you have your origin options set, click Create Distribution.

Configuring the Ruby on Rails App to Use CloudFront

Using CloudFront as the asset host for your static assets is truly a one-line change.

In config/environments/production.rb:

config.action_controller.asset_host = ENV['CLOUDFRONT_ENDPOINT']

This tells Rails to use your CloudFront endpoint as the hostname for static assets. Your endpoint hostname will be specified in a host environment variable.

To pick up that change if you’re following along at home, go into the OpsWorks console and edit your app:

  • Under “Application Source”, point to the cloudfront branch.
  • Add a new environment variable pair:
    • Key: CLOUDFRONT_ENDPOINT
    • Value: The URL of your CloudFront endpoint, available in the CloudFront console. For example, “lettersandnumbers.cloudfront.net”
    • You do not need to “Protect” this value.

Now, deploy your app! You do not need to run a database migration.

How It Works

While we wait for the deployment to complete, how does all of this work?

If you look at the page source of our application before adding the CloudFront CDN, you’ll see lines like this:

<link data-turbolinks-track="true" href="/assets/application-0f3bf7fe135e88baa2cb9deb7a660251.css" media="all" rel="stylesheet" />
<script data-turbolinks-track="true" src="/assets/application-2ab5007aba477451ae5c38028892fd78.js"></script>

Those lines are how the page is including your application.css and application.js files. In app/views/layouts/application.html.erb, they correspond to these lines:

<%= stylesheet_link_tag 'application', media: 'all', 'data-turbolinks-track' => true %>
<%= javascript_include_tag 'application', 'data-turbolinks-track' => true %>

In turn, these include statements source from app/assets/stylesheets/application.css.scss and app/assets/javascripts/application.js. If you run the command rake assets:precompile, these files will be compiled and a fingerprint will be added to the filename. For example, I ran rake assets:precompile and the following files were generated:

  • public/assets/application-f3fd37796ac920546df412f68b0d9820.js
  • public/assets/application-68a6279b040bd09341327b6c951d74bc.css

The fingerprinting is a big part of what makes all of this work so smoothly. Let’s take a look at the page source after our latest deployment:

<link data-turbolinks-track="true" href="http://lettersandnumbers.cloudfront.net/assets/application-bfe54945dee8eb9f51b20d52b93aa177.css" media="all" rel="stylesheet" />
<script data-turbolinks-track="true" src="http://lettersandnumbers.cloudfront.net/assets/application-4984ddfbabfbae63ef17d0c8dca28d6c.js"></script>

You can see that we are now sourcing our static assets from CloudFront, and that nothing broke in the process. You can also see the compiled assets with fingerprints added to the filenames. When we loaded the page, the stylesheet_link_tag and javascript_include_tag used our asset host as the host, adding the expected asset filenames to the end of the hostname. When CloudFront received the request, these assets did not exist in the cache, so it forwarded the request to the Rails server, which served the files to CloudFront, which cached the files and sent them to you, the requestor. Future requests would simply hit the CDN, see the file present, and serve it to you from the fastest edge node.

Because fingerprinting is included out of the box, we do not need to deal with cache invalidations. When the assets change, the fingerprint will change. When that happens, CloudFront will not have the new file, and it will make a request to the origin server to get it. Eventually, the old, unused files will expire. It just works.
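
Here is a rough sketch of why a content-based fingerprint removes the need for invalidations. It is a simplification: the asset pipeline's real digest inputs differ, and `fingerprinted_name` and `EdgeCache` are hypothetical names standing in for the pipeline and an edge location.

```ruby
require 'digest'

# Builds a file name that embeds a digest of the contents, in the same
# spirit as the compiled asset names shown above.
def fingerprinted_name(logical_name, contents)
  ext = File.extname(logical_name)
  base = File.basename(logical_name, ext)
  "#{base}-#{Digest::MD5.hexdigest(contents)}#{ext}"
end

# Hypothetical pull-through cache keyed by file name.
class EdgeCache
  attr_reader :origin_hits

  def initialize(origin)
    @origin = origin # maps file name => contents
    @store = {}
    @origin_hits = 0
  end

  # Serves from the cache when possible, otherwise asks the origin once.
  def fetch(name)
    @store.fetch(name) do
      @origin_hits += 1
      @store[name] = @origin.fetch(name)
    end
  end
end

old_css = 'body { color: black; }'
new_css = 'body { color: navy; }'
old_name = fingerprinted_name('application.css', old_css)
new_name = fingerprinted_name('application.css', new_css)

edge = EdgeCache.new({ old_name => old_css, new_name => new_css })
edge.fetch(old_name) # miss: forwarded to the origin
edge.fetch(old_name) # hit: served from the cache
edge.fetch(new_name) # changed contents mean a new name, so a fresh miss
```

The stale entry under the old name is never requested again, which is why it can simply be left to expire.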

Wrap-Up

In this post, we took a Ruby on Rails application and cached its static assets using Amazon CloudFront and the Ruby on Rails asset pipeline. We also discussed the broad strokes of how CloudFront and Rails work together to make this simple to do.

Have any questions, comments, or problems getting your application to cache static content with Amazon CloudFront? Suggestions for topics you would like to see next? Please let us know in the comments!

Deploying Ruby on Rails Applications to AWS OpsWorks

by Alex Wood | in Ruby

To begin our series on using Ruby on Rails with Amazon Web Services, we are going to start at the beginning: deploying our application. Today, we will be deploying our application to AWS OpsWorks.

Following along with this post, you should be able to deploy our "Todo Sample App" to AWS using OpsWorks, with your application and database running on different machine instances.

Getting Your Application Ready to Deploy

You can deploy the Todo sample application to OpsWorks directly from its public GitHub repo, using the ‘opsworks’ branch. If you explore the repo, you will notice that we’ve made a few design choices:

  • Our secrets file at config/secrets.yml expects the RAILS_SECRET_TOKEN environment variable to be set on our application servers.
  • We have required the mysql2 gem, to interface with a MySQL database.
  • We have required the unicorn gem, and will use unicorn as our app server.

Creating an OpsWorks Stack

Log in to the AWS Console and navigate to the AWS OpsWorks Console. Click Add Stack and fill out the form like so:

Add Stack Screen

Don’t worry about the "Advanced" settings for now – we won’t need them during this part of the tutorial. Once you’ve filled out the form, just press Create Stack and you’re done.

Creating the Rails App Server Layer

After creating a stack, you’ll find yourself at a page prompting you to create a layer, an instance, and an app. To start, click Add a layer.

We are making a few changes to the default options here. They are:

  • Using Ruby version 2.1.
  • Using "nginx and Unicorn" instead of "Apache2 and Passenger".
  • Using RubyGems version 2.2.1.

Once you’re all done, click Add Layer. You’ll be redirected to the "Layers" screen.

Creating the Database Layer

Next, we’re going to create our database layer. On the layers screen, click + Layer.

Add MySQL Layer

Choose "MySQL" from the drop down box, and leave everything else as-is. Of course, if you’re taking your own screenshots, it is best to avoid sharing your passwords of choice as well!

Click Add Layer and you’re done with this step.

MySQL Layer vs. Amazon RDS Layer

When creating your stack, you can choose to use an OpsWorks-managed EC2 instance running MySQL, called a "MySQL" layer, or you can create a layer that points to an existing Amazon RDS instance.

For this example, we are going to use a MySQL layer. You could substitute an RDS layer if you so chose. In future posts, we may explore this option in depth.

Adding Instances

We’ve made layers for our application servers and database, but we do not yet have application servers or a database. We will next create an instance of each.

Create an App Server Instance

From the "Layers" screen, click Add instance in the "Rails App Server" layer.

Add Rails Instance

We’re creating a t2.micro instance to optimize for cost (this is a demo after all). You may also want to create an SSH key and specify it here in order to be able to log in to your host for debugging purposes, but we don’t strictly need it so we are going to skip that for now.

Click Add Instance once you’re done, then start to begin the instance setup process. While that runs, we are going to make our database instance.

One quick aside about using a t2.micro instance: you can only create them in a VPC. We have created a VPC in this example, but if you create a stack without a VPC, t2.micro instances will not be available to you. Other instance types will, of course, work for this example.

Create a Database Instance

You’ll note that, if you’re following along with each step, you’re now at the "Instances" page. From either here or the layers page, under "MySQL", click Add an instance.

Add MySQL Instance

As before, we are creating a t2.micro instance. Click Add Instance to create the instance, and start to begin instance setup.

Adding the Application

While our instances are set up, let’s add our application. Click the Apps link on the sidebar, then click Add an app.

Add App

For this example, we’re using the Git repository at https://github.com/awslabs/todo-sample-app.git as our Application Source, and using the opsworks branch to ensure that you’re deploying the same code I was as this post was written. You can name the app whatever you’d like, but the "TodoApp" name will match with fields we will fill out later, so if you do change the name, make sure to use that new name going forward wherever we use "TodoApp".

Add App

To generate a value for the RAILS_SECRET_TOKEN environment variable, you can use the command rake secret within your copy of the repo. Just remember to set this as a "Protected value", and if you’re taking screenshots of your process, this is a good time to remove the placeholder value you used for the screenshot and to add a new value generated with rake secret.

Click Add App when you are done.

Deploying the Application

It is likely that your instances are done being created and set up by now, but double check that they are both online before continuing to this step. If by chance they are not quite set up, by the time you prepare a cup of tea and come back, they should be ready.

Click the Deployments link on the sidebar, then click the Deploy an App button.

Deploy App

Since we have not done so yet, remember to check "Yes" for the "Migrate database" setting. We will also need this custom JSON to ensure the "mysql2" adapter is used as intended:

{
  "deploy":
  {
    "todoapp":
    {
      "database":
      {
        "adapter": "mysql2"
      }
    }
  }
}

Click Deploy, and grab another cup of tea. You’ve now deployed the Ruby on Rails "Todo" sample app to AWS OpsWorks!

Use Custom JSON for All Deployments

You probably don’t want to be filling in the custom JSON for your adapter choice with every deployment. Fortunately, you can move this custom JSON into your stack settings to have it go with every deployment.

Click the Stack link on the sidebar, open Stack Settings, and click Edit.

Stack Settings

Add the custom JSON you used for your deployment earlier, and click Save.
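
If you named your app something other than "TodoApp", the key under "deploy" needs to change with it. As a convenience, a few lines of Ruby can generate the same structure; this is only a sketch, and the `deploy_json` helper and the assumption that the key is the app's lowercase short name (mirroring "todoapp" above) are ours.

```ruby
require 'json'

# Builds the custom JSON for a given app short name and database adapter.
# The short name is assumed to match the lowercase key used above.
def deploy_json(app_short_name, adapter)
  JSON.pretty_generate(
    { 'deploy' => { app_short_name => { 'database' => { 'adapter' => adapter } } } }
  )
end

custom_json = deploy_json('todoapp', 'mysql2')
```

Paste the resulting string into the custom JSON field in Stack Settings.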

Try It Out

To view the app in action, click on your app server’s name on the deployment screen to go to the server’s info page. Click the link next to "Public DNS", and you should see the front page of the application:

You can add tasks, mark them complete, and delete them as you like. In short, your application is running and performing database transactions.

Hello TodoApp!

Wrap-Up

In this post, we started with a Ruby on Rails application, and went step-by-step through the process to get it up and running on AWS with OpsWorks. Now, you can follow this same process to get your own Rails application running on AWS.

Now that we can deploy our application, we will begin to explore ways to make our app scale, improve availability, and optimize some common speed bottlenecks.

Have any questions, comments, or problems getting the app up and running? Suggestions for topics you would like to see next? Please reach out to us in the comments!

Blog Series: Ruby on Rails on Amazon Web Services

by Alex Wood | in Ruby

Welcome to a series on how to integrate Ruby on Rails apps with Amazon Web Services. In this series, we’re going to start from scratch with a simple app, and show you how to make it scalable, highly available, and fault tolerant.

The Sample App

For this blog series, we have built a sample app in Ruby on Rails. You can find it on GitHub as awslabs/todo-sample-app.

The app itself is designed to be simple to follow. It is a very basic todo list, where you can add tasks, mark them complete, or delete them, all on a single page. In this way, we can focus on the code changes needed to integrate the app with Amazon Web Services, without worrying about confusion over what the app itself does.

There’s no hand-waving here: all the code you need to do this is in this repo, and all the setup you need to do in AWS is in the posts.

What the Series Will Cover

We’re going to start by covering how to deploy the TodoApp to the cloud using AWS OpsWorks. Then, we will talk about speeding up your app by caching your static assets with Amazon CloudFront. We will go on to discuss other scaling and performance improvements you can make to solve real-world problems in the cloud.

Have a topic you’d love for us to cover? Let us know in the comments!

Up Next

The first post, showing you how to deploy the Todo Sample App to AWS OpsWorks, will be out soon. Stay tuned!

AWS SDK for Ruby V2 Preview Release

by Trevor Rowe | in Ruby

Version 2 of the AWS SDK for Ruby is available now as a preview release. If you use Bundler with some standard best-practices, you should be unaffected by the v2 release of the aws-sdk gem. This blog post highlights a few things you might want to be aware of.

Installing V2 Preview Release

V2 of the AWS SDK for Ruby is available now as a preview release. To install v2, use --pre:

$ gem install aws-sdk --pre

If you are using bundler, you must specify the full version until the preview status is removed.

gem 'aws-sdk', '2.0.0.pre'

Lock your Dependencies

The V2 Ruby SDK is not backwards compatible with the V1 Ruby SDK. If you have a Bundler dependency on aws-sdk and you do not specify a version, you will run into problems when the 2.0 final version is released. To ensure you are unaffected by the major version bump, specify a version dependency in your Gemfile:

gem 'aws-sdk', '< 2.0'

Alternatively, you can change your gem dependency from aws-sdk to aws-sdk-v1:

gem 'aws-sdk-v1'

The AWS SDK for Ruby follows semver. This allows users to update within the same major version with confidence that there are no backwards-incompatible changes. If there are, they will be treated as bugs.
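
If you want to check how a version constraint behaves before committing it to your Gemfile, RubyGems exposes the same matching logic. The version numbers below are made up for illustration; only the constraint strings matter.

```ruby
require 'rubygems'

# '< 2.0' keeps you on the v1 line; '~> 2.0' accepts any 2.x release.
v1_only = Gem::Requirement.new('< 2.0')
v2_line = Gem::Requirement.new('~> 2.0')

# Hypothetical release numbers, used only to exercise the constraints.
results = {
  v1_accepts_1x: v1_only.satisfied_by?(Gem::Version.new('1.50.0')), # true
  v1_accepts_20: v1_only.satisfied_by?(Gem::Version.new('2.0.0')),  # false
  v2_accepts_2x: v2_line.satisfied_by?(Gem::Version.new('2.3.1')),  # true
  v2_accepts_30: v2_line.satisfied_by?(Gem::Version.new('3.0.0'))   # false
}
```

The pessimistic operator (~>) pairs naturally with semver, because it allows minor and patch updates while excluding the next major version.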

Use Both Versions in One Application

The V1 and V2 Ruby SDKs use different namespaces. You may only load one version of a single gem. We publish the v1 Ruby SDK as a separate gem now to allow users to load both versions. Additionally, the v2 SDK uses a different root namespace to avoid conflicts.

# in your Gemfile
gem 'aws-sdk-v1'
gem 'aws-sdk', '2.0.0.pre'

And then in your application:

require 'aws-sdk-v1'
require 'aws-sdk'

# v1 uses the AWS module, v2 uses the Aws module
s3_v1 = AWS::S3::Client.new
s3_v2 = Aws::S3::Client.new

Links of Interest

Happy coding, and as always, feedback is welcomed!

Version 2 Resource Interfaces

by Trevor Rowe | in Ruby

Version 1 of the AWS SDK for Ruby provides a 1-to-1 client class for each AWS service. For many services it also provides a resource-oriented interface. These resource objects use the client to provide a more natural object-oriented experience when working with AWS APIs.

We are busy working on resource interfaces for the v2 Ruby SDK.

Resource Interfaces

The following examples use version 1 of the aws-sdk gem. This first example uses the 1-to-1 client to terminate running instances:

ec2 = AWS::EC2::Client.new
resp = ec2.describe_instances
resp[:reservations].each do |reservation|
  reservation[:instances].each do |instance|
    if instance[:state][:name] == 'running'
      ec2.terminate_instances(instance_ids:[instance[:instance_id]])
    end
  end
end

This example uses the resource abstraction to start instances in the stopped state:

ec2 = AWS::EC2.new
ec2.instances.each do |instance|
  instance.start if instance.status == :stopped
end

Resources for Version 2

We have a lot of lessons learned from our v1 resource interfaces. We are busy working on the v2 abstraction. Here are some of the major changes from v1 to v2.

Memoization Interfaces Removed

The version 1 resource abstraction was very chatty by default. It did not memoize any resource attributes and a user could unknowingly trigger a large number of API requests. As a workaround, users could use memoization blocks around sections of their code.

In version 2, all resource objects will hold onto their data/state until you explicitly call a method to reload the resource. We are working hard to make it very obvious when calling a method on a resource object will generate an API request over the network.

Less Hand-Written Code and More API Coverage

The version 1 SDK has hand-coded resource and collection classes. In version 2, our goal is to extend the service API descriptions that power our clients with resource definitions. These definitions will be consumed to generate our resource classes.

Using resource definitions helps eliminate a significant amount of hand-written code, ensures interfaces are consistent, and makes it easier for users to contribute resource abstractions.

We also plan to provide extension points to resources to allow for custom logic and more powerful helpers.

Resource Waiters

It is a common pattern to operate on a resource and then wait for the change to take effect. Waiting typically requires making an API request, asserting some value has changed and optionally waiting and trying again. Waiting for a resource to enter a certain state can be tricky. You need to deal with terminal cases, failures, transient errors, etc.

Our goal is to provide waiter definitions and attach them to our resource interfaces. For example:

# update the provisioned throughput of a table in Amazon DynamoDB
table = dynamodb.table('my-table')
table.update(provisioned_throughput: {
  read_capacity_units: 1000
})

# wait for the table to be ready
table.wait_for(:status, 'ACTIVE')
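
To show what a waiter has to handle, here is a bare-bones polling loop in plain Ruby. It is a sketch of the general pattern rather than the waiter design we will ship; `wait_until`, `WaiterFailed`, and the option names are hypothetical, and the block stands in for the API request that reports the current state.

```ruby
# Raised when the desired state can no longer be reached or attempts run out.
class WaiterFailed < StandardError; end

# Polls the block until it returns the desired state.
# Returns the number of attempts that were needed.
def wait_until(desired:, failure_states: [], max_attempts: 10, delay: 0)
  max_attempts.times do |attempt|
    state = yield
    return attempt + 1 if state == desired
    # Terminal case: stop immediately instead of polling until the limit.
    raise WaiterFailed, "entered terminal state #{state}" if failure_states.include?(state)
    sleep(delay) if attempt + 1 < max_attempts
  end
  raise WaiterFailed, "gave up after #{max_attempts} attempts"
end

# Simulated sequence of states a table might report while it is updating.
states = %w[UPDATING UPDATING ACTIVE]
attempts = wait_until(desired: 'ACTIVE', failure_states: ['DELETING']) { states.shift }
```

Transient errors are not handled here; a fuller version would rescue them inside the loop and count them against the attempt limit.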

In a follow-up blog post, I will be introducing the resources branch of the SDK that is available today on GitHub. Please take a look; feedback is always welcome!

Response Paging

by Trevor Rowe | in Ruby

We’ve been busy working on version 2 of the AWS SDK for Ruby. One of the features we added recently was response paging.

Paging in the Version 1 Ruby SDK

Version 1 of the Ruby SDK provides collection classes for many AWS resources. These collections are enumerable objects that yield resource objects.

iam = AWS::IAM.new
user_collection = iam.users
user_collection.each do |user|
  puts user.name
end

A collection in version 1 sends a request to enumerate resources. If the response indicates that more data is available, then the collection will continue sending requests to enumerate all resources.

If you want to enumerate a resource that is not modeled in the version 1 SDK, then you need to drop down to the client abstraction and deal with paging on your own.

iam = AWS::IAM.new
options = { max_items: 2 }
begin
  response = iam.client.list_users(options)
  response[:users].each do |user|
    puts user[:user_name]
  end
  options[:marker] = response[:marker]
end while options[:marker]

Response Paging in Version 2 Ruby SDK

One of our main goals of the version 2 Ruby SDK is to improve the experience of users accessing AWS from the client abstractions. Version 2 does not provide resource abstractions yet, but it does provide full response paging from the client interface.

Here is the example above re-written using the version 2 Ruby SDK:

iam = Aws::IAM::Client.new
iam.list_users.each do |response|
  puts response.users.map(&:user_name)
end

Each AWS operation now returns a pageable response object. This object is enumerable. Calling #each on an Aws::PageableResponse object yields the response and any follow-up responses.

There are a few other helper methods that make it easy to control response paging:

resp.last_page? #=> false
resp.next_page? #=> true

# get the next page, raises an error if this is the last page
resp = resp.next_page

# gets each response in a loop
resp = resp.next_page until resp.last_page?
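
If it helps to see how these helpers fit together, here is a toy page object built on an array of pre-baked pages. It is only a sketch of the pattern; the `Page` class is hypothetical and is not the SDK's pageable response.

```ruby
# Hypothetical page object: holds one page of items and can produce the next.
class Page
  include Enumerable

  attr_reader :items

  def initialize(pages, index = 0)
    @pages = pages
    @index = index
    @items = pages[index]
  end

  def last_page?
    @index == @pages.size - 1
  end

  def next_page?
    !last_page?
  end

  # Returns the following page, raising if this is the last one.
  def next_page
    raise 'this is the last page' if last_page?

    Page.new(@pages, @index + 1)
  end

  # Yields this page and then every following page.
  def each
    return enum_for(:each) unless block_given?

    page = self
    loop do
      yield page
      break if page.last_page?

      page = page.next_page
    end
  end
end

first_page = Page.new([%w[alice bob], %w[carol dave], %w[erin]])
names = first_page.flat_map(&:items) # walks all three pages
```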

Resource Enumeration in the Version 2 Ruby SDK

You will notice the response paging examples don’t address enumerating individual resource objects. We are busy implementing a resource abstraction for the version 2 Ruby SDK. The v2 resources will be enumerable in a method similar to v1. It will however be built on top of client response paging.

Watch the GitHub repository and this blog for more information on resource abstractions in the version 2 Ruby SDK.

AWS SDK Core v2.0.0.rc12 Updates

by Trevor Rowe | in Ruby

We recently published v2.0.0.rc12 of the aws-sdk-core gem (https://github.com/aws/aws-sdk-core-ruby). This release merges the long-running normalized branch onto master.

Upgrading Notes

Please note, when updating to rc12, you may need to make some minor code changes. These are summarized below:

  • Service modules now have a Client class, which should be used to construct API clients:

    # deprecated, will be removed for 2.0.0 final
    s3 = Aws::S3.new
    
    # preferred
    s3 = Aws::S3::Client.new
    

    Looking forward to the resources update, this will ensure we have a suitable namespace for the new resource classes. Look for more information in a follow up blog post.

  • The Amazon SimpleDB client class has been renamed from Aws::SDB to Aws::SimpleDB. This also affects the short name used in configuration:

    # old configuration key
    Aws.config[:sdb] = { ... }
    
    # new key
    Aws.config[:simpledb] = { ... }
    
  • The :raw_json configuration option has been renamed to :simple_json. This is used for services that use the JSON protocol.

Less Visible Changes

If you have written plugins for the aws-sdk-core gem, there are a few other changes to the internals you need to be aware of.

  • Seahorse::Model has received significant updates, especially the API model format. This new format is much more flexible than the denormalized format used previously. Additionally, the AWS API models are now consumed as-is without translation. See the API reference for more information.

  • Seahorse::Client::Http::Request#endpoint is now a URI::HTTPS or URI::HTTP object. The custom Endpoint class has been removed in favor of these objects provided by the Ruby standard library.

  • Seahorse::Client::HandlerList#add no longer accepts instance objects and requires a handler class that can be constructed.

Downloading Objects from Amazon S3 using the AWS SDK for Ruby

by Trevor Rowe | in Ruby

The AWS SDK for Ruby provides a few methods for getting objects out of Amazon S3. This blog post focuses on using the v2 Ruby SDK (the aws-sdk-core gem) to download objects from Amazon S3.

Downloading Objects into Memory

For small objects, it can be useful to get an object and have it available in your Ruby process. If you do not specify a :target for the download, the entire object is loaded into memory in a StringIO object.

s3 = Aws::S3::Client.new
resp = s3.get_object(bucket:'bucket-name', key:'object-key')

resp.body
#=> #<StringIO ...> 

resp.body.read
#=> '...'

Call #read or #string on the StringIO to get the body as a String object.

Downloading to a File or IO Object

When downloading large objects from Amazon S3, you typically want to stream the object directly to a file on disk. This avoids loading the entire object into memory. You can specify the :target for any AWS operation as an IO object.

File.open('filename', 'wb') do |file|
  resp = s3.get_object({ bucket:'bucket-name', key:'object-key' }, target: file)
end

The #get_object method still returns a response object, but the #body member of the response will be the file object given as the :target instead of a StringIO object.

You can specify the target as a String or Pathname, and the Ruby SDK will create the file for you.

resp = s3.get_object({ bucket:'bucket-name', key:'object-key' }, target: '/path/to/file')

Using Blocks

You can also use a block for downloading objects. When you pass a block to #get_object, chunks of data are yielded as they are read off the socket.

File.open('filename', 'wb') do |file|
  s3.get_object(bucket: 'bucket-name', key:'object-key') do |chunk|
    file.write(chunk)
  end
end

Please note, when using blocks to download objects, the Ruby SDK will NOT retry failed requests after the first chunk of data has been yielded. Doing so could cause file corruption on the client end by starting over mid-stream. For this reason, I recommend using one of the preceding methods for specifying the target file path or IO object.

Retries

The Ruby SDK retries failed requests up to 3 times by default. You can override the default using :retry_limit. Setting this value to 0 disables all retries.

If the Ruby SDK encounters a network error after the download has started, it attempts to retry the request. It first checks to see if the IO target responds to #truncate. If it does not, the SDK disables retries.

If you prefer to disable this default behavior, you can either use the block mode or set :retry_limit to 0 for your S3 client.
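
The reason #truncate matters is easiest to see in a small simulation. The sketch below is not the SDK's retry handler; `FlakySource` and `download` are hypothetical, and the source deliberately fails partway through its first attempt so the partially written data has to be discarded.

```ruby
require 'stringio'

# Hypothetical source that drops the connection during its first attempt.
class FlakySource
  def initialize(chunks)
    @chunks = chunks
    @attempt = 0
  end

  def each_chunk
    @attempt += 1
    @chunks.each_with_index do |chunk, i|
      raise IOError, 'connection reset' if @attempt == 1 && i == 1

      yield chunk
    end
  end
end

# Writes chunks to target. After a failure, the target is emptied before the
# next attempt; if it cannot be truncated, the error is raised instead.
def download(source, target, retry_limit: 3)
  attempts = 0
  begin
    attempts += 1
    source.each_chunk { |chunk| target.write(chunk) }
  rescue IOError
    raise unless target.respond_to?(:truncate) && attempts <= retry_limit

    target.truncate(0) # drop the partial data from the failed attempt
    target.rewind      # start writing from the beginning again
    retry
  end
  target
end

target = StringIO.new
download(FlakySource.new(%w[Hello World]), target)
body = target.string # the partial "Hello" from the first attempt is gone
```

Without the truncate and rewind steps, the retried data would be appended after the partial write, which is exactly the corruption the SDK avoids.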

Range GETs

For very large objects, consider using the :range option and downloading the object in parts. Currently there are no helper methods for this in the Ruby SDK, but if you are interested in submitting something, we accept pull requests!
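
As a starting point, here is one way to compute the ranges for a multi-part download. This is only a sketch: `byte_ranges` is a hypothetical helper, it assumes you already know the object's size, and sending the requests and reassembling the parts is left out.

```ruby
# Splits an object of total_size bytes into HTTP Range header values of at
# most part_size bytes each. Byte offsets in a range are inclusive.
def byte_ranges(total_size, part_size)
  raise ArgumentError, 'part_size must be positive' unless part_size.positive?

  (0...total_size).step(part_size).map do |first|
    last = [first + part_size, total_size].min - 1
    "bytes=#{first}-#{last}"
  end
end

ranges = byte_ranges(25, 10)
# => ["bytes=0-9", "bytes=10-19", "bytes=20-24"]
```

Each string could then be passed as the :range option of a separate request.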

Happy downloading.

Ruby 2.1 on AWS OpsWorks

by Trevor Rowe | in Ruby

We are pleased to announce that AWS OpsWorks now supports Ruby 2.1. Simply select the Ruby version you want, your Rails stack – Passenger or Unicorn, the RubyGems version, and whether you want to use Bundler. Then deploy your app from your chosen repository – Git, Subversion, or bundles on S3. You can get started with a few clicks in the AWS Management Console.