AWS News Blog

How Collections Work in the AWS SDK for Ruby

Today we have a guest blog post from Matty Noble, Software Development Engineer, SDKs and Tools Team. 

– rodica


We’ve seen a few questions lately about how to work with collections of resources in the SDK for Ruby, so I’d like to take a moment to explain some of the common patterns and how to use them. There are many different kinds of collections in the SDK. To keep things simple, I’ll focus on Amazon EC2, but most of what you’ll see here applies to other service interfaces as well.

Before we do anything else, let’s start up an IRB session and configure a service interface to talk to EC2:

$ irb -r rubygems -r aws-sdk
> ec2 = AWS::EC2.new(:access_key_id => "KEY", :secret_access_key => "SECRET")

There are quite a few collections available to us in EC2, but one of the first things we need to do in any EC2 application is to find a machine image (AMI) that we can use to start instances. We can manage the images available to us using the images collection:

> ec2.images
=> <AWS::EC2::ImageCollection>

When you call this method, you’ll notice that it returns very quickly; the SDK for Ruby lazy-loads all of its collections, so just getting the collection doesn’t do any work. This is good, because often you don’t want to fetch the entire collection. For example, if you know the ID of the AMI you want, you can reference it directly like this:

> image = ec2.images["ami-310bcb58"]
=> <AWS::EC2::Image id:ami-310bcb58>

Again, this returns very quickly. We’ve told the SDK that we want ami-310bcb58, but we haven’t said anything about what we want to do with it. Let’s get the description:

> image.description
=> "Amazon Linux AMI i386 EBS"

This takes a little longer, and if you have logging enabled you’ll see a message like this:

[AWS EC2 200 0.411906] describe_images(:image_ids=>["ami-310bcb58"])  
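To make the lazy-loading behavior concrete, here is a minimal plain-Ruby model of it. It assumes a hypothetical FakeEC2Client that only counts calls, plus hypothetical LazyImage and LazyImageCollection classes; none of these are SDK classes, and this is not how the SDK is implemented. It simply mirrors what the session above shows: creating the collection and calling [] make no request, while reading description makes one every time.

```ruby
# Hypothetical stand-in for the EC2 API; it only counts calls.
class FakeEC2Client
  attr_reader :requests

  def initialize(images)
    @images = images # { "ami-..." => "description" }
    @requests = 0
  end

  # Returns [{ id:, description: }, ...] for the requested IDs.
  def describe_images(image_ids:)
    @requests += 1
    image_ids.map { |id| { id: id, description: @images.fetch(id) } }
  end
end

# A resource reference: it remembers only its ID until an attribute is read.
class LazyImage
  attr_reader :id

  def initialize(client, id)
    @client = client
    @id = id
  end

  # Every call makes a fresh request; nothing is cached.
  def description
    @client.describe_images(image_ids: [@id]).first[:description]
  end
end

# Creating the collection or indexing into it never calls the client.
class LazyImageCollection
  def initialize(client)
    @client = client
  end

  def [](id)
    LazyImage.new(@client, id)
  end
end

client = FakeEC2Client.new({ "ami-310bcb58" => "Amazon Linux AMI i386 EBS" })
images = LazyImageCollection.new(client)
image = images["ami-310bcb58"]
puts client.requests   # prints 0
puts image.description # prints Amazon Linux AMI i386 EBS
image.description
puts client.requests   # prints 2
```

Keeping only the ID in the reference is what makes [] cheap; the cost moves to the first attribute read.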

Now that we’ve said we want the description of this AMI, the SDK asks EC2 for just the information we need. The SDK doesn’t cache this information, so if we do the same thing again, it makes another request. This might not seem very useful at first, but because nothing is cached, polling for state changes is easy. For example, to wait until an instance is no longer pending, we can do this:

> sleep 1 until ec2.instances["i-123"].status != :pending  
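The one-liner above polls forever if the instance never leaves :pending. A minimal sketch of a bounded version follows, using a hypothetical wait_until helper (not part of the SDK) that re-evaluates its block every interval and raises once a timeout passes. With the SDK you would pass a block such as { ec2.instances["i-123"].status != :pending }, and since nothing is cached, each check would go to EC2; here a plain array stands in for those responses.

```ruby
# Hypothetical helper: calls the block every +interval+ seconds until it
# returns a truthy value, raising if +timeout+ seconds pass first.
def wait_until(timeout: 300, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
  true
end

# Stand-in for repeated status checks: :pending twice, then :running.
statuses = [:pending, :pending, :running]
wait_until(timeout: 5, interval: 0.01) { statuses.shift != :pending }
puts statuses.empty? # prints true
```

A monotonic clock is used for the deadline so that changes to the system time do not shorten or stretch the wait.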

The [] method is useful for getting information about one resource, but what if we want information about multiple resources? Again, let’s look at EC2 images as an example. Let’s start by counting the images available to us:

> ec2.images.to_a.size
[AWS EC2 200 29.406704] describe_images()
=> 7677
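Counting through to_a works because the collection is enumerable: defining each and including Enumerable is enough for Ruby to supply to_a, map, inject, count, and the rest. The sketch below models that in plain Ruby. It assumes a hypothetical paged list call (FakeListClient#list_images, not an SDK method) to show how each can hide paging from the caller; the EC2 call in the log above returned everything in a single response. Each enumeration walks the pages again, so nothing is kept between calls.

```ruby
# Hypothetical list call that returns results one page at a time.
class FakeListClient
  attr_reader :requests

  def initialize(ids, page_size)
    @ids = ids
    @page_size = page_size
    @requests = 0
  end

  # Returns [ids_on_this_page, next_token], where next_token is nil on the last page.
  def list_images(token)
    @requests += 1
    page = @ids[token, @page_size] || []
    next_token = token + @page_size
    [page, next_token < @ids.size ? next_token : nil]
  end
end

class PagedImageCollection
  include Enumerable # supplies to_a, map, inject, count, ... given each

  def initialize(client)
    @client = client
  end

  # Walks every page on each enumeration; results are not kept afterwards.
  def each
    return enum_for(:each) unless block_given?

    token = 0
    loop do
      page, token = @client.list_images(token)
      page.each { |id| yield id }
      break if token.nil?
    end
    self
  end
end

client = FakeListClient.new((1..5).map { |i| "ami-#{i}" }, 2)
images = PagedImageCollection.new(client)
puts images.to_a.size # prints 5
puts client.requests  # prints 3: one request per page
```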

The to_a method gives us an array containing all of the images. Now, let’s try to get some information about these images. All collections include Enumerable, so we can use standard methods like map or inject. Let’s try to get all the image descriptions using map:

> ec2.images.map(&:description)  

This takes a very long time. Why? As we saw earlier, the SDK doesn’t cache anything by default, so it makes one request to get the list of all images and then one request per returned image (in sequence) to get its description. That’s a lot of round trips, and most of them are wasted effort: EC2 already includes the descriptions in the response to the first call (the one that lists all the images). Without a cache, the SDK discards that extra data, so it has to be re-fetched image by image. We can get the descriptions much more efficiently like this:

> AWS.memoize { ec2.images.map(&:description) }  

AWS.memoize tells the SDK to hold on to all the information it gets from the service for the duration of the block. So when it gets the list of images along with their descriptions (and other attributes), it puts all that data into a thread-local cache. When we then call Image#description on each image, the SDK knows the data might already be cached (because we’re inside the memoize block), so it checks the cache before fetching anything from the service.
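Here is a plain-Ruby model of what the memoize block changes. It assumes hypothetical FakeImageService, Memo, SketchImage, and SketchImageCollection classes; it is not the SDK’s implementation of AWS.memoize, only a sketch of the behavior described above. The list response is stashed in a cache stored on Thread.current while the block runs, and each description lookup checks that cache before making a request, which turns one-plus-N requests into one.

```ruby
# Hypothetical stand-in for EC2: a single call can list every image
# together with its description. It only counts requests.
class FakeImageService
  attr_reader :requests

  def initialize(data)
    @data = data # { "ami-..." => "description" }
    @requests = 0
  end

  # Returns { id => description } for the given IDs, or for all images.
  def describe_images(ids = nil)
    @requests += 1
    (ids || @data.keys).map { |id| [id, @data.fetch(id)] }.to_h
  end
end

# Hypothetical memoize helper: keeps a cache in Thread.current only while
# the block runs, so other threads never see it.
module Memo
  def self.memoize
    outer = Thread.current[:memo_cache]
    Thread.current[:memo_cache] ||= {} # reuse the cache if already nested
    yield
  ensure
    Thread.current[:memo_cache] = outer # restore the previous state
  end

  def self.cache
    Thread.current[:memo_cache]
  end
end

class SketchImage
  def initialize(service, id)
    @service = service
    @id = id
  end

  # Checks the cache first (if memoizing), otherwise asks the service.
  def description
    cache = Memo.cache
    return cache[@id] if cache&.key?(@id)

    desc = @service.describe_images([@id])[@id]
    cache[@id] = desc if cache
    desc
  end
end

class SketchImageCollection
  include Enumerable

  def initialize(service)
    @service = service
  end

  # One list call; its extra data is kept only inside Memo.memoize.
  def each
    return enum_for(:each) unless block_given?

    cache = Memo.cache
    @service.describe_images.each do |id, desc|
      cache[id] = desc if cache
      yield SketchImage.new(@service, id)
    end
  end
end

service = FakeImageService.new({ "ami-1" => "one", "ami-2" => "two", "ami-3" => "three" })
images = SketchImageCollection.new(service)

images.map(&:description)
puts service.requests          # prints 4: one list call plus one per image

before = service.requests
Memo.memoize { images.map(&:description) }
puts service.requests - before # prints 1: the list call alone
```

Scoping the cache to a block keeps the default behavior uncached, so polling loops like the one shown earlier still see fresh data.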

We’ve just scratched the surface of what you can do with collections in the AWS SDK for Ruby. In addition to the basic patterns above, many of our APIs allow for more sophisticated filtering and pagination options. For more information about these APIs, take a look at the extensive API reference documentation for the SDK. Also, don’t hesitate to ask questions or leave feedback in our Ruby development forum.

A note about AWS.memoize

AWS.memoize currently works with EC2, IAM, and ELB; we’d like to extend it to other services, and we’d also like to hear what you think about it. Is the behavior easy to understand? Does it work well in practice? Where would this feature be most beneficial to your application?