AWS Developer Tools Blog
Introducing support for Amazon S3 Select in the AWS SDK for Ruby
We’re excited to announce support for the Amazon Simple Storage Service (Amazon S3) #select_object_content API with event streams in the AWS SDK for Ruby. Amazon S3 Select enables you to retrieve only a subset of data from an object by using simple SQL expressions.
Amazon S3 streams the response as a series of events instead of returning the full response all at once, so you can start processing response messages as they arrive. To support this behavior, the AWS SDK for Ruby now lets you process events asynchronously, instead of waiting for the full response to load before you can process it.
SDK version requirement
To use event streams and the Amazon S3 #select_object_content API, you need version 3 of the AWS SDK for Ruby. You also need version 1.13.0 or later of the aws-sdk-s3 gem.
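If you manage dependencies with Bundler, a Gemfile entry like the following is one way to express that requirement. This is a sketch; pin versions to whatever your project actually needs.

```ruby
# Gemfile
source 'https://rubygems.org'

# S3 Select event stream support needs aws-sdk-s3 1.13.0 or later,
# which is part of version 3 of the AWS SDK for Ruby.
gem 'aws-sdk-s3', '>= 1.13.0'
```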
For more information about the AWS SDK for Ruby and its guides, check out our GitHub README.
Amazon S3 Select usage pattern
Let’s try an SQL query against a CSV file in Amazon S3. Suppose I have a CSV document named target_file.csv, stored in an S3 bucket named my-bucket in the AWS Region us-west-2, with contents describing users and their ages:
Assuming this is a large file and you want to select only the rows for users older than 12, you could use an SQL expression like the following:
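The original expression is not reproduced here. As a sketch, assuming the CSV has a header row with columns named user_name and age (both column names are assumptions), the query could be held in a Ruby string:

```ruby
# Hypothetical column names: user_name and age, taken from the CSV header row.
# CSV values arrive as strings, so age is cast to an integer before comparing.
SQL_EXPRESSION = 'SELECT s.user_name FROM S3Object s WHERE CAST(s.age AS INT) > 12'
```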
Following the request syntax in the SDK for Ruby API documentation for #select_object_content, we can put together the input parameters for the operation, like this:
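A minimal sketch of such a params hash, assuming the hypothetical user_name and age columns from the query above. file_header_info: 'USE' tells S3 Select to treat the first line as a header so columns can be referenced by name, and an empty csv hash for output_serialization returns CSV with default settings.

```ruby
# Hypothetical query; user_name and age are assumed column names.
sql_expression = 'SELECT s.user_name FROM S3Object s WHERE CAST(s.age AS INT) > 12'

params = {
  bucket: 'my-bucket',
  key: 'target_file.csv',
  expression_type: 'SQL',
  expression: sql_expression,
  input_serialization: {
    # USE: the first line is a header, so columns can be referenced by name
    csv: { file_header_info: 'USE' }
  },
  output_serialization: {
    # Return the selected rows as CSV with default settings
    csv: {}
  }
}
```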
Now we have everything ready to make the API call. To process events as they arrive, you can attach a block to the S3 Select call, or provide a handler that has callbacks registered for events.
Using a Ruby block statement
The following example shows how to use a block to process all events.
Pass in :event_stream_handler
You can pass a handler with registered callbacks through the :event_stream_handler option. The handler can be either an EventStream object or a Ruby Proc object.
Using an EventStream object
Let’s try using the :event_stream_handler option with an Aws::S3::EventStreams::SelectObjectContentEventStream object.
Using a Proc object
Using a Proc object is also supported with the same pattern.
Using a hybrid pattern
You can also try a hybrid of the previous two usage patterns, as follows.
Notice that in the previous example, the on_error_event callback captures all error events that occur after the stream connection is established. If an error occurs when the request starts, but before the stream response begins, you can still rescue it as an Aws::S3::Errors::ServiceError.
When using a hybrid pattern, also note that callbacks registered in a block attached to the API call are added to the :event_stream_handler that was passed in. Thus, if you reuse the handler object, it retains all of the callbacks registered on it.
Wait for a full response
Of course, you can still wait for the full response and then iterate over all available events through the Enumerator returned by the response's payload method. (The full response is also available when you use the streaming usage patterns above.)
Response stubbing support
In addition to the S3 Select API itself, the AWS SDK for Ruby supports stubbed event stream responses for any RSpec tests you might want to write.
Let’s say you want to mock an event stream response with events (including errors). You just need to provide an Enumerator of mock events, as follows.
Then you use :stub_responses, as you would for other APIs.
Final thoughts
With Amazon S3 Select, you can use SQL statements to filter the contents of Amazon S3 objects and retrieve just the subset of data that you need. The AWS SDK for Ruby lets you process the selected record events as they arrive, using several usage patterns. You can also stub S3 Select responses to write tests for your code.
Feedback
Please share your questions, comments, and issues with us on GitHub. You can also catch us in our Gitter channel.