Threading with the AWS SDK for Ruby
When using threads in an application, it’s important to keep thread-safety in mind. This statement is not specific to the Ruby world; it’s a reality in any language that supports threading. What is specific to Ruby is the fact that many libraries in our language are loaded at run-time, and often, loading code at run-time is not a thread-safe operation.
Autoload and Thread-Safety
Many libraries and frameworks (including Ruby on Rails) use a feature of Ruby known as autoload, which allows components of a library to be loaded lazily, only when the corresponding constant is first resolved in the code path of an executing program. The problem with this feature is that, historically, the implementation has not been thread-safe. In other words, if two threads tried to resolve an autoloaded constant at the same time, the results were unpredictable; for example, one thread could see a constant that the other thread had only partially loaded. This problem was finally tackled in Ruby 1.9.1 but then regressed in 1.9.2 and was resolved again in 1.9.3 (but only in a later patchlevel), causing a bit of confusion around whether autoload is actually safe to use in a threaded Ruby program.
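As a minimal sketch of the lazy loading described above, the following Ruby registers an autoloaded constant and shows that the file is required only when the constant is first referenced. The LazyGreeter class and the temporary file are hypothetical, made up purely for illustration.

```ruby
require 'tmpdir'

# Write a tiny library file into a temporary directory so the example is
# self-contained. LazyGreeter is a hypothetical class used for illustration.
dir  = Dir.mktmpdir
path = File.join(dir, 'lazy_greeter.rb')
File.write(path, <<~RUBY)
  class LazyGreeter
    def self.greet
      "hello"
    end
  end
RUBY

# Register the constant. Nothing is loaded yet; Ruby only remembers the path.
Object.autoload(:LazyGreeter, path)
pending_before = Object.autoload?(:LazyGreeter) # the path, while the load is pending

# The first reference resolves the constant, which requires the file.
greeting = LazyGreeter.greet

pending_after = Object.autoload?(:LazyGreeter)  # nil once the file has been loaded
```

The window between registration and first reference is where the thread-safety problem lives: on an affected Ruby, two threads that reach the first reference at the same moment can both try to resolve the constant.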
Thread-Safe in 2.0
For all intents and purposes, autoloading of modules should be considered thread-safe in Ruby 2.0.0p0, as the patch was officially merged into the 2.0 branch prior to release. Any thread-safety issues in Ruby 2.0 should be considered regressions, according to that ticket.
Enter Eager Loading
Of course, thread-safe autoload in Ruby 2.0 does not help the many programs still running on 1.9.x and, in some cases, 1.8.x, so you may need a more backward-compatible strategy. In Ruby on Rails, this was solved with an eager_autoload method that forcibly loads all modules marked to be lazily loaded. If you are running threaded code, it is recommended that you call this before launching any threads. Note that in Rails 4.0, the framework will eager load all modules by default, which should help you avoid having to think about these threading issues.
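The general idea behind eager loading can be sketched in a few lines of Ruby: remember every constant registered for autoload, then resolve them all on one thread before any other thread starts. This is not the Rails implementation; the LazyLib module and its register and eager_autoload! methods are hypothetical names chosen for this sketch.

```ruby
require 'tmpdir'

# Hypothetical library namespace; all names here are illustrative only.
module LazyLib
  @autoloads = {}

  class << self
    # Register a lazily loaded constant and remember it for later.
    def register(const_name, path)
      @autoloads[const_name] = path
      autoload(const_name, path)
    end

    # Resolve every registered constant now, on the calling thread.
    def eager_autoload!
      @autoloads.each_key { |const_name| const_get(const_name) }
      nil
    end
  end
end

# Create two small component files and register them for lazy loading.
dir = Dir.mktmpdir
%w[Alpha Beta].each do |name|
  path = File.join(dir, "#{name.downcase}.rb")
  File.write(path, "module LazyLib; class #{name}; end; end\n")
  LazyLib.register(name.to_sym, path)
end

# Load everything up front, then start threads that use the constants.
LazyLib.eager_autoload!

threads = 4.times.map do
  Thread.new { [LazyLib::Alpha.new, LazyLib::Beta.new].size }
end
results = threads.map(&:value)
```

Because every constant is already resolved when the threads start, none of them ever triggers an autoload, so the thread-safety of autoload itself no longer matters.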
Eager Autoloading in AWS SDK for Ruby
So is this an issue for the AWS SDK for Ruby? In short, if you are using a version prior to Ruby 2.0, the answer is "most likely". The SDK is large enough that lazily loading extra modules is important to keep library load time as fast as possible. The downside of this approach is that it can cause issues in multi-threaded programs.
To solve the problem in the SDK, we took an approach similar to Ruby on Rails and created an AWS.eager_autoload! method that requires all modules in the library up front. To use this method, simply call it before you launch any threads:
    require 'aws-sdk'

    AWS.eager_autoload!

    # Now you can start threading
    Thread.new do
      ...
    end
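If you would rather pay the eager-loading cost only on Ruby versions older than 2.0, one option is a small version guard. The sketch below is an assumption about how such a guard might look, not part of the SDK: eager_load_needed? is a hypothetical helper, and the call into the SDK is skipped unless the SDK has actually been required.

```ruby
require 'rubygems' # provides Gem::Version on older Rubies

# Hypothetical helper: true when the given Ruby version predates 2.0, the
# first release this post treats as having thread-safe autoload.
def eager_load_needed?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) < Gem::Version.new('2.0.0')
end

# Only call into the SDK when it is loaded and exposes the method.
if eager_load_needed? && defined?(AWS) && AWS.respond_to?(:eager_autoload!)
  AWS.eager_autoload!
end
```

Comparing Gem::Version objects rather than raw strings avoids string-ordering surprises, such as "1.10.0" sorting before "1.9.3".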
Focused Eager Loading
Sometimes, loading all of the SDK is unnecessary and slow. Fortunately, as of version 1.9.0 of the Ruby SDK, the AWS.eager_autoload! method optionally accepts the name of a module to load, instead of requiring you to eager load the entire SDK. This means that if you are only using a specific service, or a set of services, like Amazon S3 and Amazon DynamoDB, you can choose to eager load only these modules. This can improve the load time of your application, especially if you do not need many of the other modules packaged in the SDK. To load a focused set of modules, simply call the eager autoload method with the names of the modules you want to load, along with AWS::Core, which should be loaded first:
    AWS.eager_autoload! AWS::Core     # Make sure to load Core first.
    AWS.eager_autoload! AWS::S3       # Load the S3 class
    AWS.eager_autoload! AWS::DynamoDB # Load the DynamoDB class

    # Now you can start threading
    Thread.new do
      ...
    end
Wrapping Up This Thread
The AWS SDK for Ruby has an AWS.eager_autoload! method that allows you to forcibly load all components in the library up front. If you are writing multi-threaded code in Ruby, you will most likely want to call this method before launching any threads that make use of the SDK, in order to avoid any thread-safety issues with autoload in older versions of Ruby. Fortunately, it takes only a single method call at the top of your application. If load-time performance is important to your library or application, you can also eager load specific modules by passing the module name to the method.