Processing Pipeline with Amazon S3, SQS, EC2, Ruby, Rails and ActiveMessaging

Community Contributed Software

  • Amazon Web Services provides links to these packages as a convenience for our customers, but software not authored by an "@AWS" account has not been reviewed or screened by AWS.
  • Please review this software to ensure it meets your needs before using it.

The purpose of this sample application is to help Ruby and Rails developers integrate Amazon SQS, using the ActiveMessaging plugin for Rails, into applications that require a large amount of 'heavy-lifting' to be handled asynchronously and in a scalable manner. In this sample application, Amazon SQS enables the heavy-lifting (image watermarking) to be handled not by the Rails application ('Servers') but rather by a decoupled pool of Amazon EC2 instances ('Workers') dedicated to working asynchronously.


Submitted By: Philip@AWS
AWS Products Used: Amazon SQS
Language(s): Ruby
License: Apache License 2.0
Created On: January 19, 2008 12:17 AM GMT
Last Updated: September 21, 2008 8:42 PM GMT

About This Sample


A Rails application running on a 'Server' EC2 instance uploads to Amazon S3 an image submitted by the user through the Rails application's form. From the Rails controller, the application puts a job message with details of the uploaded object into the 'todo' queue using the ActiveMessaging Rails plug-in. One or more EC2 'Worker' instances poll the 'todo' queue for jobs, read the message, download and watermark the image, then upload the new image to Amazon S3. The details of this new image are added to the original job message, the new message is put into the 'done' SQS queue, and the original message in the 'todo' queue is deleted.
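For illustration, a minimal sketch of how the controller side might publish the job message with ActiveMessaging follows; the controller, model, and attribute names (JobsController, Photo, bucket, s3_key) are assumptions for this sketch, not names taken from the sample.

    # Illustrative sketch only -- class, model, and attribute names are
    # assumptions; see the sample code for the real controller.
    class JobsController < ApplicationController
      include ActiveMessaging::MessageSender
      publishes_to :todo

      def create
        photo = Photo.create!(params[:photo])  # attachment_fu stores the upload in S3
        # Build a boto-compatible 'Name: value' message body
        publish :todo, "Bucket: #{photo.bucket}\nInputKey: #{photo.s3_key}\n"
        redirect_to :action => 'index'
      end
    end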

The Rails application retrieves messages from the 'done' queue at regular intervals with the ActiveMessaging 'poller' daemon. An ActiveRecord model encapsulates a message, and the ActiveMessaging plug-in saves all the SQS messages sent and received. The Rails application controller reads sent and received messages from the database rather than directly from the 'done' queue and sends them to the view that displays jobs. This prevents a situation where every user refresh of the submitted-and-completed-jobs view triggers new calls to Amazon SQS; N users each refreshing N times would otherwise produce on the order of N x N calls to Amazon SQS.
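On the receiving side, an ActiveMessaging processor subscribed to the 'done' queue could persist each message roughly as in the following sketch; the processor and model names are illustrative, not the sample's.

    # app/processors/done_processor.rb -- illustrative names
    class DoneProcessor < ApplicationProcessor
      subscribes_to :done

      # Called by the ActiveMessaging 'poller' daemon for each message
      # read from the 'done' queue.
      def on_message(message)
        fields = YAML.load(message)
        # Persist the message so controllers can query the database
        # instead of calling Amazon SQS on every page request.
        ReceivedMessage.create!(:body => message, :output_key => fields['OutputKey'])
      end
    end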

Figure 1: Media Processing Pipeline with Amazon Web Services

It's worth noting that the format of Amazon SQS messages created in this application is shared with that of the 'boto' library from Mitch Garnaat. The Ruby YAML library is very helpful in working with these RFC-822 compliant messages.
See Mitch Garnaat's Monster Muck Mashup - Mass Video Conversion Using AWS
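For example, because a boto-style message body is one 'Name: value' pair per line, YAML parses it straight into a Ruby hash (the field names below are illustrative):

    require 'yaml'

    body = "Bucket: my-bucket\nInputKey: uploads/photo.jpg\n"
    job  = YAML.load(body)  # => {"Bucket"=>"my-bucket", "InputKey"=>"uploads/photo.jpg"}

    # Adding a field and joining the pairs back up preserves the format
    job['OutputKey'] = 'watermarked/photo.jpg'
    reply = job.map { |name, value| "#{name}: #{value}" }.join("\n")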

Muck, Heavy Lifting

In this application the 'muck', or 'heavy-lifting', is demonstrated by watermarking images; see the 'watermarker.rb' file in the root of the .ZIP file containing the sample application. Of course your application might do any other kind of work. If there were intermediate steps between non-watermarked and watermarked, more Amazon SQS queues might be used. In this sample application there is no intermediate state, so the job goes directly from the 'todo' queue to the 'done' queue when the work is performed.
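The core of that step might look like the following RMagick sketch; the file names and caption text are placeholders, and the sample's watermarker.rb remains the authoritative version.

    require 'rubygems'
    require 'RMagick'

    # Read the downloaded image, draw a caption in a corner, write it out.
    image = Magick::Image.read('input.jpg').first
    draw  = Magick::Draw.new
    draw.annotate(image, 0, 0, 10, 10, 'aws-pipeline') do
      self.fill      = 'white'
      self.pointsize = 24
      self.gravity   = Magick::SouthEastGravity
    end
    image.write('watermarked.jpg')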


The amount of watermarking that can be done asynchronously, without affecting the Rails application's performance at all, is increased simply by starting more 'Worker' EC2 instances; Amazon SQS ensures that if there is a job, one of the 'Workers' will pick it up and process it.
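A 'Worker' is essentially a polling loop. A minimal sketch using the right_aws gem (which the sample's watermarker.rb uses for SQS 2.0 support) might look like the following, with the S3 download/upload and watermarking elided and the credentials and output key assumed:

    require 'rubygems'
    require 'right_aws'
    require 'yaml'

    sqs  = RightAws::SqsGen2.new('<awsaccesskey>', '<awssecretaccesskey>')
    todo = sqs.queue('todo')
    done = sqs.queue('done')

    loop do
      if message = todo.receive
        job = YAML.load(message.body)
        # ... download job['InputKey'] from S3, watermark it, upload the result ...
        done.send_message(message.body + "OutputKey: watermarked-#{job['InputKey']}\n")
        message.delete
      else
        sleep 5  # queue is empty; back off briefly before polling again
      end
    end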

This code sample uses an SQLite3 database with the Rails application, which is neither scalable nor persistent across instances. Using Amazon SimpleDB, the Rails application could be scaled by starting many instances of it, with all of them persisting messages to Amazon SimpleDB.


This sample application code is accompanied by a public AMI. It employs a 'pull'-like mechanism for deployment instead of a 'push' mechanism (as is done with Capistrano).
See PJ Cabrera's Using Parameterized Launches to Customize Your AMIs

This application further demonstrates using public key encryption to protect the AWS keys that are passed to the Amazon EC2 instances at launch time. The corresponding private key is bundled into the AMI. On launching an instance of the AMI, the private key is deleted before rc.local adds the keypair used to launch the instance to the SSH authorized_keys file.

The ec2-run-instances command reads the user data it associates with the instance from the configuration file using the -f switch instead of from stdin. Using the configuration file works better from a shell because of the length and contents of the encrypted, base64-encoded AWS keys ciphertext.

Once the AWS keys are decrypted, they are put into the appropriate configuration files (broker.yml, amazon_s3.yml), and then either the Rails 'script/server' and the ActiveMessaging 'poller' are launched (if the 'server' keyword is present in the instance user data) or only the 'watermarker.rb' script is run (if the 'worker' keyword is present).

See the 'launch.rb' file in the root of the accompanying code sample.
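Conceptually, launch.rb's decryption step reverses the openssl commands shown in the launch instructions below; in this sketch the file paths are assumptions, and the split relies on the historical 20-character access key and 40-character secret key lengths.

    require 'openssl'
    require 'base64'

    # Illustrative only: paths are assumptions made for this sketch.
    cipher_text = Base64.decode64(File.read('/tmp/user-data.b64'))
    private_key = OpenSSL::PKey::RSA.new(File.read('aws-pipeline_private.pem'))
    plain       = private_key.private_decrypt(cipher_text)  # reverses 'openssl rsautl -encrypt'
    access_key  = plain[0, 20]   # assumes a 20-character access key
    secret_key  = plain[20, 40]  # assumes a 40-character secret key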


Prerequisites

  1. You are signed up and active for Amazon S3, SQS, and EC2.

  2. You can run the EC2 command line tools (try running 'ec2-describe-instances').

  3. You have followed the Amazon EC2 getting started guide (you will have a key called gsg-keypair; use this or your preferred keypair where you see <mykeypair> in the instructions below).

  4. You have an empty bucket in Amazon S3 (create a new bucket if you need to).

If you prefer to download the code and run it locally, now or later, see the section below titled "Running the Sample Code Locally".

Otherwise, this code sample is meant to be run inside the cloud with its accompanying public AMI. You do not need to download the code. Continue with the section immediately below titled "Running the Sample".

Running the Sample

    Create a configuration file for 'Servers'.

  1. Create a file called server.cfg. Edit and save the file with the below two lines in it (replace <mybucket> with the name of an empty bucket you own):

     server
     <mybucket>

  2. Download the application's public key, which will be used to encrypt your AWS access and secret access keys (or copy the URL and download it from your browser):

     curl -O

  3. Encrypt a copy of your AWS keys with the aws-pipeline application's public key, base64-encode the result, and append it to your server.cfg. (Substitute your AWS keys where you see <awsaccesskey> and <awssecretaccesskey>.)

     (The aws-pipeline EC2 AMI contains a corresponding private key to decrypt your AWS keys.)
     (You are trusting the owner of the AMI.)
     (Only you can SSH into the instances of this AMI that you will launch.)

     echo "<awsaccesskey><awssecretaccesskey>" | \
     openssl rsautl -encrypt -inkey aws-pipeline_public.pem -pubin | \
     openssl base64 >> server.cfg

    Create a configuration file for 'Workers'.

  4. Create a copy of server.cfg called worker.cfg:

     cp server.cfg worker.cfg

  5. Edit worker.cfg and change the word 'server' on the first line to 'worker'. Save the file.
    Launch and connect to Amazon EC2 instances.

  6. Create a security group for 'Servers' and allow traffic to port 80:

     ec2-add-group aws-pipeline -d "AWS Pipeline Instances"
     ec2-authorize aws-pipeline -P tcp -p 80

  7. Launch two 'Workers':

     ec2-run-instances ami-a128cdc8 -g aws-pipeline -k <mykeypair> -f worker.cfg -n 2

  8. Launch a single 'Server':

     ec2-run-instances ami-a128cdc8 -g aws-pipeline -k <mykeypair> -f server.cfg

     Example Output:

     INSTANCE i-5edf2f37 ami-a128cdc8 pending mykeypair 0 m1.small 2008-01-11T19:28:45+0000

  9. Wait 30 seconds and get the details of the fully booted instance using the instance ID from the output above (e.g. 'i-5edf2f37'):

     ec2-describe-instances i-5edf2f37

     Example Output:

     RESERVATION r-d40ce7bd 319268305561 default
     INSTANCE i-5edf2f37 ami-a128cdc8 domU-12-31-38-00-39-F2.compute-1.internal running mykeypair 0 m1.small 2008-01-11T19:28:45+0000

  10. Copy the public DNS name for the instance (it ends with 'amazonaws.com') and open it in your preferred browser.

    Upload a JPEG to watermark.

    [Screen shot: AWS Processing Pipeline - Input Job]

    Refresh for the completed job.

    [Screen shot: AWS Processing Pipeline - Show Jobs]

Running the Sample Code Locally

You will need the Ruby gems listed below. (Note that the RMagick gem requires ImageMagick to be installed in your development environment.)
  • rails (2.0.2)
  • right_aws (1.7.1)
  • aws-s3 (edge)
  • daemons (edge)
  • RMagick (edge)
  1. Download the code sample to a directory of your choice.

  2. Enter your AWS Access Key and Secret Access Key into the development section of both config/broker.yml and config/amazon_s3.yml (see the sketch after this list).

  3. Run the following in the directory to which you downloaded the .ZIP file:

     cd aws-pipeline
     script/server &
     script/poller run &
     ./watermarker.rb &

  4. Open this link in your browser: http://localhost:3000/
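As a reference for step 2, the development sections might look roughly like the following; the key names follow the ActiveMessaging SQS adapter's and the aws-s3 gem's conventions, and the files shipped with the sample are authoritative.

    # config/broker.yml (development section) -- a sketch
    development:
      adapter: asqs
      access_key_id: <awsaccesskey>
      secret_access_key: <awssecretaccesskey>

    # config/amazon_s3.yml (development section) -- a sketch
    development:
      bucket_name: <mybucket>
      access_key_id: <awsaccesskey>
      secret_access_key: <awssecretaccesskey>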

Related Articles

Introduction to AWS for Ruby Developers

Monster Muck Mashup - Mass Video Conversion Using AWS

Using Parameterized Launches to Customize Your AMIs

Introduction to ActiveMessaging for Rails


Change Log

+ April 30, 2008
- updated for SQS 2.0.
- upgraded to Rails 2.0.2.
- watermarker.rb uses right_aws instead of sqs gem for SQS 2.0 support.
- updated to latest versions of attachment_fu and activemessaging plugins.


Please use this forum thread for submitting reviews, bugs, or discussion of this sample app:

[ANN] AWS Processing Pipeline with Ruby, Rails and ActiveMessaging

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.