Amazon Elastic Compute Cloud (EC2) is a web service that provides resizable compute capacity in the cloud, allowing your applications to respond quickly to rapid fluctuations in demand. At all times, you can be responsive to your users while ensuring that your computing power is optimally used. With Amazon EC2, you don't have to maintain excess servers in anticipation of future demand, or continue to run servers at sub-optimal utilization rates when the demand decreases. Instead, you use auto-scaling, which means you dial up or dial down the number of EC2 instances you need based on your current load. Amazon provides many features for you to implement auto-scaling, including web service APIs to start and stop EC2 instances very quickly. However, it is up to you to determine how many EC2 instances you need at any given time. This paper focuses on how to implement auto-scaling with Amazon Simple Queue Service (Amazon SQS).
Amazon SQS is a highly reliable, scalable message queuing service that enables asynchronous message-based communication between distributed components of an application. Amazon SQS is a complementary web service that enables you to build highly scalable EC2 applications easily. To learn more about the benefits of using Amazon SQS with Amazon EC2 read "Getting Started with Amazon SQS and Amazon EC2".
Auto-Scaling EC2 Instances
With auto-scaling, you're trying to simultaneously achieve two seemingly opposing goals:
- To have plenty of EC2 instances running so that your application is responsive and able to meet defined service level agreements
- To limit the number of EC2 instances running so that computing resources aren't being wasted
The key is to establish a relationship between the load on the application and the optimum number of EC2 instances required to handle the load. However, there isn't a single way to define the load on an application or the right proxy for the load. There are several proxies for load, including:
- Number of simultaneous user connections
- Number of incoming requests
- CPU, memory, or bandwidth utilization
- Average response time
Some of the proxies are easier to measure and track than others.
An alternative is Amazon SQS, which provides a simple way to represent load and implement auto-scaling for EC2 instances. This is discussed in the next section.
Example of EC2 and SQS Working Together
To make the discussion concrete, consider a fairly simple but common scenario. Assume you want to offer an online photo processing service for consumers. The service lets users specify the operations they want performed on their photos. Operations could include red eye reduction, cropping, thumbnail creation, customization, re-coloring, teeth whitening, etc. Users upload the photos to your web site and specify the tasks to be performed on the photos. Users can submit as few as one or as many as hundreds of photos in a single upload session. After submitting the photos, users can come back and check the status of their photos. After the photos have been processed, users can download their processed photos from the web site.
Let's assume that different operations take different processing times, ranging from a few seconds to several minutes. Therefore, the time to complete the user's request depends on the number of photos, the size of the photos, and the processing operations to be performed.
The figure below provides an overview of the architecture of a simple system that meets the requirements listed above. It includes Amazon SQS working in conjunction with Amazon EC2 and Amazon S3.
Every user request results in a message being queued into the Amazon SQS request queue. At the same time, the application stores the photos in Amazon S3. The message in the queue contains (among other things) the photo processing operations to be performed and a pointer to the location of the photos in Amazon S3. A photo processing server (one of many), running in an EC2 instance, reads a message from the queue, processes the request, and on completion, posts a status message to the response queue. (For simplicity it is assumed that each server instance is running in a dedicated EC2 instance. It is possible to have more than one server instance running in a single EC2 instance.)
Amazon SQS enables multiple photo processing servers to run simultaneously against the same queues. When a server picks up a message from the request queue, it locks the message for a defined amount of time, during which it processes and deletes the message. During this lockout period the message is not visible to other server instances. This feature of Amazon SQS ensures that each message is processed at least once. If for any reason a server is unable to finish processing and deleting the message (for example the server might crash unexpectedly), the message becomes available to other servers after the lockout period.
Using SQS to Determine the Number of Server Instances
Queues provide a convenient mechanism to determine the load on an application. Queues have two metrics that are a good proxy for load:
- Length of the queue (number of messages in the queue): Because each message in the queue represents a request from a user, measuring the length of the queue is a fair approximation of the load on the application. By trial and error, you can determine the optimal length of the queue (the length at which you have just the right number of servers running to cover the demand). At any time, if the current length of the queue exceeds the optimal length, then you should start an additional server instance. Likewise, if the current length falls below the optimal length, then it's time to shut down a server instance.
You can determine the approximate number of messages in your queue by calling GetQueueAttributes API, with the argument ApproximateNumberOfMessages, as described in the API reference: http://docs.amazonwebservices.com/AWSSimpleQueueService/2008-01-01/SQSDeveloperGuide/
- Time in the queue (TIQ). TIQ refers to the time elapsed between en-queuing a message (putting a message in the queue) and de-queuing the message (taking the message out of the queue). TIQ can be a good indicator of the responsiveness of the application. The longer the TIQ, the less responsive the application is, because the turnaround time is higher (from the user's perspective). As with the length of the queue, once you determine the optimal TIQ, you can start or shut down server instances when the actual TIQ deviates from the optimal TIQ.
With Amazon SQS the best way to determine TIQ is to store a timestamp in the body of your message, corresponding to when the message was sent to SQS. Your monitoring application can then read the message, and can compare this timestamp to the current time. The difference between them corresponds to the TIQ for this message.
Important: When you implement a solution based on one of the two load measurements discussed above, don't design your system to overreact to momentary spikes in the load. Instead of defining a precise number for the optimal length of the queue or TIQ, we recommend you define a range and start or shut down instances only if the actual value is outside the range for a sustained period of time.
Note: Both of the above approaches assume that all requests take about the same amount of time to process. If the requests require varying amounts of time to process, we recommend you create multiple queues, each based on a general range of expected processing time. Your system can then send all messages to a single queue that forwards each message to another queue based on the expected processing and deletion time for that message.
Amazon has partners who provide comprehensive auto-scaling solutions for EC2 using Amazon SQS. Two of the partner products are listed below strictly for your information. Amazon does not endorse or recommend these products. The descriptions of the products were provided to us by the vendors.
Lifeguard automates a system of loosely coupled services running on Amazon Web Services. The basic architecture uses Amazon S3 for data storage, EC2 for compute capacity and SQS for managing units of work. In any application where there is work that needs to be processed in real-time or in an asynchronous background process, lifeguard can help. What lifeguard does is manage the compute part of the architecture. It starts and stops servers to meet changing demand. It provides many adjustments to tune the behavior so that your compute resources scale at the rate your application demands.
To get started, it helps to understand the parts of the application. Lifeguard itself (the pool management) can run as part of your application or separately. It is configured to manage separate pools of servers for each service your application needs (i.e. video transcoding, OCR or PDF conversion). To get work started, the data ingestion process occurs, which amounts to copying a file or files to Amazon S3, then sending a message or messages via Amazon SQS. There is an ingestor script that comes with lifeguard to help do this manually, however most applications will integrate directly with APIs within lifeguard to call ingestion utility code to move data and/or initiate work via messages in the work queues. Status of the pools and of the work performed are all sent back to lifeguard and can be tied into via lifeguard APIS by your application.
To learn more, visit http://code.google.com/p/lifeguard. Once there, download the latest release, and visit the "GettingItRunning" wiki page. You'll be able to run a sample service. To implement your own service, you can extend AbstractBaseService. To help your application submit data and work requests, simply extend IngestorBase. Monitoring tools and war deployment are coming in the next release.
RightGrid from RightScale lets you control and manage any background or batch processing worker tasks in a scalable, fault-tolerant, and audited environment. RightGrid was designed to simplify the task of processing large numbers of jobs enqueued in Amazon SQS with the data to be processed residing in S3. The RightGrid framework takes care of all the fetching and pushing of data files, and all you need to do is plug-in your processing code that takes local files as input and produces local output files. See the RightGrid architecture document for details.
RightScale offers direct assistance in using the RightGrid framework as part of their RightGrid product offering, described more fully at http://www.rightscale.com/m/products.html#rightgrid.
Other Useful Links
Son of Monster Muck Mashup - Mass Video Conversion Using AWS
Describes how to build a highly scalable video conversion service using Amazon EC2, Amazon SQS and Amazon S3.
Automated Server Pool Management in Java
Shows how to manage a pool of servers using Amazon EC2, Amazon SQS and Amazon S3.