In this tutorial, we will examine a challenge that many developers face: how to build a scalable application that performs resource-intensive tasks.
The specific application we will build is a stock market quotes application. Figure 1 shows the client portion.
Figure 1 - Stock Market Quotes Application client
The functionality of the sample is relatively straightforward: the user specifies a list of stock symbols, and the application retrieves quotes for these symbols from a financial web service.
However, what makes this sample special is how this functionality is implemented. The quote for each symbol is gathered by one or more worker processes. The client and workers communicate using Amazon Simple Queue Service (SQS). By using SQS, we can process more stock quotes simply by adding more workers. SQS also gives us reliability; if a worker crashes, the messages it has not yet read simply stay in the queue. When the worker recovers, it can pick up where it left off, or another worker can take its place.
We will examine this sample from two perspectives. First, we will see how using SQS yields a great deal of reliability and scalability with a minimum amount of effort. Second, we will see how SQS has been encapsulated into a Microsoft Visual Studio component, which integrates SQS with the rapid application development nature of Visual Studio and facilitates easy re-use.
This sample can be downloaded from http://s3.amazonaws.com/sqs-public-images/StockQuotes.zip
Stock Symbols as a Unit of Work
The best way to visualize how the Stock Quote Example works is by thinking of each stock symbol as a unit of work. We can use our client to add stock symbols, as shown in Figure 2.
Figure 2 - Adding a stock symbol
For the sake of creating a substantial set of work, we also populate our application with a set of default stock symbols. These symbols are shown in Figure 3.
Figure 3 - StockQuoteClient.cs code listing
For each symbol we have, the "work" involved is obviously to get a quote for that symbol. The service we are using in our sample is from Yahoo Finance; the code to invoke this service is shown in Figure 4.
Figure 4 - StockQuote.cs
We could have implemented our sample in a number of different ways. First, if we took a very naïve approach, we would sequentially gather quotes for each symbol we have. The problem with this approach is that even the highest-performing web service takes a few seconds to return a value. If we were to sequentially process our stock symbols, our client would "hang" for an indeterminate amount of time as it found a quote for each symbol. This is not an ideal behavior for our sample.
Another more sophisticated approach would be to use threads. We could either spawn a new thread for each stock symbol, or use .NET's built-in thread pool. In either case, we would be able to process our symbols with a degree of parallelism. This solves the main problem with our first approach.
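As a rough sketch of that thread-pool approach (in Python rather than the sample's C#), the snippet below fans the symbol list out across a small pool of threads; fetch_quote is a hypothetical stand-in for the real Yahoo Finance call:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_quote(symbol):
    # Stand-in for the web-service call, which would block for a few seconds.
    return f"{symbol},42.00"

def fetch_all(symbols, max_workers=4):
    # Slow calls overlap on pool threads instead of blocking the client
    # one symbol at a time; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_quote, symbols))

quotes = fetch_all(["AMZN", "MSFT", "GOOG"])
```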
However, we are constrained by the resources of a single machine. Even though we are using multiple threads, they all have to share the CPU and network access. Even in the case of multi-threaded and multi-core CPUs, we are still working with a finite amount of resources; this will limit our scalability at some point.
This is where the advantages of our implementation become apparent. Since we are treating each stock symbol as a unit of work, this work can be divided among a number of workers. This gives us the concurrency that the multi-threaded approach gave us.
And, by using Amazon SQS to facilitate the communication between our client and the workers, we can host our workers on separate machines.
This gives us scalability. When there is more work to do (i.e. a long list of stock symbols), we can simply spin up more workers to process this work. Once that work is done, we can shut down these workers.
This is an example of the elastic scalability that SQS enables in an application.
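In miniature, the unit-of-work pattern looks like the sketch below (Python, for brevity), with a local queue and threads standing in for SQS and remote worker machines; the quote lookup itself is a placeholder:

```python
import queue
import threading

work = queue.Queue()     # stands in for the SQS queue
results = queue.Queue()  # stands in for the response queue

def worker():
    # Each worker pulls symbols until it sees the shutdown sentinel.
    while True:
        symbol = work.get()
        if symbol is None:
            break
        results.put(f"{symbol},quote")  # placeholder for the real lookup

symbols = ["AMZN", "MSFT", "GOOG", "YHOO"]
for s in symbols:
    work.put(s)

# "Spinning up more workers" here is just starting more threads.
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for _ in threads:
    work.put(None)  # one sentinel per worker
for t in threads:
    t.join()

processed = sorted(results.get() for _ in symbols)
```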
Scalability and performance (via concurrency) often come with a price, which is usually reflected in the complexity of implementation. However, as we will see, our sample has been implemented to take advantage of the componentized, rapid application development nature of Microsoft Visual Studio 2008.
Drag and Drop SQS
The Stock Quote Example is actually two samples in one. It is an example of how to use Amazon SQS in an application, but it is also an example of how SQS can be integrated into a development environment like Visual Studio 2008.
The Requestor.cs and Worker.cs files shown in Figure 5 are where the bulk of the interaction with SQS is encapsulated.
Figure 5 - Visual Studio Components
In addition, these two files implement the attributes of the Component class, which allows them to be added to the Visual Studio Toolbox; this is illustrated in Figure 6.
Figure 6 - SQS Components
Because they are part of the Toolbox, these components can be simply dragged and dropped onto any visual designer that Visual Studio offers.
Figure 7 illustrates the Requestor component as part of our stock quote client (StockQuoteClient.cs).
Figure 7 - Requestor component on a Windows Form
The attributes you would normally set when using SQS are available as properties in Visual Studio, as shown in Figure 8.
Figure 8 - Requestor properties
(Our AWS ID and secret are blacked out in the illustration for privacy reasons. Please use your own Amazon Account information.)
The Queue property specifies the name of the SQS queue we will use to communicate with our workers. We will examine the other properties later on in this document.
Since we are using workers (which we will examine shortly) to do the actual work of retrieving a stock quote, we just have to send messages to our queue. From the perspective of the client, the only data that we have to supply is the stock symbol; this code is illustrated in Figure 9.
Figure 9 - Sending messages to our queue
Once our messages are sent to the queue, they will be securely stored by Amazon for up to four days. This is one of the ways that Amazon SQS makes an application more reliable. Because the message is persisted for what is, in computing terms, a very long time, there is ample time for a worker to pick up the message and process the work. If no workers are available, the work simply waits in SQS, ready to be processed, until one is.
At this point, a logical question to ask would be how the client determines that a stock quote is ready to be displayed. Again, because our requestor was created as a Visual Studio component, we have simply defined an event handler. Figure 10 illustrates the delegate being used from the Properties window.
Figure 10 - Event handler
The code for our event handler is listed in Figure 11; the stock quote is delivered to us as a comma-delimited string that we parse with our StockQuote class.
This event handler could be triggered by any worker that is processing our messages; from our perspective as a client, we do not care where this processed work came from.
Figure 11 - Handling responses
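A minimal sketch of that parsing step (in Python); the actual field layout is defined by StockQuote.cs, so the three fields assumed here (symbol, last price, change) are illustrative only:

```python
def parse_quote(message):
    # Split the comma-delimited response into typed fields.
    symbol, last, change = message.split(",")
    return {"symbol": symbol, "last": float(last), "change": float(change)}

quote = parse_quote("AMZN,74.22,-1.05")
```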
Our Requestor component makes it nearly trivial to add SQS to our application: we simply drag and drop it onto our form, enter our Amazon credentials, and start adding work and processing responses.
Figure 12 - Worker
The next step in our examination of the Stock Quote Example is to see how work is pulled from the queue and processed. This happens in two places: our workers pull messages to get a stock symbol to process, and our Requestor class does the same thing to process responses from workers and trigger the response delegate.
Since we have not examined the worker side of our architecture yet, that is where we will go to see how messages are pulled from the queue.
Polling for Work
The Worker aspect of the example (shown in Figure 12) is implemented as a Microsoft Windows Forms application for the purpose of clarity. As a Windows Forms application, we can easily view how it is pulling work from the queue and processing it. In a real application, workers are usually implemented as services.
Figure 13 - StockQuoteWorker.cs
As you might recall from Figure 6, the Worker is a component on our Toolbox as well. However, for our worker example, we have chosen to use it programmatically, rather than put it on our design surface. This decision was made purely to show that you could do both with the control.
The code to initialize our Worker component is listed in Figure 13. As you can see, we have a corresponding delegate in the Worker component; our worker application will be notified when a message has been retrieved from the queue. Also note that StockQueue is the name of the queue we are working with; this is the same value as in our client. Our ID and secret are blacked out.
Looking at Figure 8 and Figure 12 you will notice that both our client and worker have a property called PollingMode. As its name would suggest, this property determines how we poll an SQS queue and check for messages.
There are many different ways to poll a resource like a queue.
The simplest way is to sit in a simple loop: if there are no messages in our queue, we sleep for a fixed period of time. There is nothing wrong with this approach, but it is not optimal when using SQS.
Figure 14 - Amazon SQS Pricing
Figure 14 illustrates Amazon's pricing scheme for SQS as of February 2008. As you can see, you are charged $0.01 per 10,000 requests. This includes each call to ReceiveMessage, which is the request you use to check for the presence of a message.
Since Amazon charges for each request (granted, it is a fraction of a penny), our straightforward approach to polling on a regular schedule is not optimal. If there are no messages, then it makes sense to reduce the frequency of our polling to reduce our cost.
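At the February 2008 rate of $0.01 per 10,000 requests, the hourly cost of fixed-interval polling is easy to estimate; a back-of-the-envelope sketch for a single poller on a single queue:

```python
def hourly_polling_cost(poll_interval_seconds, price_per_10k=0.01):
    # Requests issued in one hour, times the per-request price.
    requests_per_hour = 3600 / poll_interval_seconds
    return requests_per_hour / 10_000 * price_per_10k

# Polling once per second issues 3,600 requests per hour.
cost = hourly_polling_cost(1)
```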
Figure 15 - Polling modes
Our Worker application helps us illustrate this. If you change the polling modes (see Figure 15), you can see the cost per hour to run the application change (see Figure 16).
Figure 16 - Cost per hour
Again, the net cost we are discussing is a matter of pennies, but it can become a concern for applications that run for a very long time, or applications that use a large number of queues.
A more efficient approach to polling is what we call adaptive polling. Simply put, we will poll more frequently when there are lots of messages, and we will poll less when there are fewer messages.
This algorithm is implemented in the SQSManager.cs file, and best examined starting from the AdaptiveDequeue method listed in Figure 17.
Figure 17 - Adaptive polling
We will examine this method line by line. First, on line 406, we make a call to PollSQS. This method simply tries to receive one message from our SQS queue. Our algorithm really begins when we determine if we were able to read a message or not.
If we received a message, lines 416 and 417 from Figure 18 show that we are setting a counter and a delay value. We will use the delay value later in the method to pause the current thread of execution. If we were able to read a message, we will not pause very long (i.e., 1 millisecond) because we want to see if there are more messages. With adaptive polling, we want to read messages out of the queue as fast as possible. If you look through the source code in SQSManager.cs in more detail, you will see that we actually spin up more threads to read messages faster.
Figure 18 - We received a message
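The two branches can be sketched as a small helper (in Python); the 1-millisecond and 500-millisecond values come from the text, while the function shape is our own:

```python
RECEIVE_DELAY_MS = 500  # fallback delay used after an empty read

def next_delay_ms(got_message, empty_receive_count):
    # After a successful read, pause only 1 ms and reset the empty counter;
    # after an empty read, fall back to the longer delay and count it.
    if got_message:
        return 1, 0
    return RECEIVE_DELAY_MS, empty_receive_count + 1
```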
Once we have read all of the messages, we will scale down our threads and pace. Figure 19 lists the code that is executed when we are not able to read a message.
Figure 19 - Scaling down
On line 423, we can see that when we do not get a message, we will set our delay back to a reasonable amount. The _receiveDelay variable is set to 500 milliseconds elsewhere in the code. We also increment our emptyReceiveCount variable; this variable helps us track how many times we read from the queue and were not able to get a message back.
Line 426 is where this variable comes into play. There are two clauses in this conditional statement. The first is true when we have come up empty 20 times or more. The second clause requires a bit more explanation: the _messageCount variable is updated by a separate thread, which sends a request to SQS to determine the number of messages in our queue. This code is listed in Figure 20.
Figure 20 - Determining the number of messages
At first glance, this logic may seem redundant; if we have not been able to read a message 20 times in a row, then shouldn't we be able to assume that the queue is indeed empty? With Amazon SQS, we cannot make this assumption.
In order to provide scalability and reliability, Amazon distributes your queue and messages across its vast datacenters.
This mitigates the risk of a single datacenter outage affecting you. However, the consequence of this type of distribution is that it takes time for your data to replicate. Amazon uses a model of data availability called eventual consistency. An in-depth examination of this principle is beyond the scope of this document, but more information can be found in the December 19, 2007, entry in Werner Vogels' weblog on building scalable and robust distributed systems.
What we have to be cognizant about in our code is that when we do not get a message back from a read operation, it might be due to eventual consistency; there are messages in our queue, there just might not be a message in the particular distributed instance of our queue that we tried to read from.
This is why we also check the number of messages. In Figure 20, we are getting that value in line 590. We are encapsulating the details of getting the number of messages from SQS behind our ApproximateNumberOfMessages method. The name of this method is very apt because this is another case where we have to be cognizant of eventual consistency.
Figure 21 - Worker Efficiency Ratio
Sending a request to SQS to get the number of messages is a more accurate way to determine if a queue is empty compared to just trying to read a message from it. However, the number of messages that we get back is an approximation. Amazon will not try to find every distributed instance of our queue and sum up the number of messages. Instead, it will sample a few distributed instances of our queue and return an approximate count back to us.
Looking back at Figure 19, line 426, when both conditions are true (we have had 20 empty reads in a row and the approximate number of messages in our queue is 0), we will scale down our polling. We will shut down our message-counting thread and fall back to minimal polling.
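The scale-down test itself is a two-part guard; a sketch (in Python), with the threshold of 20 taken from the text:

```python
def should_scale_down(empty_receive_count, approx_message_count,
                      empty_threshold=20):
    # Both checks are needed because of eventual consistency: many empty
    # reads in a row AND an approximate queue depth of zero.
    return (empty_receive_count >= empty_threshold
            and approx_message_count == 0)
```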
If and when we read a message from the queue, we will scale things back up again. One metric that can be very valuable for tuning your polling algorithm is the Worker Efficiency Ratio shown in Figure 21. This value is determined by the number of messages received divided by the number of receive operations.
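The ratio itself is a one-line computation; a minimal sketch:

```python
def worker_efficiency_ratio(messages_received, receive_operations):
    # 1.0 means every poll returned a message; values near 0 mean most
    # polls (and their per-request cost) were wasted.
    if receive_operations == 0:
        return 0.0
    return messages_received / receive_operations
```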
No matter how we choose to poll our queue, one obvious benefit to using SQS in our application is that we are able to de-couple our client and worker. As long as our client and any number of workers have access to the Internet, they can communicate via SQS. It does not matter what hardware environment we choose to execute our application on. However, there are some factors we will examine when making that decision.
Choosing an Execution Environment
One of the most important facts to remember about using the Amazon Web Services in general is that communication within the Amazon environment is extremely fast. Any network traffic between EC2 instances, S3 buckets, SimpleDB, and SQS will be carried out within the Amazon infrastructure and will be highly optimized.
This fact is a key factor in deciding whether to run your application within your own infrastructure or within Amazon EC2. In 2008, Amazon added support for Microsoft Windows in EC2. With this support, it is now entirely possible to develop and deploy a distributed Windows application within EC2.
Windows images are treated just like any other AMI, so you can use the same tools as you would normally.
Once you have started a Windows AMI, you can use Remote Desktop to access it, just like any other Windows machine you are using.
Figure 22 shows Windows AMIs listed in the open-source ElasticFox tool.
Figure 22 - Windows AMIs listed in ElasticFox
In most cases, the greatest network latency is experienced when your application uses your own network to connect to Amazon Web Services. We can measure this latency with the MeasureNetLatency property on our Requestor component, as shown in Figure 23. (Again, our ID and secret are blacked out for privacy.)
Figure 23 - Capturing network latency
When we set this property to true, we are able to gather the amount of time it takes for our client's request to get to Amazon SQS. Figure 24 and Figure 25 show the network latency inside and outside of Amazon.
In Figure 24, our client is executing in an environment outside of Amazon.
Figure 24 - Network latency making call to Amazon
In Figure 25, both the client and workers are executing within the EC2 environment. The values are merely representative, since we did not use a completely isolated external network, but the differences are expected to be significant.
Figure 25 - Network latency inside of Amazon
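Conceptually, MeasureNetLatency is just a timer around the request; a sketch (in Python), with send_request standing in for the actual SQS call:

```python
import time

def measure_latency_ms(send_request):
    # Wall-clock time for one round trip, in milliseconds.
    start = time.perf_counter()
    send_request()
    return (time.perf_counter() - start) * 1000.0

# Simulate a 10 ms round trip instead of calling a real service.
latency = measure_latency_ms(lambda: time.sleep(0.01))
```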
Besides improved performance, another important benefit of using Amazon EC2 as your execution environment is pricing. Figure 26 shows a snippet of the Amazon pricing documentation. A key passage has been highlighted: data transferred between SQS and EC2 is free of charge. This is also true for transfers between EC2 and S3.
Figure 26 - Snippet of Amazon Pricing
It is probably not surprising, or unintentional, that we are gently guiding you toward using Amazon EC2 as your execution environment. Improved network performance and cost savings are two examples of the benefits of doing so. However, the most compelling reason to consider Amazon EC2 as your execution environment is scalability.
Stretching Out Scalability with Amazon EC2
Amazon EC2 offers what is practically an unlimited amount of computing power; you can use as many machine instances as you need and pay only for what you use.
The Windows EC2 instance that we used for this example consisted of the following:
- Windows Server 2003 (pre-installed)
- Visual Studio 2008 C# Express
We installed Visual Studio 2008 C# Express on our instance because we did the actual development within EC2 as well. As stated earlier, in our example we created our worker as a Windows Forms application so that we could more easily examine statistics like network latency. In a more realistic application, we would have wrapped our worker logic as a Windows Service; this is a simple matter of using the Project Wizard to create the framework for us (as illustrated in Figure 27).
Figure 27 - New Project Dialog
Once we have our worker configured as a Windows Service, we can package our instance as an AMI itself. By doing this, we can instantiate as many of these custom machine images as we need to process incoming requests. When the amount of requests subsides, we can start to terminate instances.
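A sketch of that machine-level elasticity (in Python): pick an instance count from the approximate queue depth. The messages-per-instance target and the min/max bounds are illustrative assumptions, not values from the sample:

```python
def desired_instances(approx_messages, per_instance=100, lo=0, hi=20):
    # Ceiling division: enough instances to cover the backlog at the
    # assumed throughput, clamped to sensible bounds.
    needed = -(-approx_messages // per_instance)
    return max(lo, min(hi, needed))

n = desired_instances(350)  # scale up for a 350-message backlog
```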
This is the same type of adaptive processing that we saw earlier; however, we are now doing it at the machine level.
Hopefully, what you have seen is how using Amazon Simple Queue Service (SQS) enables you to add reliability and scalability to your application. Just as importantly, we hope you have seen that the Requestor and Worker components let you add Amazon SQS to an application with very little effort. Once SQS is part of your infrastructure, you are able to decouple your client and workers.
This decoupling opens up the possibility of using Amazon EC2 as an execution environment, where you can realize significant performance advantages.