Scaling with Amazon SQS
Many AWS customers use Amazon SQS as a building block for highly reliable, scalable, asynchronous distributed systems, and the service continues to see tremendous growth. In this post, we want to highlight a handful of best practices for achieving high throughput with SQS.
- Separate Throughput from Latency – Like many other AWS services, SQS is accessed through HTTP request-response, and a typical SQS request-response takes a bit less than 20ms from EC2. This means that from a single thread you can, on average, issue 50+ API requests / sec (a bit fewer for batch API requests but those do more work). The throughput scales horizontally, so the more threads and hosts you add, the higher the throughput. Using this scaling model, some of our customers have queues that process thousands of messages every second.
- Use Batch APIs Wherever Relevant – In addition to the internal improvements that help deliver stable performance and good scalability, we also introduced a batch API model back in October. It is now possible to send, receive, and delete up to 10 messages at a time. This makes it possible to achieve a given throughput with fewer threads and hosts and, because SQS charges are per request, can greatly reduce customer costs. In fact, we have seen steady growth in the adoption of those APIs since their introduction.
- Trade Off Message Durability and Latency – SQS does not return success to a SendMessage API call until the message is durably stored in SQS. This keeps the programming model very simple: unlike with an asynchronous messaging model, there is no doubt about the safety of a message once the call succeeds. However, if you don’t need a durable messaging system, you can build asynchronous client-side batching on top of the SQS libraries that delays the enqueue of messages to SQS and transmits a set of messages in a batch. Please be aware that with a client-side batching approach, you could lose any messages that have not yet been transmitted if your client process or client host dies for any reason.
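To make the client-side batching idea from the last bullet concrete, here is a minimal, single-threaded sketch in Python. It is an illustration under stated assumptions, not an SQS library feature: `ClientSideBatcher` and `send_batch` are hypothetical names, and `send_batch` stands in for whatever function wraps the batch send call in your SQS client. A fuller version would probably also flush on a timer from a background thread.

```python
from typing import Callable, List

MAX_BATCH_SIZE = 10  # the batch APIs accept up to 10 messages per request


class ClientSideBatcher:
    """Hypothetical helper: buffers message bodies in memory and sends them in batches.

    Messages that are still in the buffer are NOT durable. They are lost if
    the client process or host dies before flush() hands them to SQS.
    """

    def __init__(self, send_batch: Callable[[List[str]], None],
                 batch_size: int = MAX_BATCH_SIZE) -> None:
        if not 1 <= batch_size <= MAX_BATCH_SIZE:
            raise ValueError("batch_size must be between 1 and 10")
        self._send_batch = send_batch  # assumed to wrap the real batch send call
        self._batch_size = batch_size
        self._buffer: List[str] = []

    def enqueue(self, body: str) -> None:
        """Buffer one message; transmit automatically once a full batch is ready."""
        self._buffer.append(body)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far, at most batch_size messages per call."""
        while self._buffer:
            batch = self._buffer[:self._batch_size]
            self._send_batch(batch)
            # Remove messages only after the send call returned without raising.
            del self._buffer[:len(batch)]


# Demo with a stand-in sender that just records each batch it is given.
sent_batches: List[List[str]] = []
batcher = ClientSideBatcher(sent_batches.append)
for i in range(25):
    batcher.enqueue(f"msg-{i}")
batcher.flush()  # push out the final partial batch
print([len(batch) for batch in sent_batches])
```

In this sketch the 25 messages go out in three requests instead of 25. The cost is exactly the one described above: anything sitting in the buffer at the moment of a crash never reaches the queue, so call `flush()` on shutdown and keep the buffer small if loss matters.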
A new section of the SQS documentation presents simple samples demonstrating SQS performance with examples for both the base API (single operations), and the batch API.
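Before running such samples, the throughput reasoning in the list above can be turned into a rough sizing estimate. The sketch below assumes each thread issues requests back to back and that throughput scales linearly with threads. The function names are hypothetical, and the 25 ms batch latency is an assumed figure chosen only to reflect that batch requests take somewhat longer than single ones; measure your own latencies before relying on the numbers.

```python
import math


def per_thread_message_rate(latency_ms: float, batch_size: int = 1) -> float:
    """Messages per second one thread can move when issuing requests back to back."""
    return batch_size * 1000 / latency_ms


def threads_needed(target_msgs_per_s: float, latency_ms: float,
                   batch_size: int = 1) -> int:
    """Threads required to reach the target rate, assuming linear horizontal scaling."""
    rate = per_thread_message_rate(latency_ms, batch_size)
    return math.ceil(target_msgs_per_s / rate)


# Single-message requests at roughly 20 ms each (the figure quoted above).
print(per_thread_message_rate(20))                # 50.0 messages/sec per thread
print(threads_needed(5000, 20))                   # 100 threads for 5,000 messages/sec

# Batches of 10 at an ASSUMED 25 ms per request.
print(per_thread_message_rate(25, batch_size=10)) # 400.0 messages/sec per thread
print(threads_needed(5000, 25, batch_size=10))    # 13 threads for 5,000 messages/sec
```

Under these assumptions, batching cuts both the thread count and the number of billable requests for the same message rate.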
The Amazon Web Services Messaging team is growing, and we are looking to add new members who are passionate about building large-scale distributed systems. If you are a solid software development engineer, quality assurance engineer, or engineering manager/leader, we would like to hear from you. We are moving fast, so send your resume to firstname.lastname@example.org and we will interview the most promising candidates immediately.
— Atulya Beheray