How the Amazon SQS FIFO API Works

We have just introduced FIFO queues for Amazon SQS. These queues offer strictly ordered message delivery and exactly-once message processing. The FIFO API builds on the SQS API and adds new capabilities. This post explains the additions, how they work, and when to use them.

Customers have asked us for these features. Although many apps perform well with SQS’s traditional super-scalable at-least-once message processing, some applications do need ordering or exactly-once processing. For example, you might have a queue used to carry commands in an interactive “shell” session, or a queue that delivers a stream of price updates. In both cases, the messages must be processed in order and exactly once. FIFO queues make it much easier to support these apps, while preserving the SQS ultra-simple API.

You can use FIFO queues much as you use SQS queues today: sending, receiving, and deleting messages, and retrying whenever you get send or receive failures. What’s new with the FIFO API is that they support messages being delivered in order and processed only once. Now, on to the details.

Note: If your network connections don’t drop for minutes at a time, and your messages have unique identifiers, you should be able to get strict ordering and exactly-once processing with little extra effort; the default settings for all the FIFO-specific arguments will be appropriate.

Making Queues
There are two additions to the SQS CreateQueue API; both are Boolean-valued queue attributes. FifoQueue turns FIFO on; this discussion applies to queues that were created with FifoQueue set to true. The other is ContentBasedDeduplication, which we’ll discuss later in the context of sending messages.

Ordering vs. Exactly-Once Processing
Ordering and exactly-once processing behaviors aren’t the same thing. You always get deduplication, but you can control ordering behavior by using a new string-valued parameter named MessageGroupId, which applies to the SendMessage and SendMessageBatch APIs.

Basically, the FIFO behavior applies to messages that have the same MessageGroupId. This means you have three options:

Give all the messages in the queue the same MessageGroupId (an empty string is fine) so that they are all delivered in order.
Mix up a few different MessageGroupIds. This makes sense, for example, if you are tracking data from several different customers. The idea is if you use the customer ID as the MessageGroupId, the records for each customer will be delivered in order; there’s no particular ordering between records from different customers.
Give every message a different MessageGroupId, for example, by making a new UUID for each one. Then, there isn’t any particular ordering, but you get the deduplication (we provide details in the next section).

What Does “FIFO” Mean, Anyhow?
FIFO is only really meaningful when you have one single-threaded sender and one single-threaded receiver. If there are a bunch of threads or processes writing into a queue, you can’t really even tell if it’s FIFO: the messages show up in the queue depending on when the senders get scheduled.

On the other hand, with MessageGroupId, you can have a bunch of independent senders throwing messages at a queue, each with their own MessageGroupId, and each sender’s messages show up at the other end in order.

At the receiving end, when you call ReceiveMessage, the messages you get may have several different MessageGroupIds (assuming there is more than one MessageGroupId in the queue); the messages for each MessageGroupId will be delivered in order. A receiver can’t control which MessageGroupIds it’s going to get messages for.

What Does “Duplicate” Mean?
FIFO queues are designed to avoid introducing duplicate messages. Historically, standard SQS queues offered “at least once” service, with the potential of occasional duplicate message delivery.

The good news is that with the FIFO API, you get to decide what “duplicate” means. There are two tools you can use: the Boolean ContentBasedDeduplication queue attribute, and an optional Boolean SendMessage/SendMessageBatch parameter, MessageDeduplicationId.

Two messages are duplicates if they have the same MessageDeduplicationId. If ContentBasedDeduplication is set, SQS calculates the MessageDeduplicationId for you as an SHA256 of the message content (but not the message attributes).

One implication is that if you haven’t set ContentBasedDeduplication on the queue, you must provide a MessageDeduplicationId or SQS throws an error. On the other hand, if ContentBasedDeduplication is set, you can still provide the MessageDeduplicationId and SQS will use yours instead of the SHA256.

Now, in a lot of cases, application messages include some sort of unique identifier, often a UUID. In that case, ContentBasedDeduplication reliably detects dupes. So, why would you ever provide a MessageDeduplicationId?

Maybe you have messages with duplicate bodies. For example, I wrote an app that was pumping an HTTP server log into a FIFO queue. There are lots of dumb bots and crawlers on the Internet that will fire a bunch of requests for the same resource at your server more often than once per second. They show up as duplicate lines in the log, and if you use ContentBasedDeduplication, SQS will think they’re dupes. In this scenario, you might want to generate a MessageDeduplicationId for each line. (Actually, I didn’t; for what I was working on, I wanted unique entries, so I needed SQS to suppress the dupes.)
Maybe you have messages that aren’t the same, but you want SQS to treat them as duplicates. One example I’ve heard about is a mobile phone app that often gets network failures when it’s out in a coverage-free area, so when it sends messages it includes a field saying how many times it had to retry. But the app doesn’t want more than one message getting through, so it keeps using the same MessageDeduplicationId until a message is acknowledged.
Maybe you want to send messages that have identical content but different metadata attributes. ContentBasedDeduplication only works on the content, so you’ll need to add a MessageDeduplicationId.

What Happens to Duplicates?
Suppose you send two messages with the same MessageDeduplicationId. There are a bunch of ways this could happen, the most obvious one being that you had a network breakage and SQS got the message, but because the acknowledgment didn’t get back to your app, the SendMessage call seemed to fail.

In any case, when this happens, SQS lets your call succeed, but then just tosses the duplicate on the floor, so that only one should ever get into the queue.

This means that if you’re talking to SQS and for some reason your call fails, you don’t have to worry whether it’s your fault, the fault of SQS, or the network’s fault. Just retry with the same MessageDeduplicationId as many times as you want until the call succeeds, and only one copy should be transmitted.

How Long Does SQS Remember Duplicates?
It turns out that SQS implements transmit deduplication by remembering which MessageDeduplicationId values it’s seen. To be precise, it remembers this for at least five minutes, which means that if you send a pair of duplicate messages more than five minutes apart, both might get through.

In most apps, this should be a complete nonissue, but it can happen. Consider a mobile app that’s running on a device in a car. You send a message and SQS gets it, but before you get the acknowledgment, you go into a coverage-free area and lose signal for ten minutes. When the app gets signal again, if you just retry the original call, it’s possible the receiver might get that message twice.

There are a variety of strategies you could adopt to work around this. The most obvious way is to send a request when you’ve lost network for a while, saying “what’s the last message received?”

A Note on Rate Limits
Anyone who’s used SQS seriously has come to appreciate its immense ability to soak up traffic. Generally, when you try to put messages into SQS and you get throttled, the right thing to do is just keep trying and pretty soon it’ll work. In fact, in many cases the SDK just does this for you so you don’t even notice.

FIFO queues are different. The ordering imposes a real throughput limit – currently 300 requests per second per queue. The queue just can’t go faster than that, and retrying won’t help. Fortunately, our conversations with customers have told us that FIFO applications are generally lower-throughput—10 messages per second or lower.

“At Least Once” vs. “Exactly Once”
SQS FIFO does not claim to do exactly-once delivery. To start with, there is no such thing; go looking around the Internet and you’ll find lots of eminent Computer Scientists explaining why exactly-once is impossible in distributed systems.

And regardless, you don’t really want exactly-once delivery, because your reader could blow up due to a cosmic ray or network problem or whatever after it’s received the message, but before it’s had a chance to act on it. In that case, you absolutely want SQS to deliver the message again.

What you want, and what FIFO queues are designed to offer, is exactly-once processing. To understand how this works, let’s walk through the details of how you go about receiving SQS FIFO messages (which in most respects is exactly like receiving traditional non-FIFO SQS messages).

You call ReceiveMessage and get back a batch; each message is accompanied by a ReceiveHandle.
You can provide a VisibilityTimeout argument; if you don’t, there’s a default value, normally 30 seconds.
For the VisibilityTimeout period, the messages you’ve received are hidden from other callers and, because this is a FIFO queue, other callers are partially blocked from reading those messages to preserve FIFO ordering. “Partially blocked” has to do with MessageGroupId. If all the messages have the same one, then the whole queue is blocked; if there are several different MessageGroupIds, only the MessageGroupIds in the batch you receive are blocked.
You can also provide an optional ReceiveMessageDeduplicationId argument to ReceiveMessage.
Normally, assuming everything goes well, you process your messages and then delete them. To delete each message, you have to use the ReceiveHandle that came with it. This tells SQS the message has been processed and will never be delivered again. You can also use the ReceiveHandle to extend the VisibilityTimeout for messages you’re working on.
Suppose you get a network breakage such that SQS got your ReceiveMessage request and sent you the messages, but you didn’t get them. You can go ahead and retry. If you provide the same ReceiveMessageDeduplicationId, SQS sends you the messages (and ReceiveHandles) right away, and resets the VisibilityTimeout timer. If you don’t provide the same ReceiveMessageDeduplicationId, SQS has to wait for the VisibilityTimeout to expire before releasing the messages to you. So the only real effect of the ReceiveMessageDeduplicationId is that retries of failed ReceiveMessage calls run faster. If your environment and network are solid, you might not want to bother using them.
One more detail: If you use the ReceiveHandles to delete messages or change their visibility, you can no longer retry with that same ReceiveMessageDeduplicationId.

Can this ever go wrong? Yes, but it’s a pretty narrow corner case. Suppose you receive a batch of messages, and for some reason or other you get stuck, and the VisibilityTimeout expires before you get around to deleting them. In that case, they’d be released and if there are other readers, they might end reading and processing duplicates.

By the way, in this scenario where the VisibilityTimeout has expired, your DeleteMessage call will fail, so at least you can be sure of detecting the situation.

Some things you might do to prevent this from happening:

Have only one reader.
Use a nice, long VisibilityTimeout.
Keep track of the time, and before you process each message, double check to be sure its VisibilityTimeout hasn’t expired.

Summing up
We’ve gone through a lot of details here. However, in the most common case, where you have reasonably reliable connections and your messages already have unique IDs, you can ignore most of the details and just take the defaults. Set ContentBasedDeduplication on your queue, then just go ahead and use SendMessage as you always have, only with MessageGroupId arguments.

On the receiver side, if you’re worried about retry performance, supply a ReceiveMessageDeduplicationId on each call. Other than that, go ahead and use the same SQS calls to process messages that you already have. The results should be:

All the messages with the same MessageGroupId will be processed in the order they were sent.
If a message is read and deleted, no duplicate should ever be delivered to a queue reader.

Happy fully ordered, duplicate-free message processing!

AWS Developer Tools Blog

How the Amazon SQS FIFO API Works

Resources

Follow