How to Automatically Prevent Email Throttling when Reaching Concurrency Limit

Introduction

Many users of Amazon Simple Email Service (Amazon SES) send large email campaigns that target tens of thousands of recipients. Regulating the flow of Amazon SES requests can prevent throttling due to exceeding the AWS service limit on the account.

Amazon SES service quotas include a soft limit on the number of emails sent per second (also known as the “sending rate”). This quota is intended to protect users from accidentally sending unintended volumes of email, or from spending more money than intended. Most Amazon SES customers have this quota increased, but very large campaigns may still exceed that limit. As a result, Amazon SES will throttle email requests. When this happens, your messages will fail to reach their destination.

This blog provides users of Amazon SES a mechanism for regulating the flow of messages that are sent to Amazon SES. Cloud Architects, Engineers, and DevOps Engineers designing new, or improving an existing Amazon SES solution would benefit from reading this post.

Overview

A common solution for regulating the flow of API requests to Amazon SES is achieved using Amazon Simple Queue Service (Amazon SQS). Amazon SQS can send, store, and receive messages at virtually any volume and can serve as part of a solution to buffer and throttle the rate of API calls. It achieves this without the need for other services to be available to process the messages. In this solution, Amazon SQS prevents messages from being lost while waiting for them to be sent as emails.

Fig 1 — High level architecture diagram

But this common solution introduces a new challenge. The mechanism used by the Amazon SQS event source mapping for AWS Lambda invokes a function as soon as messages are visible. Our challenge is to regulate the flow of messages, rather than invoke Amazon SES as messages arrive to the queue.

Fig 2 — Leaky bucket

Developers typically limit the flow of messages in a distributed system by implementing the “leaky bucket” algorithm. This algorithm is an analogy to a bucket which has a hole in the bottom from which water leaks out at a constant rate. Water can be added to the bucket intermittently. If too much water is added at once or at too high a rate, the bucket overflows.

In this solution, we prevent this overflow by using throttling. Throttling can be handled in two ways: either before messages reach Amazon SQS, or after messages are removed from the queue (“dequeued”). Both of these methods pose challenges in handling the throttled messages and reprocessing them. These challenges introduce complexity and lead to the excessive use of resources that may cause a snowball effect and make the throttling worse.

Developers often use the following techniques to help improve the successful processing of feeds and submissions:

Submit requests at times other than on the hour or on the half hour. For example, submit requests at 11 minutes after the hour or at 41 minutes after the hour. This can have the effect of limiting competition for system resources with other periodic services.
Take advantage of times during the day when traffic is likely to be low, such as early evening or early morning hours.

However, these techniques assume that you have control over the rate of requests, which is usually not the case.

Amazon SQS, acting as a highly scalable buffer, allows you to disregard the incoming message rate and store messages at virtually any volume. Therefore, there is no need to throttle messages before adding them to the queue. As long as you eventually process messages faster than you receive new ones, you will be fine with the inflow that will get processed later on.

Regulating flow of messages from Amazon SQS

The proposed solution in this post regulates the dequeue of messages from one or more SQS queues. This approach can help prevent you from exceeding the per-second quota of Amazon SES, thereby preventing Amazon SES from throttling your API calls.

Available configuration controls

When it comes to regulating outflow from Amazon SQS you have a few options. MaxNumberOfMessages controls the maximum number of messages you can dequeue in a single read request. WaitTimeSeconds defines whether Amazon SQS uses short polling (0 seconds wait) or long polling (more than 0 seconds) while waiting to read messages from a queue. Though these capabilities are helpful in many use cases, they don’t provide full control over outflow rates.

Amazon SQS Event source mapping for Lambda is a built-in mechanism that uses a poller within the Lambda service. The poller polls for visible messages in the queue. Once messages are read, they immediately invoke the configured Lambda function. In order to prevent downstream throttling, this solution implements a custom poller to regulate the rate of messages polled instead of the Amazon SQS Event source mechanism.

Custom poller Lambda

Let’s look at the process of implementing a custom poller Lambda function. Your function should actively regulate the outflow rate without throttling or losing any messages.

First, you have to consider how to invoke the poller Lambda function once every second. Using Amazon EventBridge rules you can schedule Lambda invocations at a rate of once per minute. You also have to consider how to complete processing of Amazon SES invocations as soon as possible. And finally, you have to consider how to send requests to Amazon SES at a rate as close as possible to your per-second quota, without exceeding it.

You can use long polling to meet all of these requirements. Using long polling (by setting the WaitTimeSeconds value to a number greater than zero) means the request queries all of the Amazon SQS servers, or waits until the maximum number of messages you can handle per second (the MaxNumberOfMessages value) are read. By setting the MaxNumberOfMessages equal to your Amazon SES request per-second quota, you prevent your requests from exceeding that limit.

By splitting the looping logic from the poll logic (by using two Lambda functions) the code loops every second (60 times per minute) and asynchronously runs the polling logic.

Fig 3 — Custom poller diagram

You can use the following Python code to create the scheduler loop function:

import os
from time import sleep, time_ns

import boto3

SENDER_FUNCTION_NAME = os.getenv("SENDER_FUNCTION_NAME")
lambda_client = boto3.client("lambda")

def lambda_handler(event, context):
    print(event)

    for _ in range(60):
        prev_ns = time_ns()

        response = lambda_client.invoke_async( 
            FunctionName=SENDER_FUNCTION_NAME, InvokeArgs="{}" 
        ) 
        print(response)

        delta_ns = time_ns() - prev_ns

        if delta_ns < 1_000_000_000: 
            secs = (1_000_000_000.0 - delta_ns) / 1_000_000_000 
            sleep(secs)

This Python code creates a poller function:

import json 
import os

import boto3

UNREGULATED_QUEUE_URL = os.getenv("UNREGULATED_QUEUE_URL") 
MAX_NUMBER_OF_MESSAGES = 3 
WAIT_TIME_SECONDS = 1 
CHARSET = "UTF-8"

ses_client = boto3.client("ses") 
sqs_client = boto3.client("sqs")

def lambda_handler(event, context): 
    response = sqs_client.receive_message( 
        QueueUrl=UNREGULATED_QUEUE_URL, 
        MaxNumberOfMessages=MAX_NUMBER_OF_MESSAGES, 
        WaitTimeSeconds=WAIT_TIME_SECONDS, 
    )

    try: 
        messages = response["Messages"] 
    except KeyError: 
        print("No messages in queue") 
        return

    for message in messages: 
        message_body = json.loads(message["Body"]) 
        to_address = message_body["to_address"] 
        from_address = message_body["from_address"] 
        subject = message_body["subject"] 
        body = message_body["body"]

        print(f"Sending email to {to_address}")

        ses_client.send_email( 
            Destination={ 
                "ToAddresses": [ 
                    to_address, 
                ], 
            }, 
            Message={ 
                "Body": { 
                    "Text": { 
                        "Charset": CHARSET, 
                        "Data": body, 
                    } 
                }, 
                "Subject": { 
                    "Charset": CHARSET, 
                    "Data": subject, 
                }, 
            }, 
            Source=from_address, 
        )

        sqs_client.delete_message( 
            QueueUrl=UNREGULATED_QUEUE_URL, ReceiptHandle=message["ReceiptHandle"] 
        )

Regulating flow of prioritized messages from Amazon SQS

In the use case above, you may be serving a very large marketing campaign (“campaign1”) that takes hours to process. At the same time, you may want to process another, much smaller campaign (“campaign2″), which won’t be able to run until campaign1 is complete.

Obvious solution is to prioritize the campaigns by processing both campaigns in parallel. For example, allocate 90% of the Amazon SES per-second capacity limit to process the larger campaign1, while allowing the smaller campaign2 to take 10% of the available capacity under the limit. Amazon SQS does not provide message priority functionalities out-of-the-box. Instead, create two separate queues and poll each queue according to your desired frequency.

Fig 4 — Prioritize campaigns by queue diagram

This solution works fine if you have consistent flow of incoming messages to both queues. Unfortunately, once you finish processing campaign2 you will keep processing campaign1, using only 90% of the limit capacity per second.

Handling unbalanced flow

For handling unbalanced flow of messages merge both of your poller Lambdas. Implement one Lambda that polls both queues for MaxNumberOfMessages (that equals 100% of the limits of both). In this implementation send from your poller Lambda 90% of campaign1 messages and 10% of campaign2 messages. When campaign2 no longer has messages to process, keep processing 100% of the capacity from campaign1’s queue.

Do not delete unsent messages from the queues. These messages will become visible after their queue’s visibility timeout is reached.

To further improve on the previous implementations, introduce a third FIFO Queue to aggregate all messages from both queues and regulate dequeuing from that third FIFO queue. This will allow you to use all available capacity under your SES limit, while interweaving messages from both campaigns at a 1:10 ratio.

Fig 5 — Adding FIFO merge queue diagram

Processing 100% of the available capacity limit of the large campaign1 and 10% of the capacity limit of the small campaign2 allows you to make sure campign2 messages will not wait until campaign1 messages were all processed. Once campain2 messages are all processed, campign1 messages will continue to be processed using 100% of the capacity limit.

You can find here instructions for Configuring Amazon SQS queues.

Conclusion

In this blog post, we have shown you how to regulate the dequeue of Amazon SQS queue messages. This will prevent you from exceeding your Amazon SES per second limit. This will also remove the need to deal with throttled requests. We explained how to combine Amazon SQS, AWS Lambda, Amazon EventBridge to create a custom serverless regulating queue poller. Finally, we described how to regulate the flow of Amazon SES requests when using multiple priority queues. These technics can reduce implementation time for reprocessing throttled requests, optimize utilization of SES request limit, and reduce costs.

About the Authors

This blog post was written by Guy Loewy and Mark Richman, AWS Senior Solutions Architects for SMB.