Why isn't my Lambda function with an Amazon SQS event source scaling optimally?
Last updated: 2020-01-30
My AWS Lambda function with an Amazon Simple Queue Service (Amazon SQS) trigger isn't scaling as expected. How can I be sure that my function scales to optimal concurrency?
When you configure an SQS queue as an event source and messages are available for processing, Lambda begins with an initial concurrency of five. Under optimal conditions, a Lambda function with an Amazon SQS trigger scales up by 60 additional instances per minute, to a maximum of 1,000 concurrent invocations. For your function to scale optimally, the following must be true:
- The function isn't producing any errors.
- There's sufficient unreserved concurrency in the AWS Region, or the reserved concurrency for the function is at least 1,000.
- There are messages in the SQS queue.
If Lambda encounters errors when it invokes your function, the service stops scaling the function up so that the errors aren't multiplied at scale. As soon as the errors stop, Lambda resumes scaling. It adds 60 concurrent invocations per minute, up to a maximum of 1,000, as long as your account isn't at or near the service quota for scaling or burst concurrency in the Region.
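The scaling rate described above can be turned into a rough timeline. The following is a minimal sketch, assuming an error-free function, a queue that always has messages, and no account-level limit in the way. The function names are hypothetical and are not part of any AWS SDK.

```python
import math

INITIAL_CONCURRENCY = 5      # starting concurrency described in this article
SCALE_STEP_PER_MINUTE = 60   # additional concurrent invocations per minute
MAX_CONCURRENCY = 1000       # scaling ceiling described in this article


def concurrency_after(minutes: int, ceiling: int = MAX_CONCURRENCY) -> int:
    """Best-case concurrency after a whole number of error-free minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return min(INITIAL_CONCURRENCY + SCALE_STEP_PER_MINUTE * minutes, ceiling)


def minutes_to_reach(target: int) -> int:
    """Best-case number of minutes before concurrency reaches the target."""
    if target > MAX_CONCURRENCY:
        raise ValueError("target exceeds the scaling ceiling")
    if target <= INITIAL_CONCURRENCY:
        return 0
    return math.ceil((target - INITIAL_CONCURRENCY) / SCALE_STEP_PER_MINUTE)


if __name__ == "__main__":
    for minute in (0, 1, 5, 10, 17):
        print(minute, concurrency_after(minute))
```

Under these assumptions, the function needs about 17 minutes of uninterrupted scaling to go from five to 1,000 concurrent invocations. Any period with errors or throttles extends that time.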
If you haven't configured reserved concurrency on your function, it shares the default unreserved concurrency quota of 1,000 with other functions in the same account and Region. If at least 1,000 units of unreserved concurrency are available in the Region, your function continues to scale until it reaches the maximum concurrency. Otherwise, invocations are throttled when all your unreserved concurrency is in use.
If you configured reserved concurrency on your function, be sure that the reserved value is at least 1,000. If it's lower, your function is throttled when it reaches the reserved value. For more information, see Managing Concurrency for a Lambda Function.
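The two concurrency cases above come down to one comparison. The following is a minimal sketch of that reasoning, not an AWS API. The function names are hypothetical, and the unreserved_available argument is an assumed input: your own estimate of the unreserved concurrency that other functions in the Region aren't using.

```python
from typing import Optional

SQS_SCALING_MAX = 1000  # maximum concurrency described in this article


def effective_ceiling(reserved: Optional[int], unreserved_available: int) -> int:
    """Return the concurrency at which the function stops scaling.

    reserved: the function's reserved concurrency, or None if not configured.
    unreserved_available: unreserved concurrency left in the Region (estimate).
    """
    # A function with reserved concurrency is limited by that value only;
    # otherwise it competes for the shared unreserved pool.
    limit = reserved if reserved is not None else unreserved_available
    return min(SQS_SCALING_MAX, max(limit, 0))


def may_throttle_before_max(reserved: Optional[int], unreserved_available: int) -> bool:
    """True if throttling is expected before the function reaches 1,000."""
    return effective_ceiling(reserved, unreserved_available) < SQS_SCALING_MAX


if __name__ == "__main__":
    print(effective_ceiling(None, 1000))      # no reserved concurrency, full pool
    print(effective_ceiling(200, 1000))       # reserved concurrency set too low
    print(may_throttle_before_max(None, 400)) # shared pool mostly in use
```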
Check queue depth
Lambda scales invocations only if there are messages in the SQS queue. If your function's metrics show no throttles and no errors, check the SQS queue metric ApproximateNumberOfMessagesVisible. This metric shows how many messages are available for retrieval from the queue, that is, how many still need to be processed.
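As an illustration of the queue depth check, the following sketch reads the queue's backlog from Amazon CloudWatch. It reads ApproximateNumberOfMessagesVisible, the SQS metric for messages available for retrieval. The function names, the 30-minute lookback, and the 5-minute period are assumptions made for this example. The cloudwatch argument is expected to be a boto3 CloudWatch client, which requires the boto3 package and valid credentials.

```python
from datetime import datetime, timedelta, timezone


def queue_backlog_series(cloudwatch, queue_name, lookback_minutes=30):
    """Return recent backlog values for the queue, oldest first."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=lookback_minutes)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesVisible",
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        StartTime=start,
        EndTime=end,
        Period=300,               # assumed 5-minute granularity
        Statistics=["Maximum"],
    )
    # CloudWatch doesn't guarantee datapoint order, so sort by timestamp.
    datapoints = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Maximum"] for p in datapoints]


def backlog_is_growing(values):
    """True if the newest datapoint is higher than the oldest one."""
    return len(values) >= 2 and values[-1] > values[0]


# Example usage (requires boto3 and AWS credentials; queue name is a placeholder):
#   import boto3
#   series = queue_backlog_series(boto3.client("cloudwatch"), "my-queue")
#   print(series, backlog_is_growing(series))
```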
If the metric is low (or at 0), there isn't enough work in the queue for your function to scale. If the metric is high and growing, and there are no errors, check the batch size configuration for your trigger. There's a small delay between receiving a message from the queue and invoking your function. If your batch size is set low (at 1, for example), that delay adds up across all invocations and can cause a backup of messages in your queue. Try increasing the batch size until the function's duration increases faster than the batch size, up to the maximum batch size of 10.
Note: If you set a batch size of 10, but your SQS queue has fewer than 1,000 messages, you're less likely to receive a full batch of 10 messages in your invocations. For more information, see ReceiveMessage in the Amazon SQS API Reference.
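The batch size tuning rule above can be sketched as a comparison of growth ratios. This is one possible reading of the rule, not an official procedure. The function name is hypothetical, and the durations are values that you measure yourself (for example, the average Duration metric of your function at each batch size that you test). The sketch assumes that all measured durations are positive.

```python
from typing import Dict

MAX_BATCH_SIZE = 10  # maximum batch size described in this article


def pick_batch_size(avg_duration_ms: Dict[int, float]) -> int:
    """Pick the largest tested batch size before duration outgrows batch size.

    avg_duration_ms maps each tested batch size to its measured average
    invocation duration in milliseconds.
    """
    sizes = sorted(s for s in avg_duration_ms if 1 <= s <= MAX_BATCH_SIZE)
    if not sizes:
        raise ValueError("no batch sizes between 1 and 10 were measured")
    best = sizes[0]
    for candidate in sizes[1:]:
        size_growth = candidate / best
        duration_growth = avg_duration_ms[candidate] / avg_duration_ms[best]
        # Stop when a larger batch costs proportionally more time than it saves.
        if duration_growth > size_growth:
            break
        best = candidate
    return best


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(pick_batch_size({1: 100.0, 2: 120.0, 5: 200.0, 10: 500.0}))
```

With the hypothetical measurements shown, going from 5 to 10 messages doubles the batch size but multiplies the duration by 2.5, so the sketch stops at a batch size of 5.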