AWS Compute Blog
Scheduling AWS Lambda Provisioned Concurrency for recurring peak usage
This post is contributed by Jerome Van Der Linden, AWS Solutions Architect
The concurrency of an AWS Lambda function is the number of requests it is serving at any given time. This metric is the average number of requests per second multiplied by the average duration in seconds. For example, if a Lambda function takes an average of 500 ms to run with 100 requests per second, the concurrency is 50 (100 × 0.5 seconds).
When the Lambda service needs a new execution environment to serve an invocation, it provisions one in a process commonly called a cold start. This initialization phase increases the total execution time of the function. Consequently, at the same concurrency, cold starts reduce overall throughput. For example, if the function takes 600 ms instead of 500 ms due to cold start latency, a concurrency of 50 handles only 83 requests per second (50 / 0.6) instead of 100.
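The arithmetic behind these two scenarios can be checked directly (values taken from the examples above):

```shell
# concurrency = requests per second x average duration in seconds
awk 'BEGIN { printf "%.0f\n", 100 * 0.5 }'   # prints 50: concurrency at 500 ms
# throughput at a fixed concurrency = concurrency / average duration in seconds
awk 'BEGIN { printf "%.0f\n", 50 / 0.6 }'    # prints 83: requests/sec at 600 ms
```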
As described in this blog post, Provisioned Concurrency helps to reduce the impact of cold starts. It keeps Lambda functions initialized and ready to respond to requests with double-digit millisecond latency. Functions can respond to requests with predictable latency. This is ideal for interactive services, such as web and mobile backends, latency-sensitive microservices, or synchronous APIs.
However, you may not need Provisioned Concurrency all the time. With a reasonable amount of traffic consistently throughout the day, the Lambda execution environments may be warmed already. Provisioned Concurrency incurs additional costs, so it is cost-efficient to use it only when necessary. For example, early in the morning when activity starts, or to handle recurring peak usage.
Example application
As an example, we use a serverless ecommerce platform with multiple Lambda functions. The entry point is a GraphQL API (using AWS AppSync) that lists products and accepts orders. The backend manages delivery and payments. This GitHub project provides an example of such an ecommerce platform and is based on the following architecture:
![Post sample architecture](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2020/08/12/architecture-1.png)
Post sample architecture
This company runs a “deal of the day” promotion every day: the price of a specific product is reduced for 60 minutes, starting at noon. The CreateOrder function handles around 20 requests per second during the day. At noon, a notification is sent to registered users, who then connect to the website. Traffic increases immediately and can exceed 400 requests per second for the CreateOrder function. This recurring pattern is visible in Amazon CloudWatch:
![Recurring invocations](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2020/08/12/invocations.png)
Recurring invocations
The peak shows an immediate increase at noon, slowly decreasing until 1pm:
![Peak invocations](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2020/08/12/decreasing-invocations.png)
Peak invocations
Examining the response times of the function in AWS X-Ray, the first graph shows normal traffic:
![Normal performance distribution](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2020/08/12/normal-distribution.png)
Normal performance distribution
While the second shows the peak:
![Peak performance distribution](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2020/08/12/peak-distribution.png)
Peak performance distribution
The average latency (p50) is higher during the peak: 535 ms versus 475 ms at normal load. The p90 and p95 values show that most invocations complete in under 800 ms in the first graph but take around 900 ms in the second. This difference is due to cold starts, which occur when the Lambda service prepares new execution environments to absorb the load during the peak.
Using Provisioned Concurrency at noon can avoid this additional latency and provide a better experience to end users. Ideally, it should also be stopped at 1pm to avoid incurring unnecessary costs.
Application Auto Scaling
Application Auto Scaling allows you to configure automatic scaling for different resources, including Provisioned Concurrency for Lambda. You can scale resources based on a specific CloudWatch metric or at a specific date and time. There is no extra cost for Application Auto Scaling; you only pay for the resources that you use. In this example, I schedule Provisioned Concurrency for the CreateOrder Lambda function.
Scheduling Provisioned Concurrency for a Lambda function
- Create an alias for the Lambda function to identify the version of the function you want to scale:
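For example, with the AWS CLI, assuming the function is named CreateOrder and the alias is prod (the version number is illustrative):

```shell
# Publish a new version of the function
aws lambda publish-version --function-name CreateOrder

# Point a "prod" alias at that version
aws lambda create-alias --function-name CreateOrder \
    --name prod --function-version 1
```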
- In this example, the average execution time is 500 ms and the workload must handle 450 requests per second, which equates to a concurrency of 225 (450 × 0.5). Adding a 10% buffer and rounding up gives a target of 250. Register the Lambda function as a scalable target in Application Auto Scaling with the RegisterScalableTarget API:
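A sketch of the registration with the AWS CLI, assuming the function CreateOrder with the alias prod. The minimum capacity is set to 0 so that the provisioned concurrency can be fully released outside the peak:

```shell
aws application-autoscaling register-scalable-target \
    --service-namespace lambda \
    --resource-id function:CreateOrder:prod \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --min-capacity 0 \
    --max-capacity 250
```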
Next, verify that the Lambda function is registered correctly using the DescribeScalableTargets API.
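For example, with the AWS CLI; the response should list the function alias under ScalableTargets, with the ResourceId, ScalableDimension, MinCapacity, and MaxCapacity set previously:

```shell
aws application-autoscaling describe-scalable-targets \
    --service-namespace lambda
```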
- Schedule the Provisioned Concurrency using the PutScheduledAction API of Application Auto Scaling:
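A sketch of the scale-out action with the AWS CLI, using the schedule from this example (11:45 UTC, 15 minutes before the noon peak):

```shell
aws application-autoscaling put-scheduled-action \
    --service-namespace lambda \
    --resource-id function:CreateOrder:prod \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --scheduled-action-name scale-out \
    --schedule "cron(45 11 * * ? *)" \
    --scalable-target-action MinCapacity=250,MaxCapacity=250
```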
Note: Set the schedule a few minutes ahead of the expected peak to allow time for Lambda to prepare the execution environments. The time must be specified in UTC.
- Verify that the scaling action is correctly scheduled using the DescribeScheduledActions API.
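For example, with the AWS CLI; the response should include the scheduled action with its cron schedule and target capacity:

```shell
aws application-autoscaling describe-scheduled-actions \
    --service-namespace lambda
```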
- To stop the Provisioned Concurrency, schedule another action after the peak with the capacity set to 0:
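A sketch of the scale-in action, scheduled at 13:15 UTC in this example:

```shell
aws application-autoscaling put-scheduled-action \
    --service-namespace lambda \
    --resource-id function:CreateOrder:prod \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --scheduled-action-name scale-in \
    --schedule "cron(15 13 * * ? *)" \
    --scalable-target-action MinCapacity=0,MaxCapacity=0
```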
With this configuration, Provisioned Concurrency starts at 11:45 and stops at 13:15. It provides a concurrency of 250 to handle the load during the peak and releases the resources afterward. This also optimizes costs by limiting the use of this feature to 90 minutes per day.
In this architecture, the CreateOrder function synchronously calls the ValidateProduct, ValidatePayment, and ValidateDelivery functions. Provisioning concurrency only for the CreateOrder function would be insufficient, as cold starts in the three downstream functions would still add latency to each order. To be efficient, you must configure Provisioned Concurrency for all four functions, using the same process.
Observing performance when Provisioned Concurrency is scheduled
Run the following command at any time outside the scheduled window to confirm that nothing has been provisioned yet:
When run during the defined time slot, the output shows that the concurrency is allocated and ready to use:
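The command in question is GetProvisionedConcurrencyConfig; a sketch, assuming the alias prod:

```shell
aws lambda get-provisioned-concurrency-config \
    --function-name CreateOrder --qualifier prod
```

Outside the scheduled window, the call should return a ProvisionedConcurrencyConfigNotFoundException; during the window, the response should report a Status of READY along with the allocated concurrency.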
Verify the different scaling operations using the following command:
You can see the scale-out and scale-in activities at the times specified, and how Lambda fulfills the requests:
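For example, with the DescribeScalingActivities API:

```shell
aws application-autoscaling describe-scaling-activities \
    --service-namespace lambda
```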
In the following latency graph, most of the requests are now completed in less than 800 ms, in line with performance during the rest of the day. The traffic during the “deal of the day” at noon is comparable to any other time. This helps provide a consistent user experience, even under heavy load.
![Performance distribution](https://d2908q01vomqb2.cloudfront.net/1b6453892473a467d07372d45eb05abc2031647a/2020/08/13/800ms-perf.png)
Performance distribution
Automate the scheduling of Lambda Provisioned Concurrency
You can use the AWS CLI to set up scheduled Provisioned Concurrency, which can be helpful for testing in a development environment. You can also define your infrastructure as code to automate deployments and avoid error-prone manual configuration. This can be done with AWS CloudFormation, AWS SAM, the AWS CDK, or third-party tools.
The following code shows how to schedule Provisioned Concurrency in an AWS SAM template:
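A sketch of such a template, assuming a function resource named CreateOrderFunction and the schedule from the CLI example (the resource and property names are real; the handler, runtime, and code location are illustrative):

```yaml
Resources:
  CreateOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/create-order/   # illustrative
      Handler: app.handler         # illustrative
      Runtime: nodejs12.x          # illustrative
      AutoPublishAlias: prod

  CreateOrderScalableTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    # The alias must exist before the scalable target; SAM names the
    # alias resource <FunctionResource>Alias<AliasName>
    DependsOn: CreateOrderFunctionAliasprod
    Properties:
      ServiceNamespace: lambda
      ScalableDimension: lambda:function:ProvisionedConcurrency
      ResourceId: !Sub function:${CreateOrderFunction}:prod
      MinCapacity: 0
      MaxCapacity: 250
      ScheduledActions:
        - ScheduledActionName: scale-out
          Schedule: "cron(45 11 * * ? *)"
          ScalableTargetAction:
            MinCapacity: 250
            MaxCapacity: 250
        - ScheduledActionName: scale-in
          Schedule: "cron(15 13 * * ? *)"
          ScalableTargetAction:
            MinCapacity: 0
            MaxCapacity: 0
```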
In this template:
- As with the AWS CLI, you need an alias for the Lambda function. The AutoPublishAlias property automatically creates the alias “prod” and points it at the latest version of the Lambda function.
- An AWS::ApplicationAutoScaling::ScalableTarget resource registers the Lambda function as a scalable target.
- The ResourceId property references the correct version of the Lambda function by using the “prod” alias.
- The ScheduledActions property of the scalable target defines the different actions to schedule, using the same properties as in the CLI.
- The scalable target cannot be created until the alias of the function is published, so it depends on the alias resource, whose logical ID follows the syntax <FunctionResource>Alias<AliasName>.
Conclusion
Cold starts can impact the overall duration of your Lambda function, especially under heavy load. This can affect the end-user experience when the function is used as a backend for synchronous APIs.
Provisioned Concurrency helps reduce latency by creating execution environments ahead of invocations. Using the ecommerce platform example, I show how to combine this capability with Application Auto Scaling to schedule scale-out and scale-in actions. This combination helps to provide consistent execution times, even during special events that cause usage peaks. It also helps to optimize costs by limiting Provisioned Concurrency to the period when it is needed.
To learn more about scheduled scaling, see the Application Auto Scaling documentation.