AWS Compute Blog
Understanding techniques to reduce AWS Lambda costs in serverless applications
This post is written by Josh Kahn, Tech Leader, and Chloe Jeon, Senior PMTes Lambda.
Serverless applications can lower the total cost of ownership (TCO) when compared to a server-based cloud execution model because it effectively shifts operational responsibilities such as managing servers to a cloud provider. Deloitte research on serverless TCO with Fortune 100 clients across industries show that serverless applications can offer up to 57% cost savings compared with server-based solutions.
Serverless applications can offer lower costs in:
- Initial development: Serverless enables builders to be more agile, deliver features more rapidly, and focus on differentiated business logic.
- Ongoing maintenance and infrastructure: Serverless shifts operational burden to AWS. Ongoing maintenance tasks, including patching and operating system updates.
This post focuses on options available to reduce direct AWS costs when building serverless applications. AWS Lambda is often the compute layer in these workloads and may comprise a meaningful portion of the overall cost.
To help optimize your Lambda-related costs, this post discusses some of the most commonly used cost optimization techniques with an emphasis on configuration changes over code updates. This post is intended for architects and developers new to building with serverless.
Building with serverless makes both experimentation and iterative improvement easier. All of the techniques described here can be applied before application development, or after you have deployed your application to production. The techniques are roughly by applicability: The first can apply to any workload; the last applies to a smaller number of workloads.
Right-sizing your Lambda functions
Lambda uses a pay-per-use cost model that is driven by three metrics:
- Memory configuration: allocated memory from 128MB to 10,240MB. CPU and other resources available to the function are allocated proportionally to memory.
- Function duration: time function runs, measures in milliseconds.
- Number of invocations: the number of times your function runs.
Over-provisioning memory is one of the primary drivers of increased Lambda cost. This is particularly acute among builders new to Lambda who are used to provisioning hosts running multiple processes. Lambda scales such that each execution environment of a function only handles one request at a time. Each execution environment has access to fixed resources (memory, CPU, storage) to complete work on the request.
By right-sizing the memory configuration, the function has the resources to complete its work and you are paying for only the needed resources. While you also have direct control of function duration, this is a less effective cost optimization to implement. The engineering costs to create a few milliseconds of savings may outweigh the cost savings. Depending on the workload, the number of times your function is invoked may be outside your control. The next section discusses a technique to reduce the number of invocations for some types of workloads.
Memory configuration is accessible via the AWS Management Console or your favorite infrastructure as code (IaC) option. The memory configuration setting defines allocated memory, not memory used by your function. Right-sizing memory is an adjustment that can reduce the cost (or increase performance) of your function. However, lowering the function-memory may not always result in cost savings. Lowering function memory means lowering available CPU for the Lambda function, which could increase the function duration, resulting in either no cost savings or higher cost. It is important to identify the optimal memory configuration for cost savings while preserving performance.
AWS offers two approaches to right-sizing memory allocation: AWS Lambda Power Tuning and AWS Compute Optimizer.
AWS Lambda Power Tuning is an open-source tool that can be used to empirically find the optimal memory configuration for your function by trading off cost against execution time. The tool runs multiple concurrent versions of your function against mock input data at different memory allocations. The result is a chart that can help you find the “sweet spot” between cost and duration/performance. Depending on the workload, you may prioritize one over the other. AWS Lambda Power Tuning is a good choice for new functions and can also help select between the two instruction set architectures offered by Lambda.
AWS Compute Optimizer uses machine learning to recommend an optimal memory configuration based on historical data. Compute Optimizer requires that your function be invoked at least 50 times over the trailing 14 days to provide a recommendation based on past utilization, so is most effective once your function is in production.
Both Lambda Power Tuning and Compute Optimizer help derive the right-sized memory allocation for your function. Use this value to update the configuration of your function using the AWS Management Console or IaC.
This post includes AWS Serverless Application Model (AWS SAM) sample code throughout to demonstrate how to implement optimizations. You can also use AWS Cloud Development Kit (AWS CDK), Terraform, Serverless Framework, and other IaC tools to implement the same changes.
MyFunction:
Type: AWS::Serverless::Function
Properties:
Runtime: nodejs18.x
Handler: app.handler
MemorySize: 1024 # Set memory configuration to optimize for cost or performance
Setting a realistic function timeout
Lambda functions are configured with a maximum time that each invocation can run, up to 15 minutes. Setting an appropriate timeout can be beneficial in containing costs of your Lambda-based application. Unhandled exceptions, blocking actions (for example, opening a network connection), slow dependencies, and other conditions can lead to longer-running functions or functions that run until the configured timeout. Proper timeouts are the best protection against both slow and erroneous code. At some point, the work the function is performing and the per-millisecond cost of that work is wasted.
Our recommendation is to set a timeout of less than 29-seconds for all synchronous invocations, or those in which the caller is waiting for a response. Longer timeouts are appropriate for asynchronous invocations, but consider timeouts longer than 1-minute to be an exception that requires review and testing.
Using Graviton
Lambda offers two instruction set architectures in most AWS Regions: x86 and arm64.
Choosing Graviton can save money in two ways. First, your functions may run more efficiently due to the Graviton2 architecture. Second, you may pay less for the time that they run. Lambda functions powered by Graviton2 are designed to deliver up to 19 percent better performance at 20 percent lower cost. Consider starting with Graviton when developing new Lambda functions, particularly those that do not require natively compiled binaries.
If your function relies on native compiled binaries or is packaged as a container image, you must rebuild to move between arm64 and x86. Lambda layers may also include dependencies targeted for one architecture or the other. We encourage you to review dependencies and test your function before changing the architecture. The AWS Lambda Power Tuning tool also allows you to compare the price and performance of arm64 and x86 at different memory settings.
You can modify the architecture configuration of your function in the console or your IaC of choice. For example, in AWS SAM:
MyFunction:
Type: AWS::Serverless::Function
Properties:
Architectures:
- arm64 # Set architecture to use Graviton2
Runtime: nodejs18.x
Handler: app.handler
Filtering incoming events
Lambda is integrated with over 200 event sources, including Amazon SQS, Amazon Kinesis Data Streams, Amazon DynamoDB Streams, Amazon Managed Streaming for Apache Kafka, and Amazon MQ. The Lambda service integrates with these event sources to retrieve messages and invokes your function as needed to process those messages.
When working with one of these event sources, builders can configure filters to limit the events sent to your function. This technique can greatly reduce the number of times your function is invoked depending on the number of events and specificity of your filters. When not using event filtering, the function must be invoked to first determine if an event should be processed before performing the actual work. Event filtering alleviates the need to perform this upfront check while reducing the number of invocations.
For example, you may only want a function to run when orders of over $200 are found in a message on a Kinesis data stream. You can configure an event filtering pattern using the console or IaC in a manner similar to memory configuration.
To implement the Kinesis stream filter using AWS SAM:
MyFunction:
Type: AWS::Serverless::Function
Properties:
Runtime: nodejs18.x
Handler: app.handler
Events:
Type: Kinesis
Properties:
StartingPosition: LATEST
Stream: "arn:aws:kinesis:us-east-1:0123456789012:stream/orders"
FilterCriteria:
Filters:
- Pattern: '{ "data" : { "order" : { "value" : [{ "numeric": [">", 200] }] } } }'
If an event satisfies one of the event filters associated with the event source, Lambda sends the event to your function for processing. Otherwise, the event is discarded as processed successfully without invoking the function.
If you are building or running a Lambda function that is invoked by one of the previously mentioned event sources, it’s recommended that you review the filtering options available. This technique requires no code changes to your Lambda function – even if the function performs some preprocessing check, that check still completes successfully with filtering implemented.
To learn more, read Filtering event sources for AWS Lambda functions.
Avoiding recursion
You may be familiar with the programming concept of a recursive function or a function/routine that calls itself. Though rare, customers sometimes unintentionally build recursion in their architecture, so a Lambda function continuously calls itself.
The most common recursive pattern is between Lambda and Amazon S3. Actions in an S3 bucket can trigger a Lambda function, and recursion can occur when that Lambda function writes back to the same bucket.
Consider a use case in which a Lambda function is used to generate a thumbnail of user-submitted images. You configure the bucket to trigger the thumbnail generation function when a new object is put in the bucket. What happens if the Lambda function writes the thumbnail to the same bucket? The process starts anew and the Lambda function then runs on the thumbnail image itself. This is recursion and can lead to an infinite loop condition.
While there are multiple ways to prevent against this condition, it’s best practice to use a second S3 bucket to store thumbnails. This approach minimizes changes to the architecture as you do not need to change the notification settings nor the primary S3 bucket. To learn about other approaches, read Avoiding recursive invocation with Amazon S3 and AWS Lambda.
If you do encounter recursion in your architecture, set Lambda reserved concurrency to zero to stop the function from running. Allow minutes to hours before lifting the reserved concurrency cap. Since S3 events are asynchronous invocations that have automatic retries, you may continue to see recursion until you resolve the issue or all events have expired.
Conclusion
Lambda offers a number of techniques that you can use to minimize infrastructure costs whether you are just getting started with Lambda or have numerous functions already deployed in production. When combined with the lower costs of initial development and ongoing maintenance, serverless can offer a low total cost of ownership. Get hands-on with these techniques and more with the Serverless Optimization Workshop.
To learn more about serverless architectures, find reusable patterns, and keep up-to-date, visit Serverless Land.