AWS Compute Blog
Operating Lambda: Performance optimization – Part 3
In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part series discusses performance optimization for Lambda-based applications.
Part 1 describes the Lambda execution environment lifecycle, and explains how to define, measure, and improve cold starts. Part 2 explains the effect of the memory configuration on Lambda performance, and how to optimize static initialization code. This post explains function architecture and best practices that can help improve the performance of your Lambda functions.
AWS makes regular improvements to the underlying components of the Lambda service, including the networking and compute layers. Lambda developers automatically benefit from these improvements without having to make changes to their functions.
To optimize your functions, it’s best practice to focus on the parts of the Lambda execution lifecycle where developers can have the most impact. The initialization code outside of the handler, and the handler code itself, are both important areas for customer-focused optimization. Combined with the ongoing improvements made by the Lambda service, focusing on these areas is the recommended way to optimize overall performance.
Comparing the performance of interactive and asynchronous workloads
A distributed systems application consists of multiple services communicating using messages over a network. Due to network delays, traffic, message retries, system failovers, and individual performance profiles of the services, the time taken to complete a unit of work can vary.
Instead of measuring the performance against an average number, it can be more helpful to measure the outliers. AWS X-Ray reports show a response distribution histogram that helps to identify the performance of outliers. Using percentile metrics, you can identify the latency experienced at the p95 or p99 range, for example. This shows performance for the slowest 5% or 1% of the requests, respectively.
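As a minimal sketch of why percentiles are more revealing than an average, the following Python snippet summarizes a set of latency samples. The sample data and the `latency_summary` helper are hypothetical and only illustrate the calculation; X-Ray and CloudWatch compute percentile statistics for you.

```python
import statistics

def latency_summary(samples_ms):
    """Return the mean, p95, and p99 of latency samples in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }

# Hypothetical data: 92 fast invocations and 8 slow outliers (for example, cold starts).
samples = [50.0] * 92 + [900.0] * 8
summary = latency_summary(samples)
print(summary)
```

With this data the mean sits close to the fast requests, while p95 and p99 expose the latency that the slowest requests actually experience.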
Performance goals are driven by use-case. In interactive workloads, invocations are triggered directly by an end-user event. For applications such as web apps or mobile apps, the round-trip performance of requests is directly experienced by an end user. If a user makes an API request to the application’s backend, this synchronously calls downstream services before returning a response. For these types of applications, round-trip latency is important to optimize to improve the user experience.
In many interactive, synchronous workloads, you may be able to rearchitect the design to use a reactive asynchronous approach. In this case, the initial API call persists the request in an Amazon SQS queue and immediately responds with an acknowledgement to the caller.
If you are using Amazon API Gateway, this can be completed by using a service integration instead of a Lambda function. The work continues asynchronously and the caller either polls for progress or the application uses a webhook or WebSocket to communicate the status of the request. This approach can improve the end user experience while also helping to provide greater scale and resiliency to the workload.
To learn more, read “Managing backend requests and frontend notifications in serverless web apps”.
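The accept-then-process flow described above can be sketched locally as follows. This is an illustration under stated assumptions, not a deployment recipe: `accept_request`, `process_batch`, `enqueue`, and `status_store` are hypothetical names, a Python list stands in for the SQS queue, and a dictionary stands in for whatever store the caller polls for status.

```python
import json
import uuid

def accept_request(payload, enqueue, status_store):
    """Persist the request and acknowledge immediately, without doing the work."""
    request_id = str(uuid.uuid4())
    status_store[request_id] = "PENDING"
    # In a deployed system this step would write to an SQS queue.
    enqueue(json.dumps({"requestId": request_id, "payload": payload}))
    return {"statusCode": 202, "body": json.dumps({"requestId": request_id})}

def process_batch(event, status_store):
    """Asynchronous worker: handle queued messages and record their status."""
    for record in event["Records"]:
        message = json.loads(record["body"])
        # ... perform the long-running work for message["payload"] here ...
        status_store[message["requestId"]] = "COMPLETE"

# Local stand-ins for the queue and the status store (assumptions for this sketch).
queue, statuses = [], {}
response = accept_request({"sku": "abc-123"}, queue.append, statuses)
process_batch({"Records": [{"body": body} for body in queue]}, statuses)
```

The caller receives the 202 acknowledgement before any work is done, and later learns the outcome by polling the status store or through a pushed notification.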
For many asynchronous workloads, the cold start latency of an individual Lambda function is less significant than the overall performance in aggregate. When working with event sources such as Amazon S3 or SQS, Lambda scales up to process the traffic. Since many tasks can be processed in parallel, if a small percentage of invocations experience cold start latency, it has a negligible impact on the overall time of the processing task.
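A rough back-of-the-envelope model makes this concrete. The sketch below assumes a deliberately simplified picture in which every execution environment pays one cold start and then processes tasks back to back; the function name and all of the numbers are hypothetical.

```python
import math

def batch_wall_clock_seconds(tasks, concurrency, task_seconds, cold_start_seconds):
    """Estimate wall-clock time when each environment pays a single cold start."""
    rounds = math.ceil(tasks / concurrency)  # tasks handled by each environment
    return cold_start_seconds + rounds * task_seconds

# Hypothetical workload: 100,000 tasks, 100 concurrent environments,
# 200 ms per task, and a 1 second cold start.
warm_only = batch_wall_clock_seconds(100_000, 100, 0.2, 0.0)
with_cold = batch_wall_clock_seconds(100_000, 100, 0.2, 1.0)
overhead = (with_cold - warm_only) / warm_only
print(f"cold starts add {overhead:.1%} to the total processing time")
```

Under these assumptions the cold starts add about one second to a job that takes roughly 200 seconds, which is why aggregate throughput is usually the more useful measure for this type of workload.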
When not to use a Lambda function
It’s not always necessary to use a Lambda function. In some situations, you may have other alternatives that can improve performance.
Functions that act as orchestrators, calling other services and functions and coordinating work, can spend much of their time idle. The function typically waits while other tasks are performed, and you are billed for that waiting time. In most cases, moving the orchestration flow to AWS Step Functions creates a more maintainable and resilient state machine, and can help reduce cost.
Lambda functions that transport data from one service to another without performing any business logic on that data can often be replaced with service integrations. For example, it’s usually not necessary to put a Lambda function between API Gateway and Amazon DynamoDB to read an item from a table. This can be achieved by using a Velocity Template Language (VTL) mapping template in an API Gateway service integration directly with DynamoDB. This can help improve scalability and decrease cost.
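As an illustration, an orchestration flow might be expressed in Amazon States Language along the following lines. This is a sketch only: the workflow, state names, account ID, and function ARNs are hypothetical, and the definition is built as a Python dictionary so that it can be checked locally before being serialized to JSON.

```python
import json

def task(function_name, next_state=None):
    """Build a Task state that invokes a (hypothetical) Lambda function."""
    state = {
        "Type": "Task",
        "Resource": f"arn:aws:lambda:us-east-1:123456789012:function:{function_name}",
        # Retries are handled by the state machine rather than by a waiting function.
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state

state_machine = {
    "Comment": "Hypothetical order workflow replacing an orchestrator function",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": task("validate-order", "ChargePayment"),
        "ChargePayment": task("charge-payment", "ShipOrder"),
        "ShipOrder": task("ship-order"),
    },
}

def dangling_transitions(definition):
    """Return any Next targets that do not name a defined state."""
    states = definition["States"]
    return [s["Next"] for s in states.values() if "Next" in s and s["Next"] not in states]

definition_json = json.dumps(state_machine, indent=2)
```

Each function now runs only for the duration of its own task, and sequencing, waiting, and retries move into the state machine.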
If a microservice sends all events to a Lambda function for filtering, and only operates on a small subset of events, you may be able to implement the filtering before invoking the function. For example:
- S3 events can be filtered on prefix and suffix patterns, enabling you to filter on only certain object keys.
- Amazon SNS can filter messages before invoking targets.
- With Amazon EventBridge, you can use powerful content filtering logic within the rules to be highly selective in which events trigger your functions.
Using filters can reduce cost and improve efficiency because you are only invoking Lambda functions for events that your application cares about.
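The following sketch shows what such filters can look like. The bucket prefix, event source, and attribute names are hypothetical. The dictionaries follow the general shape of an S3 notification filter, an SNS subscription filter policy, and an EventBridge event pattern, and the `s3_key_matches` helper only mimics prefix and suffix matching locally for illustration; the real filtering is performed by each service before your function is invoked.

```python
# S3: only notify for JPEG objects under the "uploads/" prefix (hypothetical values).
s3_filter = {
    "Key": {
        "FilterRules": [
            {"Name": "prefix", "Value": "uploads/"},
            {"Name": "suffix", "Value": ".jpg"},
        ]
    }
}

# SNS: a subscription filter policy matching on a message attribute.
sns_filter_policy = {"event_type": ["order_placed"]}

# EventBridge: content filtering on fields inside the event detail.
eventbridge_pattern = {
    "source": ["com.example.orders"],
    "detail-type": ["OrderPlaced"],
    "detail": {
        "total": [{"numeric": [">", 100]}],
        "region": [{"prefix": "eu-"}],
    },
}

def s3_key_matches(key, s3_filter):
    """Local illustration of prefix and suffix matching on an object key."""
    for rule in s3_filter["Key"]["FilterRules"]:
        if rule["Name"] == "prefix" and not key.startswith(rule["Value"]):
            return False
        if rule["Name"] == "suffix" and not key.endswith(rule["Value"]):
            return False
    return True

keys = ["uploads/cat.jpg", "uploads/notes.txt", "archive/cat.jpg"]
invoking_keys = [k for k in keys if s3_key_matches(k, s3_filter)]
```

In this example only one of the three object keys would result in an invocation; the other two events never reach the function.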
Cost optimization
The cost of running Lambda for your workload is determined by three factors: the number of executions, the duration and memory usage (combined as GB-seconds), and data transfer. Beyond the impact of memory allocation discussed in Part 2, there are other design choices that affect these three variables and can therefore reduce cost.
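To see how the first two factors combine, here is a minimal cost estimate in Python. The default prices are illustrative assumptions rather than authoritative figures, so check the Lambda pricing page for your Region and architecture. The sketch ignores the free tier and data transfer, and `estimate_lambda_cost` is a hypothetical helper.

```python
def estimate_lambda_cost(invocations, avg_duration_ms, memory_mb,
                         price_per_million_requests=0.20,   # assumed price, USD
                         price_per_gb_second=0.0000166667): # assumed price, USD
    """Estimate request and compute charges for a number of invocations."""
    # Duration and memory combine into GB-seconds.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = gb_seconds * price_per_gb_second
    return {
        "gb_seconds": gb_seconds,
        "request_cost": request_cost,
        "compute_cost": compute_cost,
        "total": request_cost + compute_cost,
    }

# Hypothetical workload: 10 million invocations, 120 ms average, 512 MB memory.
estimate = estimate_lambda_cost(10_000_000, 120, 512)
print(estimate)
```

Halving either the average duration or the number of invocations has a direct, proportional effect on the corresponding part of the bill.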
Your choice of runtime can impact the cost. Generally, compiled languages run code more quickly than interpreted languages but can take longer to initialize. For small functions with basic functionality, often an interpreted language is better suited for the fastest total execution time, and therefore the lowest cost. Functions using compiled languages are often faster in workloads with heavier computational complexity or where you are using Provisioned Concurrency, so the initialization overhead occurs before the invocation.
Invocation frequency is a major factor in determining cost. Depending upon which events are triggering Lambda functions, there are various controls you can use to lower the total number of invocations. For Lambda functions triggered by:
- API Gateway: use CloudFront caching in front of API Gateway for calls that return data that doesn’t frequently change. This increases the CloudFront cost but reduces the API Gateway and Lambda costs.
- SQS: the BatchSize property determines the number of items in an SQS queue sent to Lambda per invocation. Increasing this number reduces the number of Lambda invocations. Depending upon your use case, it may be possible to aggregate more data per message sent to SQS to process more data in fewer invocations.
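As a sketch of the SQS case, the handler below processes every message delivered in one invocation, and a small helper estimates how the batch size changes the invocation count. The message contents and the `invocations_needed` helper are hypothetical. The estimate is a lower bound, because Lambda can deliver batches smaller than the configured BatchSize.

```python
import json
import math

def handler(event, context):
    """Process all SQS messages delivered in a single invocation."""
    processed = 0
    for record in event["Records"]:
        message = json.loads(record["body"])
        # ... business logic for one message goes here ...
        processed += 1
    return {"processed": processed}

def invocations_needed(message_count, batch_size):
    """Minimum number of invocations needed to drain the queue."""
    return math.ceil(message_count / batch_size)

# Local test event shaped like an SQS batch with three messages.
sample_event = {"Records": [{"body": json.dumps({"orderId": i})} for i in range(3)]}
result = handler(sample_event, None)

# Hypothetical queue of 10,000 messages at two different batch sizes.
unbatched = invocations_needed(10_000, 1)
batched = invocations_needed(10_000, 10)
```

Moving from a batch size of 1 to 10 cuts the minimum number of invocations by a factor of ten, although each invocation runs longer because it handles more messages.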
Also, consider the overall data transfer costs for your workload. Data transfer costs for Lambda are charged at the Amazon EC2 data transfer rates listed at https://aws.amazon.com/ec2/pricing/on-demand/ in the Data transfer section. Data transferred in from the internet is not subject to any transfer costs. Generally, you can minimize these costs by limiting how much data is passed in messages between microservices.
Conclusion
This post is the final part in a three-part series on performance optimization in Lambda. The Lambda service makes frequent performance improvements in the underlying hardware, software, and architecture of the service. This post identifies the parts of the Lambda lifecycle where developers can make the most impact on performance.
I compare interactive and asynchronous workloads, and explain when you can use a direct service integration instead of a Lambda function. I also share cost optimization tips that can help reduce the cost of running your workload.
For more serverless learning resources, visit Serverless Land.