Implementing Serverless Tiering Strategies with AWS Lambda Reserved Concurrency

By Lal Verma, Sr. Partner Solutions Architect – AWS

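Reserved concurrency is an important AWS Lambda feature that sets aside part of your account's concurrency for a specific Lambda function. It guarantees that the function can always scale up to that number of concurrent instances, and it also caps the function at that number.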

In this post, I will explore how you can leverage this feature to define a tiering strategy for multi-tenant software-as-a-service (SaaS) applications, and walk through an example implementation.

There are many benefits to using AWS Lambda functions. Starting with the serverless model in SaaS (and in general) removes undifferentiated heavy lifting by simplifying the architecture and reducing the operational footprint.

Note that if this is the first time you are encountering serverless SaaS on Amazon Web Services (AWS), please check out this blog post to help provide additional context: Building a multi-tenant SaaS solution using AWS serverless services.

With this, let’s focus on the problem area where reserved concurrency can help your SaaS applications.

Tiering Strategy in SaaS

Let’s start with an example: SaaS provider A is hosting an e-commerce platform. One of the tenants, Tenant-M, sells mobile phones through this platform. Some of the models have become instantly popular and are receiving high traffic, but Tenant-M is experiencing failed orders.

The tenant has escalated the problem to the SaaS provider. On investigation, the provider finds that the compute capacity, which is shared across multiple tenants, is not sufficient to serve all the requests.

The SaaS provider also observes that Tenant-M needs significantly more capacity than the other tenants and is degrading their performance as well. This is a common problem in SaaS applications, often called a “noisy neighbor” situation.

Defining a tiering strategy helps resolve this. For instance, the SaaS provider can define a platinum tier for more demanding tenants like Tenant-M. It can charge such tenants a higher price in exchange for committing more compute capacity, storage capacity, or other resources (Figure 1 illustrates this pattern).

Figure 1 – Tiering strategy in SaaS.

So how does it work? If we limit the discussion to compute and the SaaS application is based on Amazon Elastic Compute Cloud (Amazon EC2), you can achieve this by creating dedicated EC2 instances per tenant (for the platinum tier) or by creating a dedicated Auto Scaling group for the tenant.

If you are using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), you can create a dedicated cluster per tenant. With Lambda, the solution is different: Lambda is a serverless technology and does not give you any control over the underlying infrastructure.

Reserved concurrency provides the means to implement the tiering strategy in this case. With this, you can commit a dedicated compute capacity for your platinum tier tenants without having to manage the underlying resources.

Tiering Strategy with Reserved Concurrency

When a Lambda function is invoked, the Lambda service allocates a compute instance in which the function runs and processes the event. If the function is invoked again while the first invocation is still running, another instance is allocated, which increases the function’s concurrency.

You can find more details on Lambda function scaling in the documentation. Continuing with the e-commerce example, if you are expecting 1,000 orders to be processed at the same time, 1,000 instances of your order processing function will be instantiated.
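To size a reservation, it helps to estimate concurrency from traffic. The sketch below applies the usual rule of thumb that steady-state concurrency is roughly the request rate multiplied by the average request duration. The function name and the workload numbers are illustrative assumptions, not part of the reference solution.

```python
import math


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate steady-state concurrency as request rate times average duration."""
    if requests_per_second < 0 or avg_duration_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    # Round up: a partially used instance still occupies a concurrency slot.
    return math.ceil(requests_per_second * avg_duration_seconds)


# Hypothetical workload: 400 orders per second, each taking about 250 ms.
print(estimate_concurrency(400, 0.25))  # 100
```

Treat the result as a starting point only; bursts and slow downstream calls can push real concurrency well above the average.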

All of the functions in your account that run in the same AWS Region and have no reserved concurrency share a pool of unreserved concurrency. Without reserved concurrency, other functions can use up all of the available concurrency. Applying reserved concurrency to a function sets aside capacity for it: the function can always scale up to the reserved number of concurrent instances, and it is never allowed to exceed that number.

Therefore, if you have an “Order Processing” Lambda function and you set its reserved concurrency to 100, Lambda ensures that up to 100 concurrent requests can always be served, irrespective of the activity of other functions. Defining a concurrency limit does not have a cost. You only pay for the actual requests that get processed.

You can extend this feature by creating a dedicated function definition for the platinum tenant and associating reserved concurrency with it. For example, if the function name is “order-processing-function”, you can have a separate definition for Tenant-M named “Tenant-M-order-processing-function” with reserved concurrency.

You can set the reserved concurrency value based on the tenant's needs and pricing. You can implement this for all the other functions as well. With this approach, you can create dedicated compute capacity for every platinum tenant.
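As a rough illustration of this naming and sizing decision, the sketch below maps a tenant's tier to either the shared function or a dedicated, tenant-prefixed function with a reservation. The tier names, the value of 100, and the helper itself are assumptions made for this example; the reference solution drives the same decision through its CloudFormation template instead.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier catalogue: reserved concurrency per dedicated function.
# None means the tenant uses the shared (pooled) function with no reservation.
TIER_RESERVED_CONCURRENCY = {"basic": None, "platinum": 100}


@dataclass(frozen=True)
class FunctionPlan:
    function_name: str
    reserved_concurrency: Optional[int]


def plan_function(base_name: str, tenant_id: str, tier: str) -> FunctionPlan:
    """Decide which function definition serves a tenant and what it reserves."""
    reserved = TIER_RESERVED_CONCURRENCY[tier]
    if reserved is None:
        # Pooled tenants share the base function and the unreserved pool.
        return FunctionPlan(base_name, None)
    # Platinum tenants get their own definition, prefixed with the tenant ID.
    return FunctionPlan(f"{tenant_id}-{base_name}", reserved)


plan = plan_function("order-processing-function", "Tenant-M", "platinum")
print(plan.function_name, plan.reserved_concurrency)
```

Keeping the tier-to-concurrency mapping in one place makes it easier to keep reservations in line with pricing.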

Figure 2 illustrates the implementation where reserved concurrency is applied to all the functions for Tenant PT1 and Tenant PT2.

Figure 2 – Tiering strategy with reserved concurrency.

While defining the tiering strategy, you should be aware of a few facts that will help you plan better. The first is the account concurrency limit (shown as “unreserved account concurrency” before any reservations are made), which defaults to 1,000. This limit applies per account and per Region and is shared by all functions. It’s a soft limit that can be increased to tens of thousands. Request a quota increase if you need one, so that you have sufficient concurrency available to run all your functions, for all your tenants, at any point in time.

Another thing to consider is the upper concurrency limit. Reserved concurrency is also a cap: if the request load for a tenant grows beyond the reserved value, the excess requests are automatically throttled. This can restrict executions when a tenant’s demand is higher than planned. If this is the case, you can either move the tenant to a higher tier or increase the limit for the tenant’s current tier.

On the other hand, this throttling can be an advantage in multi-tenant systems. Because requests beyond a tenant’s limit are throttled automatically, the noisy neighbor situation stays under your control.

For more details on the reserved concurrency feature, see “Managing Lambda reserved concurrency” in the AWS Lambda developer guide. Now that you have reviewed the basic approach with reserved concurrency, let’s see how the overall solution works.
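A simple budget check can catch overcommitment before you deploy. The sketch below assumes you track planned reservations in a dictionary (the keys and numbers are made up) and relies on the fact that Lambda keeps at least 100 units of concurrency unreserved in each account and Region.

```python
from typing import Mapping

# Lambda keeps at least this much concurrency unreserved per account and Region,
# so reservations can use at most (account limit - 100).
MIN_UNRESERVED = 100


def remaining_reservable(account_limit: int, reservations: Mapping[str, int]) -> int:
    """Return how much concurrency can still be reserved; raise if overcommitted."""
    reserved_total = sum(reservations.values())
    remaining = account_limit - MIN_UNRESERVED - reserved_total
    if remaining < 0:
        raise ValueError(
            f"reservations exceed the reservable budget by {-remaining}; "
            "request a quota increase or lower the reservations"
        )
    return remaining


# Hypothetical plan against the default limit of 1,000.
planned = {"Tenant-PT1-order-functions": 300, "Tenant-PT2-order-functions": 200}
print(remaining_reservable(1000, planned))  # 400
```

In a real account, the current limit and unreserved value can be read with the Lambda `GetAccountSettings` API rather than hard-coded.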
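The isolation effect can be pictured with a toy model. The class below is not the Lambda service; it is a small simulation, with made-up limits, of how a per-function cap admits requests up to the reserved value and throttles the rest without touching another tenant's capacity.

```python
from typing import Tuple


class ReservedPool:
    """Toy model of one function's reserved concurrency (not the Lambda service)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0

    def try_start(self) -> bool:
        """Admit a request if a slot is free; otherwise report a throttle."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def finish(self) -> None:
        """Release one slot when a request completes."""
        self.in_flight = max(0, self.in_flight - 1)


def burst(pool: ReservedPool, requests: int) -> Tuple[int, int]:
    """Send simultaneous requests; return (admitted, throttled)."""
    admitted = sum(1 for _ in range(requests) if pool.try_start())
    return admitted, requests - admitted


tenant_m = ReservedPool(limit=100)      # hypothetical platinum reservation
other_tenant = ReservedPool(limit=20)   # hypothetical second reservation

print(burst(tenant_m, 150))     # (100, 50)
print(burst(other_tenant, 10))  # (10, 0)
```

Tenant-M's burst is clipped at its own limit, while the other tenant's requests are all admitted.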

Implementing Tiering with AWS Lambda Reserved Concurrency

In the example implementation, a dedicated compute space is created for the platinum tenants. The solution is implemented as part of the Serverless SaaS Reference Solution built by the AWS SaaS Factory team.

In order to set the right context, let’s start with an overview of the reference solution. It was built to provide SaaS developers and architects with working code, and it illustrates how to design and deliver a multi-tenant SaaS solution on AWS using serverless technologies such as AWS Lambda, Amazon API Gateway, Amazon Cognito, Amazon DynamoDB, and AWS CodePipeline.

The solution covers a broad range of multi-tenant considerations, including onboarding and identity, tenant and user management, authentication and authorization, data partitioning, tenant isolation, automated deployment, and multi-tenant observability.

Figure 3 – Serverless SaaS reference architecture

An end-to-end implementation of a tiering strategy, with the help of reserved concurrency, can be formulated in three steps:

1. Creating functions and APIs: This ensures that Lambda function and API definitions (API Gateway) are created for the platinum tenants.
2. Implementing tenant-specific routing: This ensures that platinum tenant requests are routed to the tenant-specific resources.
3. Implementing tenant-specific policies: This ensures that the tenant-specific functions are protected from unauthorized access.

Creating Functions and APIs

The reference solution implements a sample order service and product service as part of an e-commerce SaaS solution. These services are based on Lambda functions. For instance, the order service is composed of Lambda functions such as CreateOrderFunction, GetOrderFunction, and DeleteOrderFunction.

AWS CodePipeline enables the provisioning of these Lambda functions with the help of AWS CloudFormation templates. When a new platinum tenant is registered, the same pipeline is used to create the dedicated compute resources for the tenant.

In the reference solution, tenant provisioning is based on the CloudFormation template (more specifically, an AWS Serverless Application Model template) “tenant-template.yaml.” The template creates or updates multiple AWS resources, including Lambda functions, DynamoDB tables, and AWS Identity and Access Management (IAM) roles.

Here is an excerpt from this template, representing the function definition for GetOrdersFunction:

GetOrdersFunction:
  Type: AWS::Serverless::Function
  DependsOn: OrderFunctionExecutionRole
  Properties:
    CodeUri: s3://serverless-saas-pipeline-artifactsbucket2aac5544-149q22ii0usir/07de032a239beb005fad025861857b0b
    Handler: order_service.get_orders
    Runtime: python3.8
    Tracing: Active
    Role:
      Fn::GetAtt:
        - OrderFunctionExecutionRole
        - Arn
    ReservedConcurrentExecutions:
      Fn::If:
        - IsPooledDeploy
        - Ref: AWS::NoValue
        - Ref: LambdaReserveConcurrency
    Layers:
      - Ref: ServerlessSaaSLayers
    Environment:
      Variables:
        POWERTOOLS_SERVICE_NAME: OrderService
        IS_POOLED_DEPLOY:
          Fn::If:
            - IsPooledDeploy
            - true
            - false
        ORDER_TABLE_NAME:
          Ref: OrderTable
    AutoPublishAlias: live
    DeploymentPreference:
      Enabled:
        Ref: LambdaCanaryDeploymentPreference
      Type: Canary10Percent5Minutes
      Alarms:
        - Ref: GetOrdersFunctionCanaryErrorsAlarm
    Tags:
      TenantId:
        Ref: TenantIdParameter
  Metadata:
    SamResourceId: GetOrdersFunction

In the code above, the ReservedConcurrentExecutions property of the function provides the reserved concurrency. The value comes from the configurable LambdaReserveConcurrency parameter. When the IsPooledDeploy condition is true, the property resolves to AWS::NoValue, so functions in the pooled deployment get no reservation and only dedicated tenant deployments do.

A similar approach can be taken to define the reserved concurrency value for all the other Lambda functions, including GetOrderFunction, CreateOrderFunction, and UpdateOrderFunction. For the definitions of all the sample functions, you can check the complete template code, available on GitHub.

Please note that in the above solution the reserved concurrency is set through the CloudFormation template, but you can also set it by other means, such as the AWS Management Console, the AWS Command Line Interface (CLI), or the AWS Software Development Kits (SDKs).
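For the SDK route, a minimal sketch with the AWS SDK for Python (boto3) might look like the following. It wraps the Lambda `PutFunctionConcurrency` operation; the wrapper name is an assumption, and the client is passed in so the function can be exercised without AWS access.

```python
def set_reserved_concurrency(lambda_client, function_name: str, reserved: int) -> int:
    """Reserve concurrency for one function and return the value Lambda reports."""
    if reserved < 0:
        raise ValueError("reserved concurrency must be zero or greater")
    # put_function_concurrency is the boto3 call for PutFunctionConcurrency.
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved,
    )
    return response["ReservedConcurrentExecutions"]


# Usage against a real account (needs boto3, AWS credentials, and an existing function):
#   import boto3
#   client = boto3.client("lambda")
#   set_reserved_concurrency(client, "Tenant-M-order-processing-function", 100)
```

Setting the value outside CloudFormation causes the stack to drift from its template, so this is best kept for experiments or for functions that are not managed by the pipeline.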

Implementing Tenant-Specific Routing

Now that the functions are implemented, you need to ensure you are routing the platinum tenant users to their respective functions correctly.

The first thing you need to do is create the API definitions. We are going to use Amazon API Gateway, which is a fully managed service that makes it easy to publish, maintain, monitor, and secure APIs at any scale.

In the reference solution, the CloudFormation template “tenant-template.yaml” is again used for the API definitions. Once the platinum tenant API definitions are created, CodePipeline updates tenant-details, a DynamoDB table that holds tenant metadata, with the respective API Gateway URL.

The client can use this URL to perform tenant-specific routing. In the reference solution, the sample user interface (UI) is a web interface based on Angular. A component called AuthConfigurationService (auth-configuration-service.ts) is responsible for setting the tenant-specific configuration.

When the user logs in, this component fetches the tenant metadata from the DynamoDB table and stores the API Gateway URL as one of the localStorage attributes. This ensures that all platinum user requests are routed to the right API and Lambda functions. You can see the implementation details of this component on GitHub.
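The lookup at the heart of this routing step can be sketched in a few lines. The reference solution performs it from the Angular client; the Python version below only illustrates the idea. The attribute names `tenantId` and `apiGatewayUrl`, the example URL, and the in-memory stand-in for the table are assumptions for this example.

```python
class InMemoryTable:
    """Local stand-in exposing the same get_item shape as a boto3 DynamoDB Table."""

    def __init__(self, items):
        self._items = {item["tenantId"]: item for item in items}

    def get_item(self, Key):
        item = self._items.get(Key["tenantId"])
        # DynamoDB omits the "Item" key when nothing matches.
        return {"Item": item} if item is not None else {}


def resolve_api_url(tenant_table, tenant_id: str) -> str:
    """Return the API Gateway URL recorded for a tenant in the metadata table."""
    response = tenant_table.get_item(Key={"tenantId": tenant_id})
    item = response.get("Item")
    if item is None:
        raise LookupError(f"no metadata found for tenant {tenant_id!r}")
    return item["apiGatewayUrl"]


table = InMemoryTable(
    [{"tenantId": "Tenant-M", "apiGatewayUrl": "https://example.invalid/prod/"}]
)
print(resolve_api_url(table, "Tenant-M"))
```

With boto3, the same function should accept `boto3.resource("dynamodb").Table(...)` in place of the stand-in, provided the table's key schema matches the assumed attribute name.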

Implementing Tenant-Specific Policies

Tenant-specific routing is not enough to ensure complete tenant isolation. You also need to ensure that a security policy is in place, so that a user from Tenant A cannot access the Lambda functions belonging to Tenant B.

Figure 4 – User authorization based on IAM policy

In the reference solution, this check is implemented with the help of Amazon Cognito, IAM, and API Gateway, which together validate every API request for authorization. This is done by a custom authorizer, which returns a policy that references only the API(s) belonging to the caller’s tenant. Please refer to the custom authorizer code for the implementation details.
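To make the shape of such a policy concrete, here is a minimal sketch of the response an API Gateway Lambda authorizer could return for one tenant. It is not the reference solution's authorizer: the helper name, the wildcard method and path, and all identifiers in the usage line are placeholders.

```python
def build_tenant_policy(
    principal_id: str,
    tenant_id: str,
    region: str,
    account_id: str,
    api_id: str,
    stage: str,
) -> dict:
    """Build an authorizer response that allows invoking one tenant's API only."""
    # execute-api ARN layout: api-id/stage/HTTP-method/resource-path
    resource = f"arn:aws:execute-api:{region}:{account_id}:{api_id}/{stage}/*/*"
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow",
                    "Resource": [resource],
                }
            ],
        },
        # Passed through to the backend so functions know which tenant is calling.
        "context": {"tenantId": tenant_id},
    }


policy = build_tenant_policy(
    "user-123", "Tenant-M", "us-east-1", "111122223333", "abc123", "prod"
)
print(policy["policyDocument"]["Statement"][0]["Resource"][0])
```

Because the policy names only the tenant's own API ID, a token issued for one tenant cannot be used to invoke another tenant's dedicated API.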

Conclusion

In serverless SaaS applications, AWS Lambda provides elastic compute resources without you having to worry about the operational and scaling aspects of the underlying infrastructure, and without preventing you from defining dedicated compute resources for your tenants.

Reserved concurrency provides a robust and clean approach to segregate compute resources and define the tiering strategy.

When enabling reserved concurrency, you must ensure you have sufficient concurrency quota allocated to serve your application needs. Also, because reserved concurrency acts as an upper limit, you should decide its value based on the tenant’s needs and your pricing strategy.

In this post, I covered the sample implementation only briefly. There are multiple steps involved, and each step is difficult to cover in its entirety. To dive deeper, you can refer to the complete solution, available on GitHub. This solution was built by the AWS SaaS Factory team.

About AWS SaaS Factory

AWS SaaS Factory helps organizations at any stage of the SaaS journey. Whether looking to build new products, migrate existing applications, or optimize SaaS solutions on AWS, we can help. Visit the AWS SaaS Factory Insights Hub to discover more technical and business content and best practices.

SaaS builders are encouraged to reach out to their account representative to inquire about engagement models and to work with the AWS SaaS Factory team.