AWS Cloud service considerations when modernizing account-per-tenant solutions
An increasing number of software as a service (SaaS) providers are modernizing their architectures to use resources more efficiently and reduce operational costs. There are multiple strategies for refining a multi-tenant architecture. This post looks at a specific scenario in which SaaS providers move from an account-per-tenant model to an Amazon Elastic Kubernetes Service (Amazon EKS) environment, taking advantage of Amazon EKS constructs to achieve better cost efficiency and scaling strategies that align with multi-tenant workloads.
Siloed accounts vs siloed Kubernetes namespaces
In SaaS environments, there are multiple strategies that can be used to deploy tenants. Some of these environments share infrastructure and some do not. We refer to these models as pooled (shared) and siloed (dedicated). In this post, we examine two variations of the siloed model.
Let’s consider a SaaS product that needs to support many customers, each with their own independent application, such as a web application. Using a siloed account-per-tenant model (Figure 1), a SaaS provider will utilize a dedicated AWS account to host each tenant’s workloads.
The account-per-tenant model makes each account the unit of scale and isolation. Now, let’s consider what would be required to transition this environment to a siloed namespace-per-tenant model, where a SaaS provider uses containerization to package each website and a container orchestrator to deploy the websites across shared compute nodes (Amazon EC2 instances). Kubernetes can be employed as the container orchestrator, with each website represented by a Kubernetes deployment and its associated pods. Each tenant maps to one Kubernetes namespace, which serves as the logical encapsulation of the tenant-specific resources. The Kubernetes HorizontalPodAutoscaler can be used for autoscaling, dynamically adjusting the number of replicas in the deployment in a given namespace based on workload demand.
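To make the namespace-per-tenant mapping concrete, here is a minimal Python sketch that builds the Namespace and HorizontalPodAutoscaler manifests for one tenant as plain dictionaries. The `tenant-` naming scheme, the `website` deployment name, the replica bounds, and the 70% CPU target are illustrative assumptions, not values from this post.

```python
import json
import re


def tenant_namespace(tenant_id: str) -> str:
    """Derive a namespace name from a tenant ID (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", tenant_id.lower()).strip("-")
    name = f"tenant-{slug}"
    # Namespace names must be DNS labels, so they cannot exceed 63 characters.
    if not slug or len(name) > 63:
        raise ValueError(f"cannot derive a namespace from {tenant_id!r}")
    return name


def tenant_manifests(tenant_id, deployment="website",
                     min_replicas=1, max_replicas=5, cpu_target=70):
    """Return [Namespace, HorizontalPodAutoscaler] manifests for one tenant."""
    ns = tenant_namespace(tenant_id)
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns, "labels": {"tenant": ns}},
    }
    hpa = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment, "namespace": ns},
        "spec": {
            # Scale the tenant's own deployment inside its namespace.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                               "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {"name": "cpu",
                             "target": {"type": "Utilization",
                                        "averageUtilization": cpu_target}},
            }],
        },
    }
    return [namespace, hpa]


if __name__ == "__main__":
    # Print the manifests as JSON, which kubectl accepts as well as YAML.
    for manifest in tenant_manifests("Acme Corp"):
        print(json.dumps(manifest, indent=2))
```

A control plane could generate these per tenant during onboarding and apply them through the Kubernetes API. CPU utilization is only one possible scaling signal, so the metric choice should follow from how your tenants actually load the application.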
When additional compute resources are required, tools such as the Cluster Autoscaler or Karpenter can dynamically add more EC2 instances to the shared Kubernetes cluster. A single Application Load Balancer can be shared by multiple tenants to route traffic to the appropriate pods. For Amazon RDS, SaaS providers can use tenant-specific database schemas to separate tenant data. For static data, Amazon Elastic File System (Amazon EFS) and tenant-specific directories can be employed. The SaaS provider would still have a control plane AWS account that interacts with the Kubernetes and AWS APIs to create and update tenant-specific resources.
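One way to keep these shared resources consistent is to derive every tenant-specific name from a single tenant identifier. The Python sketch below illustrates the idea. The naming conventions, the `/mnt/efs/tenants` mount path, and the `example.com` domain are hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TenantResources:
    namespace: str      # Kubernetes namespace holding the tenant's pods
    db_schema: str      # tenant-specific schema in the shared RDS database
    efs_directory: str  # tenant-specific directory on the shared EFS file system
    hostname: str       # host the shared load balancer could route on


def resources_for(tenant_id: str,
                  efs_root: str = "/mnt/efs/tenants",    # assumed mount point
                  base_domain: str = "example.com") -> TenantResources:
    """Derive all tenant-scoped resource names from one tenant ID."""
    slug = re.sub(r"[^a-z0-9]+", "-", tenant_id.lower()).strip("-")
    if not slug:
        raise ValueError(f"tenant ID {tenant_id!r} has no usable characters")
    return TenantResources(
        namespace=f"tenant-{slug}",
        # SQL identifiers cannot contain hyphens unless quoted, so use underscores.
        db_schema="tenant_" + slug.replace("-", "_"),
        efs_directory=str(PurePosixPath(efs_root) / slug),
        hostname=f"{slug}.{base_domain}",
    )


if __name__ == "__main__":
    print(resources_for("Acme Corp"))
```

Note that two different tenant IDs can normalize to the same slug (for example, "Acme Corp" and "acme-corp"), so a real control plane would need to enforce uniqueness when onboarding tenants.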
This transition to Kubernetes using Amazon EKS and other managed services offers numerous advantages. It enables efficient resource utilization by leveraging the Amazon EKS scaling model to reduce costs and better align tenant consumption with tenant activity (Figure 2).
Amazon EKS cluster sizing and customer segmentation considerations in multi-tenancy designs
A high concentration of SaaS tenants hosted within the same system results in a large “blast radius.” This means a failure within the system has the potential to impact all resident tenants, causing downtime for multiple tenants at once. To address this problem, SaaS providers should consider partitioning their customers among multiple AWS accounts or EKS clusters, each with its own deployment of this multi-tenant architecture. The number of tenants that can safely share a single cluster can only be determined by the SaaS provider after profiling the consumption activity of its tenants. Weigh the risk shared by each subset of customers against the efficiency benefits of shared resource consumption.
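As a rough illustration of this trade-off, the Python sketch below assigns tenants to clusters with a simple first-fit-decreasing heuristic, capping both the total profiled load and the number of tenants per cluster (the blast radius). The load figures, capacity, and tenant cap are hypothetical inputs that would come from your own profiling, and this heuristic is only one of many possible placement strategies.

```python
def partition_tenants(usage, cluster_capacity, max_tenants_per_cluster):
    """Greedy first-fit-decreasing placement of tenants onto clusters.

    usage: mapping of tenant -> profiled load (any consistent unit, e.g. vCPUs).
    Returns a list of clusters, each {"tenants": [...], "load": total}.
    """
    clusters = []
    # Place the heaviest tenants first so they are not stranded at the end.
    for tenant, load in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        if load > cluster_capacity:
            raise ValueError(f"{tenant} exceeds the capacity of a single cluster")
        for cluster in clusters:
            fits_load = cluster["load"] + load <= cluster_capacity
            fits_count = len(cluster["tenants"]) < max_tenants_per_cluster
            if fits_load and fits_count:
                cluster["tenants"].append(tenant)
                cluster["load"] += load
                break
        else:
            # No existing cluster can take this tenant, so open a new one.
            clusters.append({"tenants": [tenant], "load": load})
    return clusters


if __name__ == "__main__":
    profiled = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}  # hypothetical loads
    for i, cluster in enumerate(partition_tenants(profiled, 10, 3), start=1):
        print(f"cluster {i}: {cluster['tenants']} (load {cluster['load']})")
```

Lowering `max_tenants_per_cluster` shrinks the blast radius at the cost of more clusters to operate, which is the balance described above.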
Amazon EKS security
SaaS providers should evaluate whether it’s appropriate for them to use containers as a workload isolation boundary. This is of particular importance in multi-tenant Kubernetes architectures, given that containers running on a single EC2 instance share the underlying Linux kernel. Security vulnerabilities in the host Linux instance therefore expose this shared resource (the EC2 instance) to attack. The risk is elevated when any container running in a Kubernetes pod executes untrusted code, and it is heightened further if SaaS providers permit tenants to “bring their own code”.
Kubernetes is a single-tenant orchestrator: in a multi-tenant SaaS architecture, a single instance of the Amazon EKS control plane is shared among all the workloads running within a cluster. Amazon EKS considers the cluster as the hard isolation security boundary. Every Amazon EKS managed Kubernetes cluster is isolated in a dedicated single-tenant Amazon Virtual Private Cloud. At present, hard multi-tenancy can only be implemented by provisioning a unique cluster for each tenant.
Consider how AWS Fargate could be used to address security needs. Also, explore how you can use Amazon EKS constructs to achieve tenant isolation. This includes applying policies to limit cross-namespace access, and associating IAM roles for service accounts with tenant namespaces to scope access to other tenant infrastructure.
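The two isolation constructs mentioned here can be sketched as Kubernetes manifests built in Python. The example below is an illustration under assumptions, not a complete isolation design: the policy name, service account name, and role ARN are hypothetical, and the NetworkPolicy only takes effect if the cluster runs a network policy implementation.

```python
import json


def isolation_manifests(namespace, role_arn, service_account="tenant-app"):
    """Return a same-namespace-only NetworkPolicy and an IRSA-annotated ServiceAccount."""
    network_policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            "podSelector": {},          # applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            # A bare podSelector in "from" matches only pods in this namespace,
            # so ingress from other tenants' namespaces is denied.
            # Traffic from a shared load balancer would need its own rule.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": service_account,
            "namespace": namespace,
            # IAM roles for service accounts: pods using this account assume role_arn.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }
    return [network_policy, account]


def trust_policy_subject(namespace, service_account="tenant-app"):
    """Subject the IAM role's trust policy should require for this tenant."""
    return f"system:serviceaccount:{namespace}:{service_account}"


if __name__ == "__main__":
    arn = "arn:aws:iam::111122223333:role/tenant-acme-corp"  # placeholder ARN
    for manifest in isolation_manifests("tenant-acme-corp", arn):
        print(json.dumps(manifest, indent=2))
    print(trust_policy_subject("tenant-acme-corp"))
```

Restricting each role's trust policy to the subject of a single tenant's service account is what prevents a pod in one namespace from assuming another tenant's role.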
Amazon EFS considerations
A SaaS provider may consider Amazon EFS as the storage solution for the static content of the multiple tenants. This provides them with a straightforward, serverless, and elastic file system. Directories may be used to separate the content for each tenant.
While this approach of creating tenant-specific directories in Amazon EFS provides many benefits, there may be challenges harvesting per-tenant utilization and performance metrics. This can result in operational challenges for providers that need to granularly meter per-tenant usage of resources. Consequently, noisy neighbors will be difficult to identify and remediate. To resolve this, SaaS providers should consider building a custom solution to monitor the individual tenants in the multi-tenant file system by leveraging storage and throughput/IOPS metrics.
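As a starting point for the storage side of such a custom solution, the Python sketch below walks each tenant directory on a mounted file system and sums file sizes. It assumes one top-level directory per tenant under a hypothetical mount path. The result approximates, but will not exactly match, what Amazon EFS meters, and walking a large file system can be slow. Throughput and IOPS attribution would need a separate mechanism.

```python
import os
from pathlib import Path


def usage_by_tenant(efs_root):
    """Return {tenant_directory_name: total_bytes} for each directory under efs_root."""
    usage = {}
    for tenant_dir in sorted(Path(efs_root).iterdir()):
        if not tenant_dir.is_dir():
            continue  # ignore stray files at the top level
        total = 0
        for dirpath, _dirnames, filenames in os.walk(tenant_dir):
            for name in filenames:
                try:
                    # lstat avoids following symlinks out of the tenant directory.
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except FileNotFoundError:
                    pass  # file was removed while we were scanning
        usage[tenant_dir.name] = total
    return usage


if __name__ == "__main__":
    # "/mnt/efs/tenants" is an assumed mount point, not one named in this post.
    root = "/mnt/efs/tenants"
    if os.path.isdir(root):
        for tenant, size in usage_by_tenant(root).items():
            print(f"{tenant}: {size} bytes")
```

Running a scan like this on a schedule and publishing the per-tenant totals as custom metrics is one way to spot tenants whose storage is growing disproportionately.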
Amazon RDS considerations
Multi-tenant workloads, where data for multiple customers or end users is consolidated in the same Amazon RDS database cluster, can present operational challenges regarding per-tenant observability. Both MySQL Community Edition and open-source PostgreSQL have limited ability to provide per-tenant observability and resource governance. AWS customers operating multi-tenant workloads often use a combination of ‘database’ or ‘schema’ and ‘database user’ accounts as substitutes for a tenant identifier. They should use alternate mechanisms to establish a mapping between each tenant and these substitutes. This makes it possible to process raw observability data from the database engine externally, map the substitutes back to tenants, and distinguish tenants in the observability data.
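The mapping step can be sketched in a few lines of Python. The example below assumes observability samples have already been exported from the database engine as records carrying a database user and an execution time. The field names, user names, and the mapping table are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping maintained by the SaaS control plane:
# several database users may belong to the same tenant.
TENANT_BY_DB_USER = {
    "app_acme": "acme",
    "report_acme": "acme",
    "app_globex": "globex",
}


def exec_time_by_tenant(samples, tenant_by_db_user):
    """Aggregate execution time per tenant from per-database-user samples."""
    totals = defaultdict(float)
    for sample in samples:
        # Users with no known tenant are grouped so they stay visible.
        tenant = tenant_by_db_user.get(sample["db_user"], "unmapped")
        totals[tenant] += sample["total_exec_ms"]
    return dict(totals)


if __name__ == "__main__":
    samples = [  # illustrative rows, not real measurements
        {"db_user": "app_acme", "total_exec_ms": 120.0},
        {"db_user": "report_acme", "total_exec_ms": 30.0},
        {"db_user": "app_globex", "total_exec_ms": 45.5},
        {"db_user": "rdsadmin", "total_exec_ms": 10.0},
    ]
    print(exec_time_by_tenant(samples, TENANT_BY_DB_USER))
```

The same pattern applies when the substitute is a schema or database name rather than a database user: only the lookup key changes.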
In this post, we’ve shown what to consider when moving to a multi-tenant SaaS solution in the AWS Cloud, how to optimize your cloud-based SaaS design, and some challenges and remediations. Invest effort early in your SaaS design strategy to explore your customer requirements for tenancy. Work backwards from your SaaS tenants’ end goals to determine the level of computing performance and cybersecurity features required, and how the SaaS provider will monitor and operate the platform with the target tenancy configuration. Your AWS account team is highly qualified to advise on these design decisions. Review and improve your design using the AWS Well-Architected Framework, specifically the SaaS Lens. The tenancy design process should be followed by extensive prototyping to validate functionality before production rollout.