Containers

Multi-tenant design considerations for Amazon EKS clusters

This post was contributed by Roberto Migli, AWS Solutions Architect. 

Amazon Elastic Kubernetes Service (Amazon EKS) is used today by thousands of customers to run container applications at scale. One of the questions we often hear is: how do we provide a multi-tenant Amazon EKS cluster to our teams? Should I run one cluster, or many? Should I use one cluster per team, per environment, per account? There is no right or wrong answer here. In this post, we will go through some aspects to help you make the right decision.

The problem

Multi-tenancy requires that different workloads or teams can share the same cluster with some level of logical or physical isolation between them. This can be required for different reasons. One is security, so that each team can operate only on its intended workloads and not on other teams’ workloads, or so that applications are network-isolated from each other (by default, all pods in all namespaces can communicate with each other). Another reason is to provide fair shares of resources (CPU, memory, network, and so on) across different workloads sharing the same infrastructure. Software-as-a-service companies that deploy a solution per customer can increase infrastructure utilization by running multiple tenants on the same cluster, but will need to provide an even higher degree of isolation between each tenant.

Kubernetes native solutions for multi-tenancy

Kubernetes provides some native APIs and constructs to help design for multi-tenancy within one cluster. Below we list the main constructs for compute, networking, and storage.

Compute isolation

The Kubernetes documentation defines Namespaces as “a way to divide cluster resources between multiple users,” which makes them foundational for multi-tenancy. Most Kubernetes objects belong to a namespace. In the drawing below we have two namespaces, each running a set of objects that are virtually isolated from each other.

[Diagram: two namespaces, each containing its own isolated set of Kubernetes objects]

While a Namespace does not by itself provide workload or user isolation, it is central to understanding the components that follow. The first one is Role-based access control (RBAC): it provides a way to define who can do what on the Kubernetes API. The authorization can be applied cluster-wide via a ClusterRole, or it can be bound to one Namespace via a Role. For example, we can define a role called namespace1-admin that provides administrator access to a namespace called namespace1, and associate it with a group named admins-ns1 with the following code:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: namespace1-admin
  namespace: namespace1
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: ["batch"]
  resources:
  - jobs
  - cronjobs
  verbs: ["*"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: admins-ns1-rb
  namespace: namespace1
subjects:
- kind: Group
  name: admins-ns1
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: namespace1-admin
  apiGroup: rbac.authorization.k8s.io

Amazon EKS integrates RBAC with IAM via the AWS IAM Authenticator for Kubernetes, which allows you to map IAM users and roles to RBAC groups. RBAC is a central component that can also be used to control the other layers of isolation, as we will see below.
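
As a sketch of that mapping (the IAM role ARN and account ID below are hypothetical), the aws-auth ConfigMap in the kube-system namespace can associate an IAM role with the admins-ns1 group defined above; existing entries such as the node instance role must be preserved when editing it:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # keep any existing entries, such as the node instance role, in this list
    - rolearn: arn:aws:iam::111122223333:role/ns1-admins   # hypothetical IAM role
      username: ns1-admin
      groups:
        - admins-ns1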

Kubernetes allows users to define requests and limits for CPU and memory for Pods. Developers should always set them, to prevent resource contention on a node and to help the scheduler allocate resources intelligently. Resource Quotas allow administrators to limit the amount of resources or the number of Kubernetes objects that can be consumed within one namespace. From a compute perspective, Resource Quotas natively support limits on CPU and memory, for example:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
  namespace: namespace1
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi

With Resource Quotas, workloads can be assigned a limited pool of resources to prevent tenants from interfering with each other. It is also possible to use Limit Ranges, which allow users to define a default, minimum, and maximum request and limit per Pod or even per container.
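
As an illustrative sketch (the values are assumptions), a LimitRange in namespace1 could apply default requests and limits to containers that do not declare their own, and cap what a single container may ask for:

apiVersion: v1
kind: LimitRange
metadata:
  name: limits-ns1
  namespace: namespace1
spec:
  limits:
  - type: Container
    defaultRequest:      # applied when a container sets no request
      cpu: 100m
      memory: 128Mi
    default:             # applied when a container sets no limit
      cpu: 250m
      memory: 256Mi
    max:                 # upper bound for any single container
      cpu: "1"
      memory: 1Gi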

Tenants should also be isolated from access to the underlying node instance. Kubernetes Pod Security Policy (PSP) allows users to do so. There are many features that can be controlled with PSPs, but teams building multi-tenant clusters should look at Privileged, which determines whether any container in a pod can enable privileged mode, and HostNetwork, to prevent Pods from accessing the node’s network and potentially snooping on the network activity of neighbors. Amazon EKS has supported PSPs since version 1.13; more information is available on the AWS Open Source Blog.
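
A minimal sketch of a restrictive policy along these lines (the policy name and the allowed volume types are assumptions):

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-tenants
spec:
  privileged: false               # no privileged containers
  hostNetwork: false              # no access to the node's network namespace
  hostPID: false
  hostIPC: false
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                        # hostPath is disallowed by omitting it
  - configMap
  - emptyDir
  - secret
  - persistentVolumeClaim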

Kubernetes also has solutions to specify where Pods can be scheduled relative to other Pods or nodes. Pod anti-affinity, for example, allows users to define that Pods with specific labels cannot be scheduled on the same node as certain other Pods, using a configuration like the following:

apiVersion: v1
kind: Pod
metadata:
  namespace: namespace1
  name: workload-1
  labels:
    team: "team-1"
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: "kubernetes.io/hostname"
        namespaces:
          - shared-services
        labelSelector:
          matchExpressions:
          - key: "type"
            operator: In
            values: ["monitoring"]


Here, the Pods in namespace1 with label team: team-1 will not be scheduled on nodes where Pods with label type: monitoring in the namespace shared-services are already running, to prevent business applications from interfering with monitoring applications. Note a few things here. First, this feature requires that appropriate labels are applied to workloads in order to work. Second, this configuration might be difficult to maintain at scale, because teams might end up with no nodes available that satisfy the hard requirements. Finally, inter-pod affinity and anti-affinity add significant processing load on the scheduler and are not recommended in clusters larger than several hundred nodes. In a similar way, it is possible to assign Pods to specific nodes with nodeSelector and nodeAffinity.
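
As a minimal sketch of the latter (the team node label and the image are assumptions, not part of the example above), a nodeSelector pins a Pod to nodes that carry a given label:

apiVersion: v1
kind: Pod
metadata:
  namespace: namespace1
  name: workload-2
  labels:
    team: "team-1"
spec:
  nodeSelector:
    team: team-1          # only nodes labeled team=team-1 are eligible
  containers:
  - name: app
    image: nginx          # placeholder image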

What if we instead want nodes that are normally excluded from scheduling Pods, unless specified otherwise? That’s where taints and tolerations come into play. Using the example above, the cluster administrator might prefer to create a dedicated node group on EKS to be used for monitoring and administrative workloads, using a node group config file such as the following:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata: 
  name: dev-cluster
  region: eu-west-1
nodeGroups:
  - name: ng-monitoring
    instanceType: c5.8xlarge
    desiredCapacity: 2
    taints:
      monitoring: "true:NoSchedule"

With this taint, Pods that do not have a matching toleration will not be scheduled on this node group.
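
On the workload side, a monitoring Pod would then declare a toleration for that taint. Note that a toleration only allows scheduling on the tainted nodes; to require it, you would typically pair it with a nodeSelector or node affinity on a node group label. A minimal sketch (the Pod name and image are assumptions):

apiVersion: v1
kind: Pod
metadata:
  namespace: shared-services
  name: monitoring-agent
spec:
  tolerations:
  - key: "monitoring"     # matches the taint on the ng-monitoring node group
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: agent
    image: nginx          # placeholder image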

Another possibility is to use AWS Fargate on EKS. Fargate runs each pod in a VM-isolated environment without sharing resources with other pods, and eliminates the need to create or manage EC2 instances as nodes.
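
As a sketch of this option, eksctl lets you declare a Fargate profile in the same cluster config, selecting the namespaces whose Pods should run on Fargate (the profile name is an assumption):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: dev-cluster
  region: eu-west-1
fargateProfiles:
  - name: fp-namespace1        # hypothetical profile name
    selectors:
      - namespace: namespace1  # all Pods in this namespace run on Fargate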

Networking isolation

By default, pods can communicate over the network across different namespaces in the same cluster. Kubernetes provides Network Policies that allow users to define fine-grained control over pod-to-pod communication. The actual enforcement of a network policy is delegated to a network plugin. In a default EKS cluster, pod-to-pod networking is handled by the Amazon VPC CNI plugin, which supports Calico for enforcing Kubernetes Network Policies. Assuming that we want to isolate tenants by namespace, we could use the following Network Policy to allow network communication only within the namespace namespace1 (that is, communication from this namespace to other namespaces on the same cluster, and vice versa, is denied):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-np-ns1
  namespace: namespace1
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          nsname: namespace1
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          nsname: namespace1
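
Note that the namespaceSelector above matches on a namespace label, so namespace1 itself must carry it. A minimal sketch of the corresponding Namespace manifest:

apiVersion: v1
kind: Namespace
metadata:
  name: namespace1
  labels:
    nsname: namespace1    # label matched by the namespaceSelector in the policy above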

There are more advanced configurations of Network Policies that can be used to limit communication on a multi-tenant cluster, and the Amazon EKS documentation provides a quick-start and a primer on how to set up your environment.

Service meshes also allow you to define an additional layer of protection, one that can even span beyond a single EKS cluster. Istio is a very popular open-source service mesh that provides features such as traffic management, security, and observability. The Istio team has discussed in great detail how to deploy Istio in a multi-tenant cluster in a blog post, and the EKS Workshop has a quickstart on how to set up Istio on EKS. Among its different features, Istio allows users to define authentication and authorization policies that work at different network layers. For example, you can gain better control over how services from different tenants can communicate (or not) by combining Network Policies, which control the network flow at L3 and L4, with token-based authentication at L7 using JWT.
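
As a hedged sketch of the L7 part (the issuer and JWKS URI are assumptions), an Istio RequestAuthentication can validate tenant-issued JWTs for workloads in namespace1, and an AuthorizationPolicy can then require a valid request principal:

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-tenant1
  namespace: namespace1
spec:
  jwtRules:
  - issuer: "https://idp.example.com"                          # hypothetical identity provider
    jwksUri: "https://idp.example.com/.well-known/jwks.json"   # hypothetical JWKS endpoint
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: namespace1
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["https://idp.example.com/*"]   # only requests carrying a valid JWT from this issuer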

AWS App Mesh is a managed service mesh that gives you consistent visibility and network traffic controls for every service in an application. AWS App Mesh is based on Envoy and provides a fully managed control plane for the service mesh. To test it out, the documentation provides a quickstart for Amazon EKS.

Storage isolation

Tenants using a shared cluster might need different types of storage. Kubernetes offers a set of tools to manage storage; the main one is the Volume, which provides a way to connect a form of persistent storage to a Pod and manage its lifecycle. We will not discuss volumes mounted directly from the node here, as they are covered in the compute section above (you probably want to disable local volume access altogether in a multi-tenant design with PSPs). Instead, we will cover some key features of the PersistentVolume (PV) subsystem and its relation with the PersistentVolumeClaim (PVC).


A PV is declared at the cluster level, and so is a StorageClass: a cluster administrator centrally defines the available drivers, their configuration, and their operations. With Amazon EKS, different Storage Classes are provided out of the box, including Amazon EBS (both as an in-tree plugin and as a CSI plugin), Amazon EFS, and FSx for Lustre. A PVC allows users to request a volume for a Pod from a given Storage Class. The PVC is a namespaced resource, and thus provides a way to control tenant access to storage. Administrators can use storage Resource Quotas to restrict which storage classes can be used from specific Namespaces. For example, to disable the usage of the storage class storagens2 from the namespace namespace1, you could use a ResourceQuota such as:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-ns1
  namespace: namespace1
spec:
  hard:
    storagens2.storageclass.storage.k8s.io/requests.storage: 0
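
A tenant in namespace1 then requests storage from one of its allowed classes through a PVC. A minimal sketch, assuming the default gp2 EBS storage class and an illustrative claim name and size:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-tenant1          # hypothetical claim name
  namespace: namespace1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: gp2       # gp2 is the default EBS storage class on EKS
  resources:
    requests:
      storage: 10Gi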

Isolation through multiple clusters

A possible alternative is to use multiple single-tenant Amazon EKS clusters. With this strategy, each tenant has its own Kubernetes cluster, either within a shared AWS account or, for large enterprises, in dedicated accounts within an AWS Organization.

Clusters can either be self-provisioned or provided by a central team with standardized configurations already deployed. In the latter case, two phases are important for the central team: provisioning a cluster and monitoring multiple clusters.

While Amazon EKS fully manages the control plane, customers will want to apply a standard set of configurations to a newly created cluster. Terraform can be used in such situations with its Kubernetes Provider, which can deploy PSPs or Network Policies like those shown above in an automated fashion. EKS has a Service Quota of 100 clusters per account per region, which can be increased on request.

Once clusters are deployed, we might want an overview of all of them, to monitor each tenant, make sure we are running the latest version of the EKS control plane, and operate at scale. Rancher is a popular open-source tool for managing multiple Kubernetes clusters; make sure to check out this article on the Open Source blog for details on how to deploy and use it.

Conclusion

In this post, we covered some considerations for achieving multi-tenant designs on Amazon EKS from a compute, networking, and storage perspective. It is important to note that the adoption of the strategies mentioned should be weighed against the cost and complexity of implementation. That is why the approach of using a single Amazon EKS cluster per tenant is compelling, but it requires the capability to deploy and manage multiple clusters. Teams should also consider using Amazon ECS to rapidly provision per-tenant clusters. Finally, AWS helps partners and ISVs building multi-tenant applications through the AWS SaaS Factory.

Further reading