How to leverage Application Load Balancer’s advanced request routing to route application traffic across multiple Amazon EKS clusters

Introduction

The AWS Load Balancer Controller is a Kubernetes Special Interest Group (SIG) project that helps organizations reduce their Kubernetes compute costs and the complexity of their application routing configuration. As you deploy workloads on Amazon Elastic Kubernetes Service (Amazon EKS), the controller simplifies exposing those applications by automating the provisioning, management, and configuration of Amazon Elastic Load Balancers (ELBs). It does this by deploying a Network Load Balancer (NLB) when you create a Kubernetes Service resource, or an Application Load Balancer (ALB) when you create a Kubernetes Ingress resource.

Our customers often run multiple Amazon EKS clusters to meet resiliency and security requirements. In this post, we discuss a use case in which an application owned by multiple teams consists of tens of microservices behind the same domain name and spans multiple Amazon EKS clusters. The goal is to forward each request to the respective cluster when the request carries a specific user ID in a cookie or query string, or comes from a specific source IP address.

Challenges

When you deploy the AWS Load Balancer Controller in each Amazon EKS cluster, there is a one-to-one relationship between a cluster and the load balancers provisioned for it.
- When you provision a Service or Ingress resource in each cluster, that means there will be an individual NLB or ALB deployed per cluster. This introduces complexity in management and maintenance of those ELBs and also incurs additional cost.
- An ALB has at least one IP address configured per Availability Zone (AZ). To ensure that your ALB can scale properly, you must have at least eight available IP addresses per subnet. If your VPC subnet CIDR blocks are not sized large enough, running many ALBs can contribute to IP address exhaustion in your VPC.
- When you use a single domain name for your application (as in our use-case) and need to forward requests to various microservices in different clusters, you will need an additional load balancing layer to define and implement your HTTP routing rules to route the requests to the respective Amazon EKS cluster’s NLB or ALB.
Some of our customers prefer to leverage their existing infrastructure ELBs and keep the Kubernetes platform workflows and infrastructure operations independent from each other.
- AWS Load Balancer Controller by default owns the management and maintenance of ELB(s). This means that ELB(s) and the Kubernetes Service or Ingress resources share the same lifecycle.

Solution overview

The solution uses an existing infrastructure ALB to route traffic to distinct microservices of a single application running across multiple Amazon EKS clusters. We use target groups, each of which routes requests to its registered targets, such as EC2 instances or Kubernetes Pods, using the protocol and port number that you specify. An ELB supports multiple target groups. We also use a Kubernetes custom resource (CR) introduced by the AWS Load Balancer Controller, called TargetGroupBinding. It enables you to expose microservices through an existing ALB or NLB target group.


Figure 1: Solution overview and traffic flow

As shown in Figure 1, we use the advanced request routing capability of the ALB. It allows the inspection of the header of an incoming client request and, based on the unique cookie, routes traffic to a particular microservice on a particular EKS cluster. It also supports request routing based on HTTP headers, methods, query parameters, and source IP. Refer to this post for more details.

AWS Load Balancer Controller is deployed on each Amazon EKS cluster and it watches the Kubernetes API server to keep track of network endpoints (Pods) for the Kubernetes Service(s).

The TargetGroupBinding CR binds a Kubernetes Service to a load balancer target group. Whenever the list of endpoints changes, the AWS Load Balancer Controller updates the targets in that target group. This approach provides an abstraction that decouples the infrastructure load balancers from native Kubernetes resources, and it separates the lifecycle and operations of the load balancer from those of the Kubernetes Service or Ingress. TargetGroupBinding supports target groups of either the instance or ip target type. If the target type is not explicitly specified, a mutating webhook automatically calls the AWS API to look up the target type of your target group and sets it to the correct value.
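As a minimal sketch of what such a binding might look like, the following Python snippet builds a TargetGroupBinding manifest as a dictionary and prints it as JSON, a format that kubectl apply -f also accepts. The resource name, namespace, Service name, port, and target group ARN are hypothetical placeholders, not values taken from the sample repository.

```python
import json


def target_group_binding(name, namespace, service_name, service_port,
                         target_group_arn, target_type="ip"):
    """Build a TargetGroupBinding manifest as a plain dict."""
    if target_type not in ("ip", "instance"):
        raise ValueError("target_type must be 'ip' or 'instance'")
    return {
        "apiVersion": "elbv2.k8s.aws/v1beta1",
        "kind": "TargetGroupBinding",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Kubernetes Service whose endpoints are registered as targets.
            "serviceRef": {"name": service_name, "port": service_port},
            # The existing target group attached to the infrastructure ALB.
            "targetGroupARN": target_group_arn,
            "targetType": target_type,
        },
    }


# Hypothetical values for illustration only; replace with your own.
manifest = target_group_binding(
    name="service1-tgb",
    namespace="service1-ns",
    service_name="service1",
    service_port=80,
    target_group_arn=(
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:"
        "targetgroup/target-group-1/0123456789abcdef"
    ),
)
print(json.dumps(manifest, indent=2))
```

Setting targetType explicitly is optional here, since the webhook described above can fill it in, but stating it keeps the manifest self-documenting.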

Let’s walk through each step of the traffic flow in the diagram above:

Step 1: Client 1 initiates an HTTP request to an existing ALB’s Fully Qualified Domain Name (FQDN). In this request, Client 1 includes a cookie with a key named user and the corresponding value set to user1.
Step 2: The ALB is set up with advanced request routing, specifically using header-based routing. This configuration involves inspecting a specific cookie in incoming requests. When the ALB identifies that the cookie user is set to user1 in a request, it follows a routing rule that directs the incoming traffic to Target Group 1.
Step 3: Target Group 1 consists of the Pods running in Amazon EKS Cluster 1. It serves as the destination for routing traffic directed at Service 1 running in the same Amazon EKS Cluster 1.
Step 4: Client 2 initiates an HTTP request to the same existing ALB’s FQDN. In this request, Client 2 includes a cookie with a key named user and the corresponding value set to user2.
Step 5: The ALB is set up with advanced request routing, specifically using header-based routing. This configuration involves inspecting a specific cookie in incoming requests. When the ALB identifies that the cookie user is set to user2 in a request, it follows a routing rule that directs the incoming traffic to Target Group 2.
Step 6: Target Group 2 consists of the Pods running in Amazon EKS Cluster 2. It serves as the destination for routing traffic directed at Service 2 running in the same Amazon EKS Cluster 2.
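The routing rules in Steps 2 and 5 could be expressed roughly as follows. This is a sketch only: it builds the keyword arguments that one might pass to the boto3 elbv2 client's create_rule call, using an http-header condition on the Cookie header. The ARNs, priorities, and the wildcard pattern are assumptions made for illustration. Note that a loose pattern such as *user=user1* would also match a cookie value like user=user10, so a real rule may need a stricter pattern.

```python
def cookie_rule(listener_arn, target_group_arn, cookie_name, cookie_value,
                priority):
    """Return keyword arguments for an ALB listener rule that forwards
    requests whose Cookie header contains cookie_name=cookie_value."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [
            {
                "Field": "http-header",
                "HttpHeaderConfig": {
                    "HttpHeaderName": "Cookie",
                    # Wildcards allow other cookies before or after this one.
                    "Values": [f"*{cookie_name}={cookie_value}*"],
                },
            }
        ],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


# Hypothetical ARNs for illustration only.
LISTENER_ARN = "arn:aws:elasticloadbalancing:region:account:listener/app/example"
TG1_ARN = "arn:aws:elasticloadbalancing:region:account:targetgroup/tg-1/example"
TG2_ARN = "arn:aws:elasticloadbalancing:region:account:targetgroup/tg-2/example"

rules = [
    cookie_rule(LISTENER_ARN, TG1_ARN, "user", "user1", priority=10),
    cookie_rule(LISTENER_ARN, TG2_ARN, "user", "user2", priority=20),
]

# With boto3 installed and credentials configured, each rule could be created
# with: boto3.client("elbv2").create_rule(**rule)
for rule in rules:
    print(rule["Priority"], rule["Conditions"][0]["HttpHeaderConfig"]["Values"])
```

Building the arguments in a plain function keeps the rule definitions easy to review and test before anything is sent to AWS.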

Code sample

We have created a GitHub repository that you can use to deploy the solution architecture described in this post. The code sample is for demonstration purposes only and should not be used in production environments. Please refer to the Amazon EKS Best Practices Guides, especially the Security Best Practices section, to learn how to run production workloads on Amazon EKS.

Considerations

Although we showed this architecture using Amazon EKS clusters, it is worth mentioning that the solution applies to self-managed Kubernetes clusters on AWS as well.
This solution consolidates the load balancing function and uses the same load balancer across your Amazon EKS clusters. Because a single ALB serves all of these clusters at scale, the ALB service quotas may directly affect your design decisions.
- When you use the ip target type, the total number of unique Kubernetes Service endpoints (Pods) counts against the Targets per Application Load Balancer quota, which is 1,000 by default and can be raised by requesting a quota increase.
- When you use the instance target type, the total number of your worker nodes counts against the Targets per Application Load Balancer quota.
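To make the two counting rules above concrete, here is a rough back-of-the-envelope estimate in Python. It simply follows the simplification stated in this post (Pods count in ip mode, worker nodes count in instance mode). The cluster sizes are made-up numbers, and the actual quota accounting in your account may differ, so treat the output as an approximation.

```python
# Default "Targets per Application Load Balancer" quota, as stated above.
ALB_TARGETS_DEFAULT_QUOTA = 1000


def estimated_targets(clusters, target_type):
    """Estimate how many ALB targets the clusters consume in total."""
    if target_type == "ip":
        # Every Pod behind every bound Service is registered as a target.
        return sum(sum(c["pods_per_service"].values()) for c in clusters)
    if target_type == "instance":
        # Worker nodes are registered instead of individual Pods.
        return sum(c["worker_nodes"] for c in clusters)
    raise ValueError("target_type must be 'ip' or 'instance'")


# Hypothetical cluster sizes for illustration only.
clusters = [
    {"worker_nodes": 12, "pods_per_service": {"service1": 40, "service2": 25}},
    {"worker_nodes": 8, "pods_per_service": {"service3": 60}},
]

for target_type in ("ip", "instance"):
    used = estimated_targets(clusters, target_type)
    print(f"{target_type}: {used} of {ALB_TARGETS_DEFAULT_QUOTA} targets")
```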
If you are using your ALB in a separate, centralized, shared VPC and route the requests to other VPC(s) where your Amazon EKS clusters are running, then instance target type isn’t supported.
Consider implementing Pod Readiness Gates to make sure that your Pods have successfully registered with the load balancer and are ready to receive traffic. The AWS Load Balancer Controller supports Pod Readiness Gates. Please read the Load Balancing section in the EKS Best Practices guide for additional recommendations.
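As an illustration, the controller's documentation describes enabling readiness gate injection by labeling the namespace that holds your Pods. The sketch below builds such a Namespace manifest in Python and prints it as JSON. The namespace name is a hypothetical placeholder, and you should confirm the label against the documentation for the controller version you run.

```python
import json

# Namespace label that asks the controller's webhook to inject readiness gates.
READINESS_GATE_LABEL = "elbv2.k8s.aws/pod-readiness-gate-inject"


def namespace_with_readiness_gate(name):
    """Build a Namespace manifest with readiness gate injection enabled."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {READINESS_GATE_LABEL: "enabled"},
        },
    }


# "service1-ns" is a hypothetical namespace name used for illustration.
namespace = namespace_with_readiness_gate("service1-ns")
print(json.dumps(namespace, indent=2))
```

The label only affects Pods created after it is set, so existing Pods would typically need to be restarted to pick up the readiness gate.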
As you have seen in this scenario, each microservice consumes a rule on the ALB. Rules are evaluated in priority order, from the lowest value to the highest value, and the default rule is evaluated last. By default, an ALB can have up to 100 rules, and this quota can be raised by requesting a quota increase.
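The evaluation order described above can be modeled with a few lines of Python. This is a simplified illustration, not the ALB's actual implementation: it matches a cookie by exact name and value, tries rules from the lowest priority number to the highest, and falls back to a default target. The rule tuples and target names are hypothetical.

```python
from http.cookies import SimpleCookie


def pick_target(rules, cookie_header, default_target):
    """Return the target of the first matching rule, or the default target.

    Each rule is a (priority, cookie_name, cookie_value, target) tuple.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    # Lower priority numbers are evaluated first; the default comes last.
    for _priority, name, value, target in sorted(rules, key=lambda r: r[0]):
        morsel = cookies.get(name)
        if morsel is not None and morsel.value == value:
            return target
    return default_target


# Hypothetical rules mirroring the two-cluster scenario in this post.
rules = [
    (20, "user", "user2", "target-group-2"),
    (10, "user", "user1", "target-group-1"),
]

print(pick_target(rules, "user=user1; theme=dark", "default-target-group"))
print(pick_target(rules, "user=user2", "default-target-group"))
print(pick_target(rules, "theme=dark", "default-target-group"))
```

Because the first match wins, overlapping conditions should be given priorities that put the most specific rule first.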
Using the solution across different applications and environments increases the blast radius. For example, a misconfiguration on the ALB can affect all of the applications behind it. Hence, we recommend using the solution with a single application.

Conclusion

In this post, we showed you how to use the advanced request routing capability of an existing Application Load Balancer to route traffic to microservices spread across multiple Amazon EKS clusters in a given AWS Region. ALB’s advanced request routing, when used in conjunction with the AWS Load Balancer Controller and the TargetGroupBinding custom resource, enables efficient traffic routing to distinct microservices deployed across Kubernetes clusters.
