Automated IP address management for Multus Workers and Pods
Telco 5G and IMS container-based workloads use Multus CNI to achieve traffic and routing separation and packet acceleration. The Multus container network interface (CNI) plugin enables Kubernetes pods to attach to multiple interfaces and networks. As a “meta-plugin”, Multus achieves this by invoking other CNIs, such as VPC CNI (the default CNI in Amazon Elastic Kubernetes Service (Amazon EKS)) and IPVLAN CNI. VPC CNI manages the primary interface of the Kubernetes pod, whereas IPVLAN CNI lets pods use the secondary interfaces of the worker node by sharing the MAC address of the worker node elastic network interface (ENI). Multus CNI also invokes IPAM CNIs, such as host-local, static, and whereabouts.
The post, Amazon EKS now supports Multus CNI, helps you get started with the Multus workload deployment on Amazon EKS. For a production grade deployment of Multus-based workload on Amazon EKS, you must plan to manage the following two topics:
- Multus Worker Node group: Worker node subnets, and IP address management for the worker nodes, as each worker node ENI needs an IP address from the subnet.
- Multus Pod Networking: Multus Pod IP management from the worker node subnets, and pod communication in Amazon Virtual Private Cloud (Amazon VPC).
In this post, I’ll walk through the standard Multus node group deployment approach, as well as the challenges of managing the two topics above, and the approaches to address them, when using the ipvlan CNI with your Multus workloads inside an Amazon VPC.
Standard Multus Nodegroup deployment approach
The previous diagram represents a sample deployment of a Multus workload with IPVLAN CNI, which this post refers to throughout. In this example, we have an Amazon EKS cluster and a node group with two worker nodes. Both worker nodes are attached to two subnets as follows:
- eth0 network: 10.10.12.0/24 (used for VPC CNI, i.e., primary pod interface)
- eth1 network: 10.10.1.0/24 (used for Multus ipvlan cni, i.e., pod secondary interface)
For the previous sample deployment approach, the deployment starts with the Amazon EKS cluster deployment, followed by the Amazon EKS node group deployment. Once the node groups are deployed, you can deploy the Multus CNI and relevant plugins to support your workloads. Once the plugins are deployed, you can deploy your workloads.
In the following section, I will discuss the deployment strategy for Multus Worker Node group and IP management.
Multus Worker Node group deployment and IP management
Multus Worker Node deployment
A self-managed Multus node group uses an Auto Scaling group, which provides resiliency (it maintains the minimum number of workers) and scalability. The Auto Scaling group uses a launch template to configure the Amazon Elastic Compute Cloud (Amazon EC2) worker nodes. On its own, an Auto Scaling group with a launch template can create worker nodes with single or multiple interfaces that belong to the same subnet. To attach multiple network interfaces from different subnets, a custom deployment strategy uses a custom AWS Lambda function, which adds the Multus interfaces on Amazon EC2 Auto Scaling lifecycle hooks.
As shown in the following diagram, deployment starts with the Auto Scaling group creating the worker nodes with a single interface (eth0) using the launch template. Once the workers are launched, a custom Lambda function terminates the single-interface nodes. This scale-in causes the “autoscaling:EC2_INSTANCE_TERMINATING” event, which triggers a custom Lambda function that either drains the node or does nothing.
After this event completes, the Auto Scaling group scales out to meet the desired capacity, causing the “autoscaling:EC2_INSTANCE_LAUNCHING” event. This event triggers a custom Lambda function, which adds secondary interfaces from the Multus subnets with the tag (key: “node.k8s.amazonaws.com/no_manage”, value: “true”).
EKS Multus node group Creation Flow
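The launch-time step in this flow can be sketched as a Lambda-style handler. This is a minimal sketch under stated assumptions, not the implementation from the GitHub repo: the function name, its parameters, and the injected `ec2` and `autoscaling` clients (in a real Lambda function these would be `boto3.client("ec2")` and `boto3.client("autoscaling")`) are illustrative, and the sketch assumes the lifecycle event arrives in the Amazon EventBridge format, with the hook details under `detail`.

```python
# Tag from the post: marks the ENI so that it is left unmanaged.
NO_MANAGE_TAG = {"Key": "node.k8s.amazonaws.com/no_manage", "Value": "true"}
LAUNCHING = "autoscaling:EC2_INSTANCE_LAUNCHING"


def handle_lifecycle_event(event, ec2, autoscaling, multus_subnet_ids, security_group_ids):
    """Attach one tagged ENI per Multus subnet to a newly launched worker.

    `ec2` and `autoscaling` are boto3-style clients passed in by the caller
    (an assumption of this sketch, which also makes the function easy to test).
    Returns the list of ENI IDs that were attached.
    """
    detail = event["detail"]
    if detail.get("LifecycleTransition") != LAUNCHING:
        return []  # terminating events are handled separately (for example, node drain)

    instance_id = detail["EC2InstanceId"]
    attached = []
    # eth0 occupies device index 0, so the Multus interfaces start at index 1.
    for device_index, subnet_id in enumerate(multus_subnet_ids, start=1):
        response = ec2.create_network_interface(
            SubnetId=subnet_id,
            Groups=security_group_ids,
            TagSpecifications=[
                {"ResourceType": "network-interface", "Tags": [NO_MANAGE_TAG]}
            ],
        )
        eni_id = response["NetworkInterface"]["NetworkInterfaceId"]
        ec2.attach_network_interface(
            NetworkInterfaceId=eni_id,
            InstanceId=instance_id,
            DeviceIndex=device_index,
        )
        attached.append(eni_id)

    # Signal the Auto Scaling group that the launch can continue.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
    return attached
```

Passing the clients in as arguments, rather than creating them inside the handler, is a design choice of this sketch. It lets you exercise the logic with stub clients before wiring it to a real lifecycle hook.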
Worker Node IP management challenges
Auto Scaling groups provide elasticity and resiliency to the Amazon EKS worker node groups, and to support this they use DHCP IP allocation on all of the worker interfaces. For Multus pods, on the other hand, non-VPC IPAM CNIs (host-local, whereabouts, etc.) manage the IP allocation on secondary interfaces, using static address ranges or pools. Pod egress and ingress traffic flows through the corresponding worker ENIs, so the pod IP address and the worker ENI IP address must come from the same subnet. Application planners allocate static IP ranges for the Multus pods through Multus network-attachment-definition configurations and annotations.
Two different and disjointed IP allocation methods (DHCP and static) on the same subnet cause an interesting challenge in workload deployment. Because the DHCP assignment of worker node IPs is random and has no knowledge of the static allocation, a worker can receive any IP address from the static range planned for the pods. The Multus IPAM CNI (host-local, whereabouts, etc.) is unaware of this assignment. If a worker interface takes one of these IPs, then application pods will hit an IP conflict and the IP assignment will fail, making the application deployment unpredictable.
In the following sections, I will walk you through two possible approaches to manage the IP addressing of worker nodes and Multus workloads in a non-conflicting way.
Approach 1: Allocate Worker IPs statically with a custom Lambda
This approach uses a logical subnet sharing model between the workers and pods. Worker nodes take the unallocated IPs from the beginning of the subnet, and the Multus pods take the unallocated IPs from the end of the subnet. With this allocation strategy, IP address allocation for worker node interfaces isn’t random: each interface is statically assigned the first free IP address in the subnet.
Refer to the GitHub repo Allocate Worker IPs statically via a custom lambda-based solution for a detailed description, sample AWS CloudFormation template, and supporting Lambda functions.
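Independently of the repo's implementation, the allocation idea in this approach can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch: the function names and the pod pool size are hypothetical, and the list of used IPs would in practice come from describing the subnet's network interfaces. It relies on the fact that Amazon VPC reserves the first four addresses and the last address of every subnet.

```python
import ipaddress

# Amazon VPC reserves the first four addresses and the last address of a subnet.
AWS_RESERVED_HEAD = 4


def first_free_worker_ip(subnet_cidr, used_ips):
    """Return the lowest unallocated usable address in the subnet, or None."""
    subnet = ipaddress.ip_network(subnet_cidr)
    used = {ipaddress.ip_address(ip) for ip in used_ips}
    # Walk upward from the first usable address, skipping the last (reserved) one.
    for offset in range(AWS_RESERVED_HEAD, subnet.num_addresses - 1):
        candidate = subnet[offset]
        if candidate not in used:
            return str(candidate)
    return None


def pod_range_from_end(subnet_cidr, pool_size):
    """Return (start, end) of a pod pool carved from the end of the subnet."""
    subnet = ipaddress.ip_network(subnet_cidr)
    end = subnet[-2]               # last usable address (the final one is reserved)
    start = subnet[-1 - pool_size]  # pool_size addresses ending at `end`
    return str(start), str(end)


if __name__ == "__main__":
    # Workers already hold .4 and .5 on the Multus subnet from the example.
    print(first_free_worker_ip("10.10.1.0/24", ["10.10.1.4", "10.10.1.5"]))
    print(pod_range_from_end("10.10.1.0/24", 32))
```

With a pool carved from the end of the subnet, the worker allocation only collides with the pod range once the node count grows enough for the two ends to meet, which you can check at planning time.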
Approach 2: Use VPC subnet CIDR reservation (static) for the pods’ IP addresses
This approach uses the VPC subnet CIDR reservation strategy to provide the separation between worker nodes and Multus Pod IP address allocation. With this approach, you can explicitly reserve the Multus pod IP CIDR ranges as static, making sure that DHCP for Amazon EC2 worker nodes doesn’t allocate the IP addresses to worker nodes from this block.
To achieve this, you could create a reservation of the pod IP address chunks (the minimum subnet CIDR reservation allowed is /28) for explicit (static) allocation only. The unreserved chunk of the subnet CIDR would be available for the DHCP (default) allocation for the worker nodes behind the Autoscaling group.
Refer to the GitHub repo Use VPC subnet cidr reservation (static) for pods IP addresses for the detailed description, sample CloudFormation template, and supporting Lambda functions.
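As a rough illustration of the reservation step, the following sketch validates a pod CIDR block and builds the parameters for the Amazon EC2 `CreateSubnetCidrReservation` API, which boto3 exposes as `create_subnet_cidr_reservation`. The function name, the subnet ID, the description text, and the chosen pod block are hypothetical, and the /28 check follows the limit stated above rather than anything verified here.

```python
import ipaddress


def reservation_request(subnet_id, subnet_cidr, pod_cidr):
    """Build keyword arguments for ec2.create_subnet_cidr_reservation().

    Raises ValueError if the pod block is outside the subnet or smaller
    than the /28 minimum mentioned in the post.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    block = ipaddress.ip_network(pod_cidr)
    if not block.subnet_of(subnet):
        raise ValueError(f"{block} is not inside {subnet}")
    if block.prefixlen > 28:
        raise ValueError(f"{block} is smaller than the /28 minimum")
    return {
        "SubnetId": subnet_id,
        "Cidr": str(block),
        # "explicit" means addresses in the block are only handed out
        # when a caller asks for them by address (static allocation).
        "ReservationType": "explicit",
        "Description": "Multus pod IP range (illustrative)",
    }


if __name__ == "__main__":
    params = reservation_request(
        "subnet-0123456789abcdef0",  # hypothetical subnet ID
        "10.10.1.0/24",
        "10.10.1.64/26",
    )
    print(params)
    # With AWS credentials configured, the call would look like:
    #   import boto3
    #   boto3.client("ec2").create_subnet_cidr_reservation(**params)
```

In this example, 10.10.1.64/26 covers 10.10.1.64 through 10.10.1.127, so the pod addresses used later in the post (10.10.1.80 to 10.10.1.82) would fall inside the reserved block.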
Automated Multus Pod Networking
Now that the Amazon EKS cluster and the Multus node group are deployed with either of the previous approaches, you can deploy your workloads on Amazon EKS using Multus. As a next step, you will deploy the Multus CNI as mentioned in the git repo and install whereabouts IPAM CNI. In this post, I am using whereabouts IPAM CNI to manage the cluster unique Multus IP addresses.
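To make the whereabouts setup concrete, here is a small sketch that generates a network-attachment-definition for the eth1 network from the sample deployment. The resource name `ipvlan-multus`, the helper function, and the pod range are assumptions chosen for illustration; the `range`, `range_start`, and `range_end` keys are whereabouts IPAM options, and the range should match whichever pod block you planned in the previous section.

```python
import json


def ipvlan_nad(name, master, subnet_cidr, range_start, range_end):
    """Return a NetworkAttachmentDefinition manifest as a Python dict."""
    cni_config = {
        "cniVersion": "0.3.1",
        "type": "ipvlan",
        "master": master,  # worker interface that the pods share
        "mode": "l2",
        "ipam": {
            "type": "whereabouts",
            "range": subnet_cidr,
            "range_start": range_start,
            "range_end": range_end,
        },
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        # Multus expects the CNI configuration as a JSON string.
        "spec": {"config": json.dumps(cni_config)},
    }


if __name__ == "__main__":
    nad = ipvlan_nad(
        name="ipvlan-multus",  # hypothetical name
        master="eth1",
        subnet_cidr="10.10.1.0/24",
        range_start="10.10.1.64",
        range_end="10.10.1.127",
    )
    # kubectl accepts JSON manifests, so this output can be saved and applied.
    print(json.dumps(nad, indent=2))
```

A pod would then request this network through the `k8s.v1.cni.cncf.io/networks` annotation with the same name.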
Now, let’s understand how IP communication works in VPC, approaches to enable the routing, and IP assignment for Multus pods in an automated way.
Multus Pod IP management and routing challenges
In the following example, note that when you deploy the Multus pods, the communication between pods on different workers doesn’t work, even if the security group rules/NACL aren’t blocking the traffic. However, intercommunication between pods on the same worker node works fine.
Here is why this happens. Amazon VPC provides Layer 3 networking to its workloads. An ENI is the logical networking entity that holds one or more IP addresses and the corresponding MAC address. Amazon VPC routes traffic to the correct destination based on the IP addresses assigned to the ENI, so each ENI attached to the Amazon EC2 worker node must have the desired IP address(es) assigned to it.
For the primary interfaces of the Pod, Amazon VPC CNI assigns primary pod IP addresses (10.10.12.x in the previous example) to pod eth0 (VPC CNI managed interfaces) using DHCP, and assigns these IPs as secondary IPs on the worker node ENI. Non-VPC IPAM CNIs (whereabouts, host-local, static, etc.) allocate the IP address to Multus pod. Therefore, Amazon VPC won’t be aware of this IP address allocation. Furthermore, these IP addresses aren’t assigned as secondary IPs on the respective worker node ENI (in this example eth1).
You can verify this by examining the worker node ENIs on the Amazon EC2 console: Amazon EC2 console → Instances → Select Instance (worker) → Actions → Networking → Manage IP Addresses.
This problem is solved when the IP addresses assumed by pods are assigned to the respective worker ENI. Once these IPs are assigned to the respective ENI (ex: eth1), Amazon VPC updates the mapping of the assigned IPs to the ENI to route the traffic to the designated Multus IP addresses.
In the following example, Multus pod IP addresses 10.10.1.80 and 10.10.1.82 are assigned as secondary IP addresses on the eth1 ENI of the first worker node. Similarly, the 10.10.1.81 secondary IP is assigned to the second worker node eth1 ENI.
Amazon EC2 assign IP address/unassign IP address API calls can automate the IP assignment on the worker node ENIs. The sample Python code and script from the git repo can help achieve the same outcome.
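As a rough sketch of these two API calls (separate from the sample code in the git repo), the helpers below wrap boto3's `assign_private_ip_addresses` and `unassign_private_ip_addresses`. The helper names and the ENI ID are hypothetical, and the `ec2` argument is assumed to be a boto3 EC2 client created by the caller with permissions for both actions.

```python
def assign_pod_ips(ec2, eni_id, pod_ips):
    """Register Multus pod IPs as secondary private IPs on a worker ENI."""
    ec2.assign_private_ip_addresses(
        NetworkInterfaceId=eni_id,
        PrivateIpAddresses=list(pod_ips),
        # Allow an address that is still mapped to another ENI (for example,
        # after a pod moved to a different worker) to be taken over.
        AllowReassignment=True,
    )


def unassign_pod_ips(ec2, eni_id, pod_ips):
    """Remove Multus pod IPs from a worker ENI, for example on pod deletion."""
    ec2.unassign_private_ip_addresses(
        NetworkInterfaceId=eni_id,
        PrivateIpAddresses=list(pod_ips),
    )


# Usage with AWS credentials configured (the ENI ID is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2")
#   assign_pod_ips(ec2, "eni-0123456789abcdef0", ["10.10.1.80", "10.10.1.82"])
```

Once the call succeeds, Amazon VPC maps the addresses to that ENI and traffic for 10.10.1.80 and 10.10.1.82 reaches the first worker's eth1.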
The automation approaches discussed in the following don’t require any change in the application image or source code. You can leverage a custom “IP management” container on these pods to perform the automation of the IP allocation on the respective worker Node ENIs without any impact on applications containers or their architecture. You can enhance the spec of the workload pod/deployment/statefulset with this additional container.
Refer to the How to build instructions to create the Docker container image that provides this functionality. The same image can be used for either of the following solution options.
Approach 1: InitContainer based IP management solution
This solution works for most ipvlan CNI pods without special/custom handling such as floating IP (explained in the next approach). This approach doesn’t add the constraint of additional CPU/memory requirements on the worker.
This “IP management” container runs as the first container, while the pod is in the init state. It checks the IP addresses of the pod and assigns them to the worker node ENI. Once the Multus IP addresses are successfully assigned to the worker node ENIs, the initContainer terminates and the pod comes out of the init state.
Refer to the InitContainer IP management documentation and deployment procedure to use this solution. You can verify the same by examining the worker node ENIs on the Amazon EC2 console: Amazon EC2 console → Instances→ Select Instance (worker)→ Actions → Networking → Manage IP Addresses.
Approach 2: Sidecar IP management solution
In this approach, the “IP management” container runs as a sidecar container. Unlike the initContainer, it constantly monitors the pod’s Multus interfaces for new or changed IP addresses. This is helpful for pods that implement custom “Floating IP” handling for an active/standby pair, where internal logic fails the “Floating IP” over to the standby pod without traffic disruption. Because the sidecar keeps monitoring the pod for IP address changes, each Multus-based pod incurs a small amount of additional CPU and memory usage for this container.
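The monitoring behavior described here amounts to a small reconciliation loop. The following is a minimal sketch rather than the sidecar shipped in the git repo: `read_pod_ips` and `assign` are hypothetical callables supplied by the caller (one reads the current addresses on the Multus interfaces, the other registers addresses on the worker ENI), and the polling interval is an arbitrary choice.

```python
import time


def watch_and_assign(read_pod_ips, assign, interval_seconds=1.0,
                     max_iterations=None, sleep=time.sleep):
    """Poll the pod's Multus IPs and assign any newly seen address.

    Runs forever when max_iterations is None, as a sidecar would.
    Returns the last observed set of IPs when the loop ends.
    """
    known = set()
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        current = set(read_pod_ips())
        new_ips = current - known
        if new_ips:
            assign(sorted(new_ips))
        # Forget addresses that disappeared, so a floating IP that later
        # returns to this pod is assigned to the ENI again.
        known = current
        iterations += 1
        sleep(interval_seconds)
    return known
```

The loop only assigns addresses and never unassigns them. After a floating IP has failed over, the standby pod's assignment moves the address to its own ENI (given reassignment is allowed), so the former owner has nothing to release.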
Refer to the Sidecar IP management Solution documentation and procedure to use this solution. You can verify it on the Amazon EC2 console: Amazon EC2 console → Instances→ Select Instance (worker)→ Actions → Networking → Manage IP Addresses.
Refer to Cleanup in the GitHub repo to clean up the sample Multus workloads deployed in this post. Furthermore, to avoid incurring future charges, delete the Multus Worker Node groups from the CloudFormation console. The Amazon EKS cluster can be deleted from the Amazon EKS console.
This post outlines the challenges that customers face with the allocation, management, and separation of worker node and Multus pod IP addresses within Amazon VPC. It also describes how Multus pods work in the Amazon EKS and Amazon VPC scope, and how their traffic is routed in the VPC.
It also provides a sample methodology for automating the Amazon EKS node group deployment and the IP management of Multus pods, without any change to the application software or images.
For simplicity, this post only demonstrated the IPv4 handling in the previous examples. However, the sample code in the git repo has IPv6 support as well. You can further adapt the sample source code in the git repo, as per the different application architectures and use cases.