Optimize IP addresses usage by pods in your Amazon EKS cluster
Many enterprise customers adopt a multi-account strategy to meet their business needs while reducing the security blast radius. As workloads grow, these customers often struggle to maintain their network topology and can quickly run out of IP space when planning their VPC Classless Inter-Domain Routing (CIDR) ranges. In this blog, we briefly explain two options to address this problem, one using custom networking and another without.
Things you should know
Kubernetes: Kubernetes is a popular open source orchestration system for deploying and managing containerized applications at scale.
Amazon EKS: Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to stand up or maintain your own infrastructure. This fully managed service is designed to provide scalability, security, and integrations with other AWS services like Amazon Elastic Container Registry (Amazon ECR), IAM, and Elastic Load Balancing, to name a few.
AWS VPC CNI plugin: The AWS VPC CNI networking plugin provides native networking capabilities for pods running inside a Virtual Private Cloud (VPC). This plugin is responsible for assigning IPs to the pods whenever the pods are created. In order to assign an IP, the plugin needs to know the underlying instance’s ENI capacity, that is, how many network interfaces and how many IP addresses per interface the particular instance type supports.
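To make the capacity relationship concrete, here is a minimal sketch of the commonly documented formula for the default pod limit of a node: ENIs × (IPv4 addresses per ENI − 1) + 2. The function name `max_pods` is illustrative, and the ENI figures used for t3.small and m5.large are the published EC2 limits as we understand them, so check the current EC2 documentation for your instance type.

```shell
# Estimate how many pods a node can host with the default VPC CNI behaviour.
# Each ENI keeps one address for itself, so only (ips_per_eni - 1) go to pods.
# The "+ 2" is commonly explained as room for host-network pods
# (such as aws-node and kube-proxy) that do not need a VPC address of their own.
max_pods() {
  enis=$1
  ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

max_pods 3 4    # t3.small: 3 ENIs, 4 IPv4 addresses per ENI -> prints 11
max_pods 3 10   # m5.large: 3 ENIs, 10 IPv4 addresses per ENI -> prints 29
```

The same arithmetic explains why small instance types fill up quickly: the pod limit is driven by ENI capacity, not by CPU or memory.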
Customer scenarios:
Some customers have tens or hundreds of accounts and thousands of workloads. As customers scale their EKS clusters and deploy more workloads, the number of pods managed by a cluster can easily grow into the thousands, and each of these pods consumes an IP address. This can become challenging because the number of IP addresses available in a VPC is limited, and it’s not always possible to recreate a larger VPC or extend the current VPC’s CIDR blocks. It’s worth noting that both the worker nodes and the pods themselves require IP addresses. Also, the CNI plugin reserves IP addresses for future use.
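A rough back-of-the-envelope calculation shows how quickly a subnet can fill up. The sketch below relies on the fact that AWS reserves five addresses in every subnet. The helper names and the "every ENI slot is full" assumption are ours, chosen only to give an upper bound.

```shell
# Usable IPv4 addresses in a subnet of the given prefix length.
# AWS reserves five addresses in every subnet (the first four and the last one).
usable_ips() {
  prefix=$1
  echo $(( (1 << (32 - prefix)) - 5 ))
}

# Upper bound on nodes per subnet if every node fills all of its ENI slots.
# Assumption: a full node holds enis * ips_per_eni addresses from the subnet.
max_full_nodes() {
  prefix=$1
  enis=$2
  ips_per_eni=$3
  echo $(( $(usable_ips "$prefix") / (enis * ips_per_eni) ))
}

usable_ips 20            # one /20 subnet -> prints 4091
max_full_nodes 20 3 10   # nodes with 3 ENIs x 10 addresses -> prints 136
```

In practice the CNI plugin also keeps spare addresses warm on each node, so a subnet can be exhausted before every node is full.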
Now, if you have pods running in your EKS cluster and are already running out of IP addresses on your primary VPC CIDR (e.g. 10.0.0.0/20), you have two options:
- CNI custom networking: this option uses the CNI plugin’s custom networking feature to keep your worker nodes’ IP addresses on your primary VPC CIDR (in this example, 10.0.0.0/20), but move your pods’ IP addresses to a larger range (e.g. a range within 100.64.0.0/10). In this scenario, you can move all your pods to the new range and still use your 10.0.0.0/20 CIDR IP addresses as the source IP. However, here are some considerations you should know while using this configuration:
  - This solution leaves fewer IP addresses for pods on each node, because the node’s primary ENI is no longer used for pod addresses.
  - The solution is comparatively complex, as it involves manual calculation and configuration of “max pods.”
  - The solution is not supported by EKS managed node groups.
- Secondary CIDR with VPC: another option is to deploy new worker nodes with both the instance and pod networking on a new, larger CIDR block (e.g. a range within 100.64.0.0/10). In this scenario, after adding the new CIDR to your VPC, you can deploy another node group using the secondary CIDR and drain the original nodes to automatically redeploy the pods onto the new worker nodes.
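The “max pods” calculation mentioned for custom networking can be sketched as follows. With custom networking the primary ENI no longer carries pod addresses, so the commonly documented formula becomes (ENIs − 1) × (IPv4 addresses per ENI − 1) + 2. The function name is illustrative and the instance figures are assumptions to verify against the EC2 documentation.

```shell
# Pod limit per node when CNI custom networking is enabled.
# One ENI (the primary) is excluded because it stays on the node's own subnet.
max_pods_custom_networking() {
  enis=$1
  ips_per_eni=$2
  echo $(( (enis - 1) * (ips_per_eni - 1) + 2 ))
}

max_pods_custom_networking 3 10   # m5.large -> prints 20 (29 without custom networking)
max_pods_custom_networking 3 4    # t3.small -> prints 8 (11 without custom networking)
```

The result is the value you would then configure as the kubelet’s max pods setting on those nodes.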
Option 1: CNI custom networking
As shown in the diagram below, the primary Elastic Network Interface (ENI) of the worker node still uses the primary VPC CIDR range (in this case 10.0.0.0/20), but the secondary ENIs for pods use the secondary VPC CIDR range (in this case 100.64.0.0/10). Now, in order to have the pods use the 100.64.0.0/10 CIDR range, you will have to configure the CNI plugin to use custom networking. You can follow the steps as documented here. You can also use a script published in this blog to configure the plugin.
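As a rough illustration of what that configuration involves, the sketch below prints an ENIConfig manifest and wraps the command that switches the plugin to custom networking. The function names, the Availability Zone, and the subnet and security group IDs are hypothetical placeholders. The ENIConfig fields and the AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG variable are those of the VPC CNI plugin as we understand it, so treat this as a starting point rather than the documented procedure.

```shell
# Print an ENIConfig manifest for one Availability Zone.
# Arguments (all placeholders): AZ name, secondary-CIDR subnet ID, security group ID.
eniconfig_manifest() {
  cat <<EOF
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $1
spec:
  subnet: $2
  securityGroups:
    - $3
EOF
}

# Turn on custom networking in the aws-node daemonset (the VPC CNI plugin).
enable_custom_networking() {
  kubectl set env daemonset aws-node -n kube-system \
    AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
}

# Example usage (not run here):
#   enable_custom_networking
#   eniconfig_manifest us-east-1a subnet-0123456789abcdef0 sg-0123456789abcdef0 \
#     | kubectl apply -f -
```

Nodes still need to be matched to an ENIConfig and recreated before pods pick up addresses from the new subnets, so follow the linked documentation for the remaining steps.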
Option 2: Unique network for nodes and pods.
In this scenario, you can still maintain the EKS cluster as is (the control plane will still live on the original subnets), but you’ll completely migrate the instances and the pods to the secondary subnets. In the example shown on the diagram below, both the primary and the secondary ENIs use the secondary VPC CIDR range (in this case 100.64.0.0/10).
How to use secondary CIDR with EKS (Option 2):
The following steps help you set up an Amazon EKS cluster and the managed worker nodes (data plane), then drain the original worker nodes and move the pods to the new worker nodes that you plan to retain. We use the AWS CLI to execute all the steps, which can also be easily automated in your CI/CD pipeline. You can execute these steps from your local machine, from an Amazon EC2 instance, or through AWS Systems Manager Session Manager.
Step 1: Let’s begin by creating an Amazon Virtual Private Cloud (VPC) and the subnets where we can deploy the Amazon EKS cluster. If you plan on using an existing VPC, you can skip this step and jump directly to step 2.
Set environment variables such as CLUSTER_NAME and KEY_NAME, which are used in subsequent commands:
export CLUSTER_NAME=eks-cluster
export KEY_NAME=<an existing Keypair>
This CloudFormation stack will deploy a VPC on the 10.0.0.0/16 CIDR block with three public and three private /20 subnets. Save the below CloudFormation template as eks-vpc.yaml and execute the AWS CLI command that follows it to deploy the stack.
AWSTemplateFormatVersion: '2010-09-09'
Description: This template deploys a VPC, with three public and private subnets spread
  across three Availability Zones. It deploys an internet gateway, with a default
  route on the public subnets. It deploys three NAT gateways (one in each AZ),
  and default routes for them in the private subnets.
Parameters:
  EnvironmentName:
    Description: An environment name that is prefixed to resource names
    Type: String
  VpcCIDR:
    Description: Please enter the IP range (CIDR notation) for this VPC
    Type: String
    Default: 10.0.0.0/16
  PublicSubnet1CIDR:
    Description: Please enter the IP range (CIDR notation) for the public subnet in the first Availability Zone
    Type: String
    Default: 10.0.0.0/20
  PublicSubnet2CIDR:
    Description: Please enter the IP range (CIDR notation) for the public subnet in the second Availability Zone
    Type: String
    Default: 10.0.16.0/20
  PublicSubnet3CIDR:
    Description: Please enter the IP range (CIDR notation) for the public subnet in the third Availability Zone
    Type: String
    Default: 10.0.32.0/20
  PrivateSubnet1CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the first Availability Zone
    Type: String
    Default: 10.0.48.0/20
  PrivateSubnet2CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the second Availability Zone
    Type: String
    Default: 10.0.64.0/20
  PrivateSubnet3CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the third Availability Zone
    Type: String
    Default: 10.0.80.0/20
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCIDR
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-vpc
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-igw
  InternetGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC
  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref PublicSubnet1CIDR
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-subnet-1
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref PublicSubnet2CIDR
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-subnet-2
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PublicSubnet3:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: !Ref PublicSubnet3CIDR
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-subnet-3
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet1CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-subnet-1
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet2CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-subnet-2
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PrivateSubnet3:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet3CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-subnet-3
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  NatGateway1EIP:
    Type: AWS::EC2::EIP
    DependsOn: InternetGatewayAttachment
    Properties:
      Domain: vpc
  NatGateway2EIP:
    Type: AWS::EC2::EIP
    DependsOn: InternetGatewayAttachment
    Properties:
      Domain: vpc
  NatGateway3EIP:
    Type: AWS::EC2::EIP
    DependsOn: InternetGatewayAttachment
    Properties:
      Domain: vpc
  NatGateway1:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGateway1EIP.AllocationId
      SubnetId: !Ref PublicSubnet1
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-ngw-1
  NatGateway2:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGateway2EIP.AllocationId
      SubnetId: !Ref PublicSubnet2
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-ngw-2
  NatGateway3:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGateway3EIP.AllocationId
      SubnetId: !Ref PublicSubnet3
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-ngw-3
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-routetable
  DefaultPublicRoute:
    Type: AWS::EC2::Route
    DependsOn: InternetGatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PublicSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet1
  PublicSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet2
  PublicSubnet3RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet3
  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-routes-1
  DefaultPrivateRoute1:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway1
  PrivateSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      SubnetId: !Ref PrivateSubnet1
  PrivateRouteTable2:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-routes-2
  DefaultPrivateRoute2:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway2
  PrivateSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      SubnetId: !Ref PrivateSubnet2
  PrivateRouteTable3:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-routes-3
  DefaultPrivateRoute3:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable3
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway3
  PrivateSubnet3RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable3
      SubnetId: !Ref PrivateSubnet3
  NoIngressSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: "no-ingress-sg"
      GroupDescription: "Security group with no ingress rule"
      VpcId: !Ref VPC
Outputs:
  VPC:
    Description: A reference to the created VPC
    Value: !Ref VPC
  PublicSubnets:
    Description: A list of the public subnets
    Value: !Join [ ",", [ !Ref PublicSubnet1, !Ref PublicSubnet2, !Ref PublicSubnet3 ]]
  PrivateSubnets:
    Description: A list of the private subnets
    Value: !Join [ ",", [ !Ref PrivateSubnet1, !Ref PrivateSubnet2, !Ref PrivateSubnet3 ]]
  PublicSubnet1:
    Description: A reference to the public subnet in the 1st Availability Zone
    Value: !Ref PublicSubnet1
  PublicSubnet2:
    Description: A reference to the public subnet in the 2nd Availability Zone
    Value: !Ref PublicSubnet2
  PublicSubnet3:
    Description: A reference to the public subnet in the 3rd Availability Zone
    Value: !Ref PublicSubnet3
  PrivateSubnet1:
    Description: A reference to the private subnet in the 1st Availability Zone
    Value: !Ref PrivateSubnet1
  PrivateSubnet2:
    Description: A reference to the private subnet in the 2nd Availability Zone
    Value: !Ref PrivateSubnet2
  PrivateSubnet3:
    Description: A reference to the private subnet in the 3rd Availability Zone
    Value: !Ref PrivateSubnet3
  NatGateway1:
    Description: A reference to the Nat Gateway in the 1st Availability Zone
    Value: !Ref NatGateway1
  NatGateway2:
    Description: A reference to the Nat Gateway in the 2nd Availability Zone
    Value: !Ref NatGateway2
  NatGateway3:
    Description: A reference to the Nat Gateway in the 3rd Availability Zone
    Value: !Ref NatGateway3
  NoIngressSecurityGroup:
    Description: Security group with no ingress rule
    Value: !Ref NoIngressSecurityGroup
aws cloudformation create-stack --stack-name $CLUSTER_NAME-vpc \
--template-body file://eks-vpc.yaml \
--parameters \
ParameterKey=EnvironmentName,ParameterValue=$CLUSTER_NAME
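Because create-stack returns before the resources exist, the later steps that read stack outputs can fail if they run too early. A minimal sketch of one way to block until the stack is ready uses the AWS CLI waiter for CloudFormation. The wrapper name `wait_for_stack` is our own.

```shell
# Block until the named CloudFormation stack reaches CREATE_COMPLETE.
# The waiter exits non-zero if creation fails or it times out.
wait_for_stack() {
  aws cloudformation wait stack-create-complete --stack-name "$1"
}

# Example usage (not run here):
#   wait_for_stack "$CLUSTER_NAME-vpc"
```

The same helper can be reused after each create-stack call in the following steps.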
Step 2: You can create the Amazon EKS control plane in a number of ways by using tools like eksctl, cdk8s, or custom CloudFormation templates. For this example, we use the CloudFormation template below and deploy it with the AWS CLI, which creates the cluster on the subnets created in Step 1.
Save the below CloudFormation template as eks-control-plane.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Stack for creating a basic EKS control plane.
Parameters:
  Name:
    Type: String
    Description: EKS cluster name.
  Vpc:
    Type: AWS::EC2::VPC::Id
    Description: VPC to create the control plane and SGs in.
  Subnets:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Subnets to create the control plane in. Recommend picking both public
      and private subnets.
  Version:
    Type: String
    Description: Kubernetes master version.
    AllowedValues:
      - '1.16'
      - '1.15'
      - '1.14'
    Default: '1.16'
    ConstraintDescription: Pick a version for the K8s master.
Resources:
  ClusterRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join
        - '-'
        - - !Ref 'AWS::StackName'
          - cluster-role
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
        - arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: eks.amazonaws.com
            Action: sts:AssumeRole
  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref 'Vpc'
  ControlPlane:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Ref 'Name'
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref 'ControlPlaneSecurityGroup'
        SubnetIds: !Ref 'Subnets'
      RoleArn: !GetAtt 'ClusterRole.Arn'
      Version: !Ref 'Version'
Outputs:
  ClusterArn:
    Value: !GetAtt 'ControlPlane.Arn'
  ApiEndpoint:
    Value: !GetAtt 'ControlPlane.Endpoint'
  CertificateAuthorityData:
    Value: !GetAtt 'ControlPlane.CertificateAuthorityData'
Execute the below commands to deploy the CloudFormation stack.
export VPC_ID=$(aws cloudformation describe-stacks --stack-name $CLUSTER_NAME-vpc --query 'Stacks[0].Outputs[?OutputKey==`VPC`].OutputValue' --output text)
export SUBNETS_IDS=$(aws cloudformation describe-stacks --stack-name $CLUSTER_NAME-vpc --query 'Stacks[0].Outputs[?OutputKey==`PrivateSubnets`].OutputValue' --output text)
aws cloudformation create-stack --stack-name $CLUSTER_NAME-control-plane \
--template-body file://eks-control-plane.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameters \
ParameterKey=Name,ParameterValue=$CLUSTER_NAME \
ParameterKey=Vpc,ParameterValue=$VPC_ID \
'ParameterKey=Subnets,ParameterValue="'"$SUBNETS_IDS"'"'
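Creating the control plane typically takes several minutes, and the kubeconfig step only works once the cluster is active. The sketch below wraps the AWS CLI waiter and a status query for EKS. The function names are illustrative.

```shell
# Block until the EKS cluster reports the ACTIVE status.
wait_for_cluster() {
  aws eks wait cluster-active --name "$1"
}

# Print the current cluster status (for example CREATING or ACTIVE).
cluster_status() {
  aws eks describe-cluster --name "$1" --query 'cluster.status' --output text
}

# Example usage (not run here):
#   wait_for_cluster "$CLUSTER_NAME" && cluster_status "$CLUSTER_NAME"
```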
Step 3: Create the kubeconfig file for your cluster using the below command. By default, the resulting configuration file is created at the default kubeconfig path (~/.kube/config) or merged with an existing kubeconfig at that location.
aws eks update-kubeconfig --name $CLUSTER_NAME
At this point, you can verify that your cluster is reachable using the command below:
kubectl get svc
Step 4: We will use a CloudFormation stack to create the data plane, which is the first managed worker node group, and deploy it with the below AWS CLI command. In this command, we pass inputs such as:
- ClusterName, which is the name of the cluster you set in step 1
- A node group name
- Node group auto scaling parameters like MinSize, MaxSize, and DesiredSize
- The key pair name to be used for the worker nodes (EC2 instances)
- The security groups to protect the worker nodes
- Subnet IDs where the worker nodes should be deployed, and the VPC ID
Save the below CloudFormation template as eks-data-plane.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Stack for creating worker nodes for an existing EKS cluster.
Parameters:
  ProvidedSecurityGroup:
    Type: String
    Description: Inbound SSH, Outbound ALL
    Default: 'sg-0d640ec8e209ef5a7'
  KeyName:
    Description: The EC2 Key Pair to allow SSH access to the instances
    Type: AWS::EC2::KeyPair::KeyName
  NodeInstanceType:
    Description: EC2 instance type for the node instances
    Type: String
    Default: t3.small
    ConstraintDescription: Must be a valid EC2 instance type
  AmiType:
    Type: String
    Description: The AMI type for the Node Group
    Default: AL2_x86_64
  NodeGroupName:
    Type: String
    Description: Unique identifier for the Node Group.
  NodeGroupScalingConfigMinSize:
    Type: Number
    Description: Minimum size of Node Group ASG
    Default: 3
  NodeGroupScalingConfigMaxSize:
    Type: Number
    Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeGroupScalingConfigDesiredSize
    Default: 10
  NodeGroupScalingConfigDesiredSize:
    Type: Number
    Description: Desired capacity of Node Group ASG
    Default: 3
  NodeDiskSize:
    Type: Number
    Description: Node EBS volume size
    Default: 300
  ClusterName:
    Description: The cluster name provided when the cluster was created. If it is
      incorrect, nodes will not be able to join the cluster
    Type: String
  ClusterControlPlaneSecurityGroup:
    Description: The security group of the cluster control plane
    Type: AWS::EC2::SecurityGroup::Id
  VpcId:
    Description: The VPC of the worker instances
    Type: AWS::EC2::VPC::Id
  Subnets:
    Description: The subnets where workers should be created. Recommended to pick
      only private subnets.
    Type: List<AWS::EC2::Subnet::Id>
Resources:
  NodeInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        # Very broad permission kept from the original example; scope it down outside of a demo.
        - arn:aws:iam::aws:policy/AdministratorAccess
  NodeInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref 'NodeInstanceRole'
  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      VpcId: !Ref 'VpcId'
      Tags:
        - Key: !Sub 'kubernetes.io/cluster/${ClusterName}'
          Value: owned
  NodeSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      GroupId: !Ref 'NodeSecurityGroup'
      SourceSecurityGroupId: !Ref 'NodeSecurityGroup'
      IpProtocol: '-1'
      FromPort: 0
      ToPort: 65535
  NodeSecurityGroupKubernetesService:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow all traffic for K8s service ports. Workaround for NLB bug
      GroupId: !Ref 'NodeSecurityGroup'
      CidrIp: '0.0.0.0/0'
      IpProtocol: tcp
      FromPort: 30000
      ToPort: 32767
  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the
        cluster control plane
      GroupId: !Ref 'NodeSecurityGroup'
      SourceSecurityGroupId: !Ref 'ClusterControlPlaneSecurityGroup'
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535
  ControlPlaneEgressToNodeSecurityGroup:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet
        and pods
      GroupId: !Ref 'ClusterControlPlaneSecurityGroup'
      DestinationSecurityGroupId: !Ref 'NodeSecurityGroup'
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535
  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive
        communication from cluster control plane
      GroupId: !Ref 'NodeSecurityGroup'
      SourceSecurityGroupId: !Ref 'ClusterControlPlaneSecurityGroup'
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running
        extension API servers on port 443
      GroupId: !Ref 'ClusterControlPlaneSecurityGroup'
      DestinationSecurityGroupId: !Ref 'NodeSecurityGroup'
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
  ClusterControlPlaneSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      GroupId: !Ref 'ClusterControlPlaneSecurityGroup'
      SourceSecurityGroupId: !Ref 'NodeSecurityGroup'
      IpProtocol: tcp
      ToPort: 443
      FromPort: 443
  NodeSSHSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow ssh into worker node
      GroupId: !Ref 'NodeSecurityGroup'
      SourceSecurityGroupId: !Ref 'ProvidedSecurityGroup'
      IpProtocol: tcp
      ToPort: 22
      FromPort: 22
  NodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      AmiType: !Ref 'AmiType'
      ClusterName: !Ref 'ClusterName'
      DiskSize: !Ref 'NodeDiskSize'
      InstanceTypes:
        - !Ref 'NodeInstanceType'
      NodegroupName: !Ref 'NodeGroupName'
      NodeRole: {"Fn::GetAtt" : ["NodeInstanceRole", "Arn"] }
      RemoteAccess:
        Ec2SshKey: !Ref 'KeyName'
        SourceSecurityGroups:
          - !Ref 'ProvidedSecurityGroup'
          - !Ref 'NodeSecurityGroup'
      ScalingConfig:
        MinSize: !Ref 'NodeGroupScalingConfigMinSize'
        DesiredSize: !Ref 'NodeGroupScalingConfigDesiredSize'
        MaxSize: !Ref 'NodeGroupScalingConfigMaxSize'
      Subnets: !Ref 'Subnets'
      Tags:
        Name: !Sub ${NodeGroupName}-NodeGroup
Outputs:
  NodeInstanceRole:
    Description: The node instance role
    Value: !GetAtt 'NodeInstanceRole.Arn'
  NodeSecurityGroup:
    Description: The security group for the node group
    Value: !Ref 'NodeSecurityGroup'
export CLUSTER_SG=$(aws eks describe-cluster --name $CLUSTER_NAME --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)
export ADDITIONAL_SG=$(aws eks describe-cluster --name $CLUSTER_NAME --query 'cluster.resourcesVpcConfig.securityGroupIds[0]' --output text)
aws cloudformation create-stack --stack-name $CLUSTER_NAME-data-plane \
--template-body file://eks-data-plane.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameters \
ParameterKey=ClusterControlPlaneSecurityGroup,ParameterValue=$CLUSTER_SG \
ParameterKey=ClusterName,ParameterValue=$CLUSTER_NAME \
ParameterKey=NodeGroupName,ParameterValue=$CLUSTER_NAME-nodegroup \
ParameterKey=KeyName,ParameterValue=$KEY_NAME \
ParameterKey=VpcId,ParameterValue=$VPC_ID \
'ParameterKey=Subnets,ParameterValue="'"$SUBNETS_IDS"'"' \
ParameterKey=ProvidedSecurityGroup,ParameterValue=$ADDITIONAL_SG
Before deploying the application, you can use the below command to monitor and wait until all the nodes are in “Ready” state.
kubectl get nodes --watch
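If you are scripting these steps, an interactive watch is awkward to automate. A possible alternative, sketched below, is kubectl wait, which returns once every node reports the Ready condition or the timeout expires. The wrapper name and the default timeout of 600 seconds are our own choices.

```shell
# Wait until all nodes are Ready, or fail after the given timeout (default 600s).
wait_for_nodes() {
  kubectl wait --for=condition=Ready nodes --all --timeout="${1:-600s}"
}

# Example usage (not run here):
#   wait_for_nodes 900s
```

Note that this only considers nodes that have already registered with the cluster, so it is best run after the node group stack has finished creating.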
Step 5: We will deploy a demo application using kubectl. Save the below manifest as deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginxdemos/hello:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
Execute the below command to deploy the app.
kubectl create -f deployment.yaml
This will take a few minutes, after which the pods should be in the Running state. You can verify using the below command:
kubectl get pods --all-namespaces
The below command will display the load balancer endpoint of the deployed application for testing:
kubectl get service -o wide
Now, when you open this endpoint in a browser, you will observe that the server address reported by the application is within the 10.0.0.0/16 CIDR block.
Step 6: In this step, we will attach a new CIDR block to the VPC and create new subnets on the 100.64.0.0/16 CIDR block using this CloudFormation stack and the below AWS CLI commands.
You may save the below CloudFormation as eks-vpc-secondary.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: This template deploys a new CIDR block and three private subnets spread
  across three Availability Zones.
Parameters:
  EnvironmentName:
    Description: An environment name that is prefixed to resource names
    Type: String
  VpcId:
    Description: VPC ID of the main VPC
    Type: String
  VpcCIDR:
    Description: Please enter the IP range (CIDR notation) for this VPC
    Type: String
    Default: 100.64.0.0/16
  PrivateSubnet1CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the first Availability Zone
    Type: String
    Default: 100.64.0.0/20
  PrivateSubnet2CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the second Availability Zone
    Type: String
    Default: 100.64.16.0/20
  PrivateSubnet3CIDR:
    Description: Please enter the IP range (CIDR notation) for the private subnet in the third Availability Zone
    Type: String
    Default: 100.64.32.0/20
  NatGateway1:
    Description: Please enter the Nat Gateway in the first Availability Zone
    Type: String
  NatGateway2:
    Description: Please enter the Nat Gateway in the second Availability Zone
    Type: String
  NatGateway3:
    Description: Please enter the Nat Gateway in the third Availability Zone
    Type: String
Resources:
  VpcCidrBlock:
    Type: AWS::EC2::VPCCidrBlock
    Properties:
      VpcId: !Ref VpcId
      CidrBlock: !Ref VpcCIDR
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    DependsOn: VpcCidrBlock
    Properties:
      VpcId: !Ref VpcId
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet1CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-secondary-subnet-1
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    DependsOn: VpcCidrBlock
    Properties:
      VpcId: !Ref VpcId
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet2CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-secondary-subnet-2
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PrivateSubnet3:
    Type: AWS::EC2::Subnet
    DependsOn: VpcCidrBlock
    Properties:
      VpcId: !Ref VpcId
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: !Ref PrivateSubnet3CIDR
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-secondary-subnet-3
        - Key: !Sub kubernetes.io/cluster/${EnvironmentName}
          Value: shared
  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VpcId
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-secondary-routes-1
  DefaultPrivateRoute1:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway1
  PrivateSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      SubnetId: !Ref PrivateSubnet1
  PrivateRouteTable2:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VpcId
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-secondary-routes-2
  DefaultPrivateRoute2:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway2
  PrivateSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      SubnetId: !Ref PrivateSubnet2
  PrivateRouteTable3:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VpcId
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-private-secondary-routes-3
  DefaultPrivateRoute3:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable3
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway3
  PrivateSubnet3RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable3
      SubnetId: !Ref PrivateSubnet3
Outputs:
  PrivateSubnets:
    Description: A list of the private subnets
    Value: !Join [ ",", [ !Ref PrivateSubnet1, !Ref PrivateSubnet2, !Ref PrivateSubnet3 ]]
  PrivateSubnet1:
    Description: A reference to the private subnet in the 1st Availability Zone
    Value: !Ref PrivateSubnet1
  PrivateSubnet2:
    Description: A reference to the private subnet in the 2nd Availability Zone
    Value: !Ref PrivateSubnet2
  PrivateSubnet3:
    Description: A reference to the private subnet in the 3rd Availability Zone
    Value: !Ref PrivateSubnet3
Run the following commands to deploy the stack.
export NGW1=$(aws cloudformation describe-stacks --stack-name $CLUSTER_NAME-vpc --query 'Stacks[0].Outputs[?OutputKey==`NatGateway1`].OutputValue' --output text)
export NGW2=$(aws cloudformation describe-stacks --stack-name $CLUSTER_NAME-vpc --query 'Stacks[0].Outputs[?OutputKey==`NatGateway2`].OutputValue' --output text)
export NGW3=$(aws cloudformation describe-stacks --stack-name $CLUSTER_NAME-vpc --query 'Stacks[0].Outputs[?OutputKey==`NatGateway3`].OutputValue' --output text)
aws cloudformation create-stack --stack-name $CLUSTER_NAME-vpc-secondary \
--template-body file://eks-vpc-secondary.yaml \
--parameters \
ParameterKey=EnvironmentName,ParameterValue=$CLUSTER_NAME \
ParameterKey=VpcId,ParameterValue=$VPC_ID \
ParameterKey=NatGateway1,ParameterValue=$NGW1 \
ParameterKey=NatGateway2,ParameterValue=$NGW2 \
ParameterKey=NatGateway3,ParameterValue=$NGW3
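Once the stack has finished creating, it can be reassuring to confirm that the VPC really carries both CIDR blocks before launching nodes into the new subnets. A minimal sketch, with a wrapper name of our own choosing, queries the VPC's CIDR associations through the AWS CLI.

```shell
# List every IPv4 CIDR block associated with the given VPC.
vpc_cidrs() {
  aws ec2 describe-vpcs --vpc-ids "$1" \
    --query 'Vpcs[0].CidrBlockAssociationSet[].CidrBlock' --output text
}

# Example usage (not run here):
#   vpc_cidrs "$VPC_ID"
```

With the defaults used in this walkthrough, the output should include both 10.0.0.0/16 and 100.64.0.0/16.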
Step 7: In this step, we will deploy the second set of managed worker nodes on the new subnets created in the previous step, using the below AWS CLI commands.
You will reuse the CloudFormation template saved in step 4 (eks-data-plane.yaml).
export SECONDARY_SUBNETS_IDS=$(aws cloudformation describe-stacks --stack-name $CLUSTER_NAME-vpc-secondary --query 'Stacks[0].Outputs[?OutputKey==`PrivateSubnets`].OutputValue' --output text)
aws cloudformation create-stack --stack-name $CLUSTER_NAME-data-plane-secondary \
--template-body file://eks-data-plane.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameters \
ParameterKey=ClusterControlPlaneSecurityGroup,ParameterValue=$CLUSTER_SG \
ParameterKey=ClusterName,ParameterValue=$CLUSTER_NAME \
ParameterKey=NodeGroupName,ParameterValue=$CLUSTER_NAME-nodegroup-secondary \
ParameterKey=KeyName,ParameterValue=$KEY_NAME \
ParameterKey=VpcId,ParameterValue=$VPC_ID \
'ParameterKey=Subnets,ParameterValue="'"$SECONDARY_SUBNETS_IDS"'"' \
ParameterKey=ProvidedSecurityGroup,ParameterValue=$ADDITIONAL_SG
You can wait until the nodes are in “Ready” state using the below command:
kubectl get nodes --watch
Step 8: Now that we have the application running on subnets using the 10.0.0.0/16 CIDR block, we will first issue the cordon command. This command stops any new pods from being scheduled onto these worker nodes. We will then drain the worker nodes deployed in step 4, so that the pods are terminated and automatically recreated on the healthy worker nodes in the subnets using the 100.64.0.0/16 CIDR range.
Command to cordon all the nodes running on the 10.0.0.0/16 CIDR block:
kubectl get nodes --no-headers=true | awk '/ip-10-0/{print $1}' | xargs kubectl cordon
Command to drain all the nodes running on the 10.0.0.0/16 CIDR block. You need the --ignore-daemonsets flag in order to drain nodes that run daemonsets, and the --delete-local-data flag to override the safety check and delete any pods that use an emptyDir volume (newer kubectl releases rename this flag to --delete-emptydir-data).
kubectl get nodes --no-headers=true | awk '/ip-10-0/{print $1}' | xargs kubectl drain --force --ignore-daemonsets --delete-local-data
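Matching nodes by hostname pattern works for this example, but it depends on the node naming scheme. As an alternative sketch, nodes in an EKS managed node group carry the label eks.amazonaws.com/nodegroup, so the original node group can be selected by label instead. The function names are illustrative, and the drain flags mirror the command above.

```shell
# Build the label selector for a managed node group.
nodegroup_selector() {
  echo "eks.amazonaws.com/nodegroup=$1"
}

# Stop new pods from being scheduled onto the node group's nodes.
cordon_nodegroup() {
  kubectl cordon -l "$(nodegroup_selector "$1")"
}

# Evict the pods so they are recreated on the remaining schedulable nodes.
drain_nodegroup() {
  kubectl drain -l "$(nodegroup_selector "$1")" \
    --force --ignore-daemonsets --delete-local-data
}

# Example usage (not run here):
#   cordon_nodegroup "$CLUSTER_NAME-nodegroup"
#   drain_nodegroup "$CLUSTER_NAME-nodegroup"
```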
You can verify that the endpoint is now running on 100.64.0.0/16 CIDR block by accessing the endpoint again using the below command:
kubectl get service -o wide
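To check the result without a browser, you could also test whether a pod's IP address falls inside the secondary CIDR. The helper below is a small, self-contained sketch in plain shell arithmetic. The function names and the sample addresses are made up for illustration.

```shell
# Convert a dotted IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if the address (first argument) lies inside the CIDR (second argument).
ip_in_cidr() {
  ip=$1
  net=${2%/*}
  prefix=${2#*/}
  mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

if ip_in_cidr 100.64.17.5 100.64.0.0/16; then
  echo "100.64.17.5 is in the secondary CIDR"
fi
```

Pod addresses can be listed with kubectl get pods -o wide and passed to ip_in_cidr one at a time.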
Conclusion
You should now be able to choose the most appropriate solution if you are running into limited IP addresses in your existing VPC CIDR range. Also, it is very important to plan your VPC CIDR ranges across your multiple accounts to make sure you do not have overlapping IP addresses that require complex NAT workarounds. We highly recommend reading this blog that explains various networking patterns, especially for customers with hybrid environments that connect to their data centers.
Further reading references:
If you are new to Kubernetes, this should help you understand the basics. Also, you can check our documentation if you are new to Amazon EKS. In this blog, Jeremy Cowan explains how the IP address shortage problem escalates as workloads grow with the adoption of Kubernetes, and also provides a solution to this problem.
About the Authors:
Jose Olcese
Jose is a Principal Cloud Application Architect with Amazon Web Services where he helps customers build cutting-edge, cloud-based solutions. Jose has over 20 years of experience in software development for a variety of different industries and has helped hundreds of customers to integrate Identity solutions with AWS. Outside of work, Jose enjoys spending time with his family, running and building things.
Umesh Kumar Ramesh
Umesh is a Sr. Cloud Infrastructure Architect with AWS who delivers proof-of-concept projects and topical workshops, and leads implementation projects. He holds a Bachelor’s degree in Computer Science & Engineering from the National Institute of Technology, Jamshedpur (India). Outside of work, he enjoys watching documentaries, biking, practicing meditation, and discussing spirituality.