
Deploy an Amazon EKS cluster across AWS Outposts with Intra-VPC communication

Introduction

Intra-VPC Communication enables network communication between subnets in the same Amazon Virtual Private Cloud (Amazon VPC) across multiple physical AWS Outposts using the Outposts local gateways (LGW) via direct VPC routing. With this feature, you can leverage a single Amazon VPC architecture for communication between applications and services running on disparate AWS Outposts. You can take advantage of intra-VPC communication to deploy Amazon Elastic Kubernetes Service (Amazon EKS) Kubernetes nodes across disparate AWS Outposts to achieve higher levels of resiliency.

Customers have several reasons for deploying workloads on AWS Outposts:

  • Running services closer to their end customers for low-latency connectivity
  • Running analytics or business intelligence processes closer to the data source or filtering data before sending it to the cloud for further analysis
  • Supporting modernization and migration of legacy on-premises applications to the cloud

Customers might be looking to deploy the Amazon EKS data plane across multiple physical AWS Outposts for a few reasons, for example to increase resilience by spreading workloads across physical compute and fault isolation boundaries, or as a mechanism for AWS Outposts capacity expansion or replacement. A fault isolation boundary limits the effect of a failure within a workload to a limited number of components; components outside of the boundary are unaffected by the failure. By using multiple fault isolation boundaries, you can limit the impact on your workload. You can learn more about creating resilient architectures in the reliability pillar of the AWS Well-Architected Framework.

In addition, customers have told us that they prefer worker nodes in a single Amazon EKS cluster spanning multiple co-located AWS Outposts, rather than maintaining two Amazon EKS clusters, to reduce operational overhead. Customers may also want to tether more than one AWS Outpost to different Availability Zones (AZs) for resilience, moving their point of failure from the AWS Outpost/AZ to the Amazon EKS cluster. Note: the customer's on-premises network is the communication backbone between the AWS Outposts. The Kubernetes architecture requires a fully connected solution where the worker nodes on the AWS Outposts have a resilient connection to the Kubernetes control plane hosted in the AWS Region, and communication between Kubernetes nodes should be equally resilient. Where the AWS Outposts are not co-located on the same metro network, or where there is not a resilient network between the two AWS Outposts, we recommend deploying an Amazon EKS cluster per AWS Outpost and following a cell-based architecture.

This post presents a sample architecture of how to stretch the Amazon EKS data plane across two co-located AWS Outposts racks using intra-VPC communication and is an extension of the Intra-VPC communication AWS post.

Solution overview

Amazon EKS supports a range of deployment options. The architecture for this solution uses the extended cluster deployment option for Amazon EKS on AWS Outposts, where the Kubernetes control plane is deployed in an AWS Region and the worker nodes are deployed across two AWS Outposts. Local clusters for Amazon EKS on AWS Outposts are an alternative deployment option that enables the Kubernetes control plane and worker nodes to be co-located on the same AWS Outpost. You can learn more about architecting local clusters on AWS Outposts in the fully private local clusters for Amazon EKS on AWS Outposts powered by VPC Endpoints post. The following diagram depicts the extended clusters architecture:


Figure 1: Amazon EKS worker nodes on AWS Outposts Architecture.

In this set-up, a single Amazon VPC is created with two in-region private subnets for Amazon EKS and two subnets, one in each AWS Outpost, for the self-managed node groups. As there are no public subnets, the VPC must have VPC interface endpoints to support a private Amazon EKS cluster (see the private cluster requirements in the AWS docs for the required endpoints). In addition, the two AWS Outposts are connected to two distinct Availability Zones in the Region via highly available AWS Direct Connect (DX) connections.
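As a sketch, the following AWS CLI command creates one of the required interface endpoints (the Amazon EC2 endpoint in us-west-2); the subnet and security group values are placeholders, and you would repeat the command for each service listed in the private cluster requirements (for example ecr.api, ecr.dkr, sts, and elasticloadbalancing, plus an Amazon S3 gateway endpoint):

aws ec2 create-vpc-endpoint \
   --vpc-id <vpc-id> \
   --vpc-endpoint-type Interface \
   --service-name com.amazonaws.us-west-2.ec2 \
   --subnet-ids <in-region-private-subnet-1-id> <in-region-private-subnet-2-id> \
   --security-group-ids <endpoint-security-group-id> \
   --private-dns-enabled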

Workloads deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances on the AWS Outposts can communicate with one another directly over the LGW using Direct VPC Routing, creating a single Kubernetes data plane. Traffic between subnets in the same VPC but on different AWS Outposts cannot hairpin through the Region, because the traffic is blocked by the service link (Figure 2 shows the traffic flow). It is the customer's responsibility to ensure there is a stable network connection between the AWS Outposts.


Figure 2: Traffic flow for intra-VPC communication between two subnets in the same Amazon VPC but disparate AWS Outposts.

The following walkthrough describes how to deploy Amazon EKS data plane across two AWS Outposts.

Walkthrough

The walkthrough consists of four parts:

  • Prerequisites
  • Foundation: core networking set-up
  • Deploying Amazon EKS (in-region)
  • Validation

Prerequisites

  • An AWS account, and access to a terminal with kubectl and the Amazon EKS command line tool (eksctl version 0.164.0 or greater) installed and configured
  • Two AWS Outposts racks
  • An Amazon VPC with the required private subnets: two in-region and two in the AWS Outposts racks (one on each)
  • Amazon VPC interface endpoints as per the AWS docs for private Amazon EKS cluster requirements
  • The Amazon VPC associated with the Local Gateway Route Table of each AWS Outpost (example commands follow this list)
  • Two AWS Outposts connected via intra-VPC associations as per the Intra-VPC Communication how-to post
    • Intra-VPC communication across multiple AWS Outposts with direct VPC routing is available in all AWS Regions where AWS Outposts rack is available. Existing AWS Outposts racks may require an update to enable support for intra-VPC communication across multiple Outposts. If this feature doesn’t work for you, contact AWS Support.
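For reference, the following is a minimal sketch of associating the Amazon VPC with an AWS Outpost's local gateway route table using the AWS CLI (run it for each AWS Outpost); all IDs are placeholders, and the intra-VPC association steps themselves are covered in the Intra-VPC Communication how-to post:

# Find the local gateway route table for the AWS Outpost
aws ec2 describe-local-gateway-route-tables \
   --filters Name=outpost-arn,Values=<outpost-arn>

# Associate the Amazon VPC with that local gateway route table
aws ec2 create-local-gateway-route-table-vpc-association \
   --local-gateway-route-table-id <lgw-route-table-id> \
   --vpc-id <vpc-id>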

Foundation: Core networking setup

For this walkthrough, we assume you have two AWS Outposts set up and the base Amazon VPC networking configured. For this solution we used the following:

  • Amazon VPC Classless Inter-Domain Routing (CIDR) – 10.0.0.0/16
  • Two private subnets in-region (with the required Amazon VPC and Gateway endpoints for a private Amazon EKS cluster)
    • Subnet Availability Zone A CIDR – 10.0.1.0/24
    • Subnet Availability Zone B CIDR – 10.0.2.0/24
  • Two private subnets, one on each AWS Outpost rack (a sample subnet creation command follows this list):
    • AWS Outpost 1 CIDR – 10.0.3.0/24
    • AWS Outpost 2 CIDR – 10.0.4.0/24
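As a reference, this is a minimal sketch of creating one of the AWS Outposts subnets with the AWS CLI; the Availability Zone, account ID, and Outpost ID are placeholders, and the Availability Zone must match the zone the AWS Outpost is anchored to:

# Create the subnet for AWS Outpost 1 (10.0.3.0/24)
aws ec2 create-subnet \
   --vpc-id <vpc-id> \
   --cidr-block 10.0.3.0/24 \
   --availability-zone <outpost-1-anchor-az> \
   --outpost-arn arn:aws:outposts:<region>:<account-id>:outpost/<outpost-1-id>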

The following figure (Figure 3) shows the foundational network set-up for the environment. The route tables for the subnets on the AWS Outposts route traffic between the AWS Outposts subnets over the AWS Outposts LGW.


Figure 3: Amazon EKS foundation network architecture for intra VPC communication.
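For example, the route table associated with the AWS Outpost 1 subnet needs a route that sends traffic destined for the AWS Outpost 2 subnet to the local gateway (and vice versa). A minimal sketch with placeholder IDs:

# Route traffic from the AWS Outpost 1 subnet to the AWS Outpost 2 subnet (10.0.4.0/24) via the LGW
aws ec2 create-route \
   --route-table-id <outpost-1-subnet-route-table-id> \
   --destination-cidr-block 10.0.4.0/24 \
   --local-gateway-id <outpost-1-local-gateway-id>

# Repeat in the AWS Outpost 2 subnet route table for 10.0.3.0/24, pointing at the AWS Outpost 2 LGW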

Deploy Amazon EKS (in-region)

Prior to deploying the Amazon EKS cluster, we need to check which Amazon EC2 instance types are available, as not all AWS Outposts may have the same configuration.

Get the available Amazon EC2 instances on your AWS Outposts by running the following using your Outpost IDs:

aws outposts get-outpost-instance-types --outpost-id <outpost-id>
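If you only want the instance type names, you can optionally add a --query filter, for example:

aws outposts get-outpost-instance-types \
   --outpost-id <outpost-id> \
   --query 'InstanceTypes[].InstanceType' \
   --output text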

Create a shared worker node security group

Security groups are used to control the traffic between the Kubernetes control plane and the cluster's worker nodes, as well as traffic between worker nodes. The minimum suggested rules can be found in the Amazon EKS AWS docs.

When creating security group rules for controlling inter-node communication, you need to use IP address ranges in the rules, as security group metadata is not carried across a customer network. To create a security group, which will later be attached to the worker nodes, run the following command:

aws ec2 create-security-group \
   --group-name shared-worker-node-sg \
   --description "Amazon EKS shared worker node security group" \
   --vpc-id <vpc-id>
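The command returns the new security group's GroupId. If you need to retrieve it later, one option (a sketch, using a hypothetical shell variable name) is:

SHARED_SG_ID=$(aws ec2 describe-security-groups \
   --filters Name=group-name,Values=shared-worker-node-sg Name=vpc-id,Values=<vpc-id> \
   --query 'SecurityGroups[0].GroupId' \
   --output text)
echo "${SHARED_SG_ID}"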

Take note of the created security group ID. For example, to enable Domain Name System (DNS) resolution from Pods in your cluster, the security group must allow outbound and inbound communication over Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) port 53 for CoreDNS. You must also add rules for any protocols that you expect for inter-node communication. If you're considering limiting traffic between nodes through security groups, in addition to using Kubernetes Network Policies for fine-grained network control, then you should thoroughly test all of your Pods before promoting to production. Note: by default, security groups have a rule that allows all egress traffic to all destinations. If you remove this rule, you must have the minimum rules listed in Restricting cluster traffic and any other ports needed for your Kubernetes cluster to function.

Run the following command to authorize the ingress rules for CoreDNS:

aws ec2 authorize-security-group-ingress \
    --group-id <security group id> \
    --ip-permissions \
      IpProtocol=udp,FromPort=53,ToPort=53,IpRanges="[{CidrIp=<AWS Outpost subnet 1>}]" \
      IpProtocol=udp,FromPort=53,ToPort=53,IpRanges="[{CidrIp=<AWS Outpost subnet 2>}]" \
      IpProtocol=tcp,FromPort=53,ToPort=53,IpRanges="[{CidrIp=<AWS Outpost subnet 1>}]" \
      IpProtocol=tcp,FromPort=53,ToPort=53,IpRanges="[{CidrIp=<AWS Outpost subnet 2>}]"

Create the Amazon EKS cluster

Replace all fields in <> with your own IDs or supported Amazon EC2 instance types where applicable. Note: this creates a Kubernetes API endpoint with public access; we suggest a private-only Kubernetes API endpoint (see the snippet after the cluster configuration below).

cat <<EOF > ./eks-cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks
  version: "1.28"
  region: us-west-2
vpc:
  clusterEndpoints:
    privateAccess: true
  sharedNodeSecurityGroup: <shared-security-group-id>
  id: <VPC-ID>
  subnets:
    private:
      <regional-subnet-az-for-subnet>:
          id: "<subnet-id>"
      <regional-subnet-az-for-subnet>:
          id: "<subnet-id>"
nodeGroups:
  - name: outpost-worker-nodes-subnet-1
    instanceType: <instance-type>
    desiredCapacity: 2
    minSize: 2
    maxSize: 2
    volumeSize: 100
    volumeType: gp2
    volumeEncrypted: true
    privateNetworking: true
    subnets: 
      - <kubernetes-data-plane-private-subnet-ID-outpost-1>
  - name: outpost-worker-nodes-subnet-2
    instanceType: <instance-type>
    desiredCapacity: 2
    minSize: 2
    maxSize: 2
    volumeSize: 100
    volumeType: gp2
    volumeEncrypted: true
    privateNetworking: true
    subnets:
      - <kubernetes-data-plane-private-subnet-ID-outpost-2>
EOF
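If you prefer a private-only Kubernetes API endpoint, as suggested in the note above, a minimal sketch of the relevant vpc section is shown below. With public access disabled, eksctl and kubectl need network connectivity to the VPC (for example via a bastion host or VPN) to complete cluster creation and for later management.

vpc:
  clusterEndpoints:
    privateAccess: true
    publicAccess: false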

Now, create the cluster:

eksctl create cluster -f ./eks-cluster.yaml

The create cluster command takes several minutes; once the cluster has been successfully created, you can move on to the validation step.

Validation

To validate that you have a working cluster, fetch the nodes by running:

kubectl get nodes -L topology.kubernetes.io/zone

You should see four Kubernetes worker nodes, and the zone labels show separate fault domains (this assumes your AWS Outposts are anchored to different AZs).

NAME                                         STATUS   ROLES    AGE     VERSION               ZONE
ip-10-77-11-169.us-west-2.compute.internal   Ready    <none>   3m18s   v1.28.3-eks-e71965b   us-west-2a
ip-10-77-11-89.us-west-2.compute.internal    Ready    <none>   3m18s   v1.28.3-eks-e71965b   us-west-2a
ip-10-77-7-126.us-west-2.compute.internal    Ready    <none>   112s    v1.28.3-eks-e71965b   us-west-2b
ip-10-77-7-244.us-west-2.compute.internal    Ready    <none>   114s    v1.28.3-eks-e71965b   us-west-2b

If you deployed a fully private Amazon EKS cluster, then you need a bastion host or network connectivity from your local machine to the Kubernetes API server to use kubectl. You can validate network connectivity between the nodes by logging in via AWS Systems Manager Session Manager (SSM) and pinging, as per the Intra-VPC Communication how-to post.
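As an optional check from within the cluster, you can schedule a test Pod in each zone and ping across. The Pod names, zone values, and busybox image below are assumptions; the shared security group must allow ICMP between the AWS Outposts subnets, and the image must be reachable from your private subnets:

# Launch a test Pod on each AWS Outpost (zone values match the node output above)
kubectl run ping-test-1 --image=busybox --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeSelector":{"topology.kubernetes.io/zone":"us-west-2a"}}}' \
  -- sleep 3600
kubectl run ping-test-2 --image=busybox --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeSelector":{"topology.kubernetes.io/zone":"us-west-2b"}}}' \
  -- sleep 3600
kubectl wait --for=condition=Ready pod/ping-test-1 pod/ping-test-2 --timeout=120s

# Ping the second Pod's IP address from the first Pod, then clean up
POD2_IP=$(kubectl get pod ping-test-2 -o jsonpath='{.status.podIP}')
kubectl exec ping-test-1 -- ping -c 3 "${POD2_IP}"
kubectl delete pod ping-test-1 ping-test-2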

Considerations

There are some additional design considerations with this solution:

  • Capacity planning: It is still important to plan for capacity across your AWS Outposts to support failure management and Kubernetes data plane upgrades.
  • Customer network: With intra-VPC communication, traffic transits the customer's network, so to use security groups for resources in the AWS Outposts subnets you must use rules that include IP address ranges as the source or destination. You cannot use security group IDs. See the AWS docs for more information. The solution in this post uses the Amazon VPC CNI and hasn't been tested with other Container Networking Interfaces (CNIs). Note: Security Groups for Pods won't work if you use security group references in rules to control east-west traffic between Pods; alternatively, you can use Kubernetes Network Policies with the Amazon VPC CNI. See Configure your cluster for Kubernetes network policies in the AWS documentation for details on how to get started.
  • Ingress traffic: The only AWS managed load balancer supported on AWS Outposts rack is the Application Load Balancer (ALB). Unlike in an AWS Region, at the time of writing a single ALB cannot be deployed in subnets on different AWS Outposts. Today, you would need to create a separate ALB per subnet, each with its own DNS address. If you would like load balancers stretched across Kubernetes nodes deployed on different AWS Outposts, then consider alternative ingress controllers such as HAProxy or NGINX ingress exposed via node port.
  • Highly available workload: Consider making use of Pod Topology Spread Constraints to spread replicas of a Deployment across different fault domains for high availability (a sample manifest follows this list).
  • Highly available compute data plane: To further increase the resilience of your data plane deployment and create a fault isolation boundary, it is recommended to create a self-managed node group per subnet and make use of the host or rack spread Amazon EC2 placement groups. A spread placement group spreads Amazon EC2 instances across underlying hardware to minimize correlated failures. For example, during AWS Outposts maintenance events this ensures your workloads are spread across racks to minimize impact in the event AWS needs to reboot the AWS Outposts equipment to install an update that would impact any instances running on that capacity. This is also why it's important to plan for capacity. You can learn more about placement groups on AWS Outposts in the AWS documentation. If stretching a node group across two AWS Outposts, it is important to validate that both AWS Outposts support the same instance types. In addition, customers using self-managed node groups should consider installing the Node Termination Handler in Queue Processing mode to capture Auto Scaling group (ASG) lifecycle events and safely cordon and drain workloads during ASG scale-in events (for example, during scale-in/scale-out events to upgrade an AMI).
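The following is a minimal sketch of a Deployment that uses Pod Topology Spread Constraints to spread replicas across zones (and therefore across the two AWS Outposts in this architecture); the application name and image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      topologySpreadConstraints:
        # Spread replicas evenly across zones; each AWS Outpost is anchored
        # to a different AZ, so the zone label maps to an Outpost here
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: sample-app
      containers:
        - name: app
          image: public.ecr.aws/nginx/nginx:latest
          ports:
            - containerPort: 80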

Cleaning-up

Delete any example resources you no longer need to avoid incurring future costs. To delete the Amazon EKS cluster created as part of this walkthrough, execute the following:

eksctl delete cluster -f ./eks-cluster.yaml

Don’t forget to clean up any other resources you created for this walkthrough.

Conclusion

In this post, we showed how you can take advantage of intra-VPC communication across multiple AWS Outposts to deploy an Amazon EKS cluster data plane. With the intra-VPC communication across multiple Outposts feature, you can leverage a single VPC architecture for communication between resources across AWS Outposts, using the AWS Outposts local gateway (LGW) and Direct VPC Routing.


Robert Northard

Robert Northard is a Sr. Containers Specialist Solutions Architect at AWS. He has expertise in Container Technologies and DevOps practices.


Martin Seman

Martin Seman is a Senior Specialist Solutions Architect at AWS. He has more than 14 years of experience in the technology industry, from telecommunications to banking and security. Martin is now focusing on helping customers with guidance when building hybrid designs with AWS Outposts and Local Zones.


Jared Thompson

Jared Thompson is a Specialist Solutions Architect, Hybrid Edge.