Containers

Running Windows workloads on a private EKS cluster

Legacy applications in the automotive industry often run on Windows. Customers want to scale these workloads on Kubernetes alongside their Linux workloads. The automotive industry also has particularly high security standards, which makes an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with a private endpoint a good fit for these workloads.

This blog post shows how to add a Windows worker node to an existing Amazon EKS cluster and how to run an end-to-end test to ensure that Windows container workloads execute successfully. It focuses on how a manual bootstrap of a worker node works so that the approach can be adapted for an infrastructure as code (IaC) deployment, and it also shows how to use the Terraform EKS module to add a Windows worker node to an Amazon EKS cluster.

Additionally, a repository implementing a complete private Amazon EKS setup with Windows and Linux worker nodes in Terraform is available as part of the AWS-Samples repository.

After reading this blog post, you will be able to migrate and run your Windows workloads on an EKS cluster whose nodes communicate with the control plane via a private endpoint.

Overview of solution

Architecture diagram of a private EKS cluster with Windows and Linux worker nodes, accessed from a bastion host in a public VPC via VPC peering.

Walkthrough

In this blog post, we will show how to do the following:

  • Enable Windows support in an EKS cluster.
  • Create the necessary EC2 user data script for the Windows worker node.
  • Allow Windows worker nodes to interact with the cluster internal networking.
  • Launch a Windows EC2 instance with the user data script.
  • Add necessary tags to the EC2 instance.
  • Perform end-to-end tests on a Windows Pod.
  • Add Windows support for Amazon EKS in Terraform and add a Windows node group.

Prerequisites

The following are prerequisites to follow this walkthrough:

Tip: We recommend creating and using a new VPC and Amazon EKS cluster to complete this tutorial. Procedures like modifying the aws-auth ConfigMap on the cluster may impact other nodes, users, and workloads on the cluster. Using a new VPC and Amazon EKS cluster will help minimize the risk of impacting other applications as you complete the tutorial.

An Amazon EKS cluster that is deployed via Terraform and already fulfills all of these requirements is available as part of the aws-samples repository: private-eks-for-windows-workloads-with-terraform.

Enable Windows support in the EKS cluster

Windows support for an existing EKS cluster must be enabled by following the documentation on how to enable Windows support on an EKS cluster. This step is mandatory, and the necessary actions depend on the platform version of the EKS cluster, as shown in the documentation. This blog post uses the new Windows support implementation, which is largely integrated into the control plane. It is recommended to use an EKS cluster with at least platform version eks.3 for Kubernetes version 1.21.
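With the new implementation, enabling Windows support comes down to creating a ConfigMap that the VPC resource controller in the control plane reads. The following is a minimal sketch based on the EKS documentation; verify the exact keys against the documentation for your platform version:

apiVersion: v1
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
data:
  enable-windows-ipv4: "true"

You can apply it with kubectl apply -f vpc-resource-controller-configmap.yaml; the same file name is used in the Terraform section later in this post.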

Create the necessary EC2 user data script for the Windows worker node

The Amazon EKS optimized Windows AMI provided by AWS includes the scripts and executables that are needed to join nodes to the EKS cluster.

This blog post uses the Amazon EKS optimized Windows Server Core edition, which includes the following files:

C:\Program Files\Amazon\EKS\Start-EKSBootstrap.ps1
C:\Program Files\Kubernetes\kubelet.exe
C:\Program Files\Kubernetes\kube-proxy.exe

The script Start-EKSBootstrap.ps1 configures and starts the necessary local components, such as kubelet and kube-proxy, to join the given cluster.

The following command can be used on a node to join the EKS cluster:

./Start-EKSBootstrap.ps1 -EKSClusterName "<EKS cluster name>" -APIServerEndpoint "<EKS API endpoint>" -Base64ClusterCA "<Certificate data>"

  • EKS cluster name: the name of your EKS cluster.
  • EKS API endpoint: the private endpoint of your EKS cluster.
  • Certificate data: the base64-encoded certificate authority data of the cluster.

The bootstrap script will create a kubelet and kube-proxy Windows service on the node and start both with the provided configuration.
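If you connect to the node, for example via RDP from the bastion host, you can verify the result of the bootstrap by checking that both services exist and are running:

# Run in PowerShell on the Windows node; both services should report a status of Running.
Get-Service kubelet, kube-proxy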

Get EKS API endpoint and certificate data (console)

  1. Open the EKS console.
  2. In the navigation pane, under Amazon EKS, choose Clusters.
  3. Open the cluster in which the Windows node should be added.
  4. On the navigation bar, choose Configuration.
  5. In the subnavigation bar, choose Details.
  6. Make note of the value of the API server endpoint and certificate authority.

Get EKS API endpoint (AWS CLI)

  1. Open a command line on your workstation or bastion host that has access to your AWS account running your EKS cluster.
  2. Perform the command:
    aws eks describe-cluster --region <cluster region> --name <EKS cluster name> --query "cluster.endpoint"
  3. Make note of the value that is returned.

Get EKS certificate data (AWS CLI)

  1. Open a command line on your workstation or bastion host that has access to your AWS account running your EKS cluster.
  2. Perform the command:
    aws eks describe-cluster --region <cluster region> --name <EKS cluster name> --query "cluster.certificateAuthority"
  3. The response will have the following format:
    {
         "data": "<certificate data>"
    }
  4. Make note of the value in place of <certificate data>.
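Both values can also be retrieved in a single call by combining the two queries into one JMESPath multiselect; the result keys endpoint and ca below are labels chosen here for illustration:

aws eks describe-cluster --region <cluster region> --name <EKS cluster name> --query "cluster.{endpoint: endpoint, ca: certificateAuthority.data}"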

The following is the complete user data content for the EC2 instance bootstrap:

<powershell> 
cd '\Program Files\Amazon\EKS'
./Start-EKSBootstrap.ps1 -EKSClusterName '<EKS cluster name>' -APIServerEndpoint '<EKS API endpoint>' -Base64ClusterCA '<Certificate data>' 
</powershell>

Insert the values for <EKS cluster name>, <EKS API endpoint>, and <Certificate data> that were collected in the previous steps. Make note of the user data content, as it will be used in a later step.

Allow Windows worker nodes to interact with the cluster internal networking

The roles that are attached to the nodes, Linux and Windows alike, need to be allowed to access the resources required by the kube-proxy component. They need to have access to the cluster role system:node-proxier (see Core component roles for details). Such a cluster role binding already exists in an EKS cluster: eks:kube-proxy-windows.

Execute the following command to see the details of the cluster role binding:

kubectl get clusterrolebinding eks:kube-proxy-windows -o yaml
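The output should look similar to the following abridged version: the binding grants the group eks:kube-proxy-windows the system:node-proxier cluster role that kube-proxy requires:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eks:kube-proxy-windows
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-proxier
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: eks:kube-proxy-windows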

To allow your Windows worker nodes to access the necessary networking components, add the following to the end of the mapRoles section of the aws-auth ConfigMap in the kube-system namespace:

- rolearn: arn:aws:iam::444455556666:role/Admin
  username: system:node:{{EC2PrivateDNSName}}
  groups:
    - system:bootstrappers
    - system:nodes
    - eks:kube-proxy-windows

Replace the rolearn with your ARN for the node role. See Enabling IAM user and role access to your cluster for details about the aws-auth ConfigMap.
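One straightforward way to apply this change is to edit the ConfigMap in place, which opens the live object in your default editor:

kubectl edit configmap aws-auth -n kube-system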

You can also download an example and modify it accordingly:

curl -o aws-auth-cm-windows.yaml https://amazon-eks.s3.us-west-2.amazonaws.com/cloudformation/2020-10-29/aws-auth-cm-windows.yaml

However, this example does not include any additional roles or users that were already defined as part of your cluster setup.

Create a Windows EC2 instance with the user data script:

  1. Open the Amazon EC2 console.
  2. In the navigation pane, under Instances, choose Instances.
  3. Open Launch Instances.
  4. Search for Windows_Server-2019-English-Core-EKS_Optimized.
  5. In the navigation pane, choose Community AMIs.
  6. Select an AMI that fits your Kubernetes version. For Kubernetes version 1.21, search for Windows_Server-2019-English-Core-EKS_Optimized-1.21.
  7. Select your preferred instance type, for example, t3.xlarge. Refer to Windows support for unsupported instance types.
  8. Choose Configure Instance Details.
  9. For Network, choose the private VPC that your EKS cluster is running in.
  10. For Subnet, choose a private subnet of your EKS cluster.
  11. For IAM role, choose the role that is already used by the Linux worker nodes.
  12. Scroll down to Advanced Details.
  13. For User data, enter the user data script from the last step.
  14. Choose Add Storage.
  15. Choose Add Tags.
  16. Choose Add Tag.
  17. For Key, enter kubernetes.io/cluster/<cluster-name> but replace <cluster-name> with the name of your cluster.
  18. For Value, enter owned.
  19. Choose Configure Security Group.
  20. For Assign a security group, choose Create a new security group.
  21. Select Type RDP and provide the bastion host security group ID as Source.
  22. Choose Add rule.
  23. Select All Traffic and provide the security group ID of the Linux worker nodes as Source.
  24. Choose Add rule.
  25. Select Custom TCP, enter port 10250, and provide the security group ID of the EKS cluster as Source.
  26. Choose Add rule.
  27. Select Type HTTPS and provide the security group ID of the EKS cluster as Source.
  28. Choose Review and Launch.
  29. After reviewing the configuration, choose Launch.
  30. Provide either an existing key or create a new key pair.

Make sure that the security groups attached to the Windows and Linux worker nodes allow all traffic between them for node-to-node communication. Additionally, make sure that the security groups of the EKS cluster allow https from the Windows security group. See Amazon EKS security group considerations for details.
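If one of these rules is missing, it can be added without recreating the groups. The following AWS CLI sketch allows HTTPS from the Windows node security group into the cluster security group; both group IDs are placeholders:

# Allow inbound HTTPS on the cluster security group from the Windows node security group.
aws ec2 authorize-security-group-ingress --group-id <cluster-security-group-id> --protocol tcp --port 443 --source-group <windows-node-security-group-id>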

The tag is necessary because the bootstrap script reads it as additional information via the EC2 instance metadata.

After a few minutes, the instance should be started and should show up as part of the EKS cluster:

kubectl get nodes
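Because kubelet labels each node with its operating system, the new Windows node can also be listed explicitly via the standard kubernetes.io/os label:

kubectl get nodes -l kubernetes.io/os=windows -o wide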

Perform end-to-end tests for example Windows workload

Review Deploy a sample application to deploy a Windows workload.

Modify the example Windows workload to run in a private EKS environment

  1. Download the file eks-sample-deployment, which is already prepared to be executed in a private endpoint EKS cluster.
  2. Deploy the sample Windows workload.
  3. Deploy the sample Windows service as provided by Deploy a sample application.

This creates a simple webserver without any dependency on a public endpoint and is derived from Webserver.yaml.

For the Windows AMI version Windows_Server-2019-English-Core-EKS_Optimized-*, the image used by the sample workload is already available on the node. If another Windows Server version was used, make sure to replace the image value inside the .yaml definition with the tag that matches the correct kernel version.
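For reference, the scheduling-relevant parts of such a workload look like the following sketch. The names and image tag are illustrative; the nodeSelector on kubernetes.io/os is what ensures the Pods land on Windows nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: eks-sample-windows-deployment
  namespace: eks-sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: eks-sample-windows-app
  template:
    metadata:
      labels:
        app: eks-sample-windows-app
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      containers:
        - name: webserver
          # Illustrative image; the tag must match the Windows kernel version of the node.
          image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019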

Once the workloads and services are running successfully on the cluster, execute the following commands to test the setup.

Test internal webserver from Windows Pod:

  1. Open a command line on the bastion host.
  2. Execute the following command on the bastion host.
    kubectl exec -it -n eks-sample-app <windows-pod> -- powershell
  3. Execute the following command in the PowerShell session of the Windows Pod that just opened:
    curl eks-sample-windows-service -UseBasicParsing

    The request should return a response from the webserver. The Windows webserver needs a few seconds to respond the first time.

The execution of these workloads validates the following:

  • DNS resolution works as expected inside of a Windows Pod.
  • Networking between Pods across different subnets works as expected.
  • Workloads get scheduled on the correct node.

Adding Windows support for EKS in Terraform and adding a Windows node group

The following code snippets use version 18.6.0 of the Terraform EKS module and automate the steps that were done manually in this blog post. For a complete working example, refer to the aws-samples repository.

Enable Windows support for the EKS cluster

This code enables the EKS cluster to run Windows workloads while using the files additional_roles_aws_auth.yaml and vpc-resource-controller-configmap.yaml:

### Prerequisites for Windows worker node enablement
data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_id
}

locals {
  kubeconfig = yamlencode({
    apiVersion      = "v1"
    kind            = "Config"
    current-context = "terraform"
    clusters = [{
      name = module.eks.cluster_id
      cluster = {
        certificate-authority-data = module.eks.cluster_certificate_authority_data
        server                     = module.eks.cluster_endpoint
      }
    }]
    contexts = [{
      name = "terraform"
      context = {
        cluster = module.eks.cluster_id
        user    = "terraform"
      }
    }]
    users = [{
      name = "terraform"
      user = {
        token = data.aws_eks_cluster_auth.this.token
      }
    }]
  })
}
### Apply changes to aws_auth
### Windows worker node cluster enablement:  https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html
resource "null_resource" "apply" {
  triggers = {
    kubeconfig = base64encode(local.kubeconfig)
    cmd_patch  = <<-EOT
      kubectl create configmap aws-auth -n kube-system --kubeconfig <(echo $KUBECONFIG | base64 --decode)
      kubectl patch configmap/aws-auth --patch "${module.eks.aws_auth_configmap_yaml}" -n kube-system --kubeconfig <(echo $KUBECONFIG | base64 --decode)
      kubectl get cm aws-auth -n kube-system -o json --kubeconfig <(echo $KUBECONFIG | base64 --decode) | jq --arg add "`cat yaml-templates/additional_roles_aws_auth.yaml`" '.data.mapRoles += $add' | kubectl apply --kubeconfig <(echo $KUBECONFIG | base64 --decode) -f -
      kubectl apply --kubeconfig <(echo $KUBECONFIG | base64 --decode) -f yaml-templates/vpc-resource-controller-configmap.yaml
    EOT
  }
  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    environment = {
      KUBECONFIG = self.triggers.kubeconfig
    }
    command = self.triggers.cmd_patch
  }
}

Dynamically determine the Windows AMI ID

Amazon EKS optimized Windows AMIs are released frequently with the latest security patches. The following code retrieves the most recent Windows AMI available at execution time:

data "aws_ami" "win_ami" {
    most_recent = true
    owners = ["amazon"]
    filter {
        name = "name"
        values = ["Windows_Server-2019-English-Core-EKS_Optimized-${var.eks_cluster_version}-*"]
    }
}

Create self-managed Windows node group within EKS module

The following code snippet shows an example of a Windows self_managed_node_group that can be added to an existing EKS module implementation.

module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  version = "18.6.0"
  ...
self_managed_node_groups = {
    windows = {
      platform = "windows"
      name = "windows"
      public_ip    = false
      instance_type = var.instance_type_win
      key_name = "<available-key-in-region>"
      desired_size = var.desired_size_win
      max_size = var.max_size_win
      min_size = var.min_size_win
      ami_id = data.aws_ami.win_ami.id
    }
}
}

The Terraform EKS module uses a template for the bootstrap script similar to the one used in this blog post. See windows_user_data.tpl for details.
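Conceptually, the rendered user data has the same shape as the manual script shown earlier. The following abridged sketch illustrates this; the actual template in the module contains additional variables, and the variable names may differ:

<powershell>
# Rendered by Terraform from windows_user_data.tpl (abridged sketch; variable names are assumptions).
& "$env:ProgramFiles\Amazon\EKS\Start-EKSBootstrap.ps1" -EKSClusterName "${cluster_name}" -APIServerEndpoint "${cluster_endpoint}" -Base64ClusterCA "${cluster_auth_base64}"
</powershell>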

Cleaning up

To avoid incurring future charges, delete the resources that were created. If you manually added the EC2 instance as described in this blog post, you can simply terminate the EC2 instance. If you created the cluster with the linked Terraform code, you can use the following command to delete the Terraform-managed resources:

terraform destroy -var-file main-input.tfvars

Conclusion

In this blog post, we demonstrated how to run Windows worker nodes on a private EKS cluster. We showed how to join a Windows worker node to the cluster and provided the necessary user data script for the EC2 instance. Additionally, we showed how to test the overall EKS setup to validate that the setup is complete and that the cluster can be used for actual workloads.

Finally, we showed the necessary Terraform implementation to enable Windows support and to add a Windows node group. The repository private-eks-for-windows-workloads-with-terraform complements this blog post by providing a complete Terraform implementation of a private EKS cluster that you can use to get started running your Windows workloads on EKS.