
Amazon EKS optimized Amazon Linux 2023 accelerated AMIs now available

Introduction

Earlier this year, we announced support for Amazon EKS optimized AL2023 AMIs, which provide many security and performance enhancements. Amazon Linux 2023 (AL2023) is the next generation of Amazon Linux from Amazon Web Services (AWS) and is designed to provide a secure, stable, and high-performance environment to develop and run your cloud applications.

Today, we’re announcing the general availability of the accelerated variants of the Amazon Elastic Kubernetes Service (Amazon EKS)-optimized AL2023 AMIs that support Amazon Elastic Compute Cloud (Amazon EC2) instances backed with AWS Neuron devices or NVIDIA GPUs. These new accelerated variants are optimized for running machine learning (ML) and high-performance computing workloads using the latest accelerator devices from Annapurna Labs and NVIDIA. The accelerated Amazon EKS-optimized AL2023 Amazon Machine Images (AMIs) can be used with Karpenter, managed node groups (MNG), and self-managed nodes in all AWS Regions. They can be used on Amazon EKS versions 1.28 or greater in standard support and Amazon EKS versions 1.24 through 1.27 in extended support. The accelerated Amazon EKS-optimized AL2023 AMIs support x86/AMD64 architectures. ARM64 isn’t supported at this time.

What is changing?

Although there are security and performance benefits, there are also several package changes, and we recommend that you test applications thoroughly before upgrading applications in production environments. For a list of all package changes in AL2023, refer to Package changes in Amazon Linux 2023. In addition to the changes in AL2023, and the previously noted changes in the Amazon EKS-optimized AL2023 AMI announcement, you should be aware of the following:

  • There are now two separate AMI variants: one for AWS Neuron devices and one for NVIDIA GPUs.
  • User space libraries, such as those provided by NVIDIA CUDA toolkit and libfabric, are no longer supplied on the AMIs.

The existing Amazon EKS-optimized AL2 GPU AMI (AL2_x86_64_GPU) provides support for both AWS Neuron devices and NVIDIA GPUs in the same AMI. Starting with the accelerated AL2023 AMIs, there are now two discrete AMI variants that are optimized specifically for their respective accelerated devices.

CUDA components in accelerated Amazon EKS AL2023 AMIs

The accelerated Amazon EKS AL2023 AMIs provide the necessary drivers and kernel modules for:

  • NVIDIA GPUs: NVIDIA variant only
  • AWS Neuron devices: AWS Neuron variant only
  • Elastic Fabric Adapter (EFA): both NVIDIA and AWS Neuron variants

However, users must include user space libraries such as CUDA (Compute Unified Device Architecture) within their application container as part of the application’s dependencies.

Regarding CUDA:

  • The accelerated Amazon EKS AL2023 NVIDIA AMI provides the driver components (shown in the grey box in Figure 1 of CUDA Compatibility).
  • Users provide the necessary application and toolkit components (shown outside the grey box in Figure 1 of CUDA Compatibility) as part of their application container image.

When running nvidia-smi, the displayed CUDA version is the version of the CUDA driver (libcuda.so) installed in the AMI. The CUDA version that is installed within application containers is the version of the CUDA runtime (libcudart.so).
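
The distinction above matters when you pick a container base image: as a conservative rule of thumb, the CUDA runtime inside the container should be no newer than the CUDA version that the driver reports. The following is a minimal sketch of that comparison, not an NVIDIA tool. The function name is hypothetical, both versions are assumed to be in MAJOR.MINOR form, and the check deliberately ignores NVIDIA's forward-compatibility packages and minor-version compatibility rules, which can allow combinations that this check rejects.

```shell
# Hypothetical helper (not part of the AMI or of any NVIDIA tooling).
# Succeeds when the container's CUDA runtime version is not newer than the
# CUDA version reported by the driver on the node.
# Assumption: both arguments are in MAJOR.MINOR form, for example 12.4.
cuda_runtime_supported() {
  driver="$1"    # CUDA version shown by nvidia-smi on the node
  runtime="$2"   # CUDA runtime version bundled in the container image
  d_major=${driver%%.*}
  d_minor=${driver#*.}
  r_major=${runtime%%.*}
  r_minor=${runtime#*.}
  # An older major version of the runtime is accepted outright.
  if [ "$r_major" -lt "$d_major" ]; then
    return 0
  fi
  # Same major version: the runtime minor must not exceed the driver minor.
  [ "$r_major" -eq "$d_major" ] && [ "$r_minor" -le "$d_minor" ]
}
```

For example, `cuda_runtime_supported 12.6 12.4 && echo compatible` prints `compatible`; the version numbers here are illustrative values only.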

Open source AMI build configuration

As part of the launch, we are also open sourcing the build configurations used to create the accelerated Amazon EKS-optimized AL2023 AMIs. Providing the build configurations as open-source software allows customers to view the components and configurations that go into creating the AMIs while removing guesswork for those who need to create a custom AMI. Refer to the amazon-eks-ami documentation on how to build a custom accelerated AMI using the provided open source configuration when necessary.

NVIDIA GPU driver versions

At launch, the NVIDIA AMI variant supplies NVIDIA data center driver version 560, which is NVIDIA’s first release with AL2023 support. This driver version supports the Amazon EC2 accelerated instance types that provide NVIDIA GPUs.

nvidia-smi command output from a p5e.48xlarge EC2 instance created from the accelerated Amazon EKS AL2023 NVIDIA AMI, showing NVIDIA driver version 560 and eight NVIDIA H200 GPUs

Using the accelerated Amazon EKS AL2023 AMIs

The recommended AMI ID can be retrieved from an Amazon EKS provided AWS Systems Manager parameter using the Region and Kubernetes version of the cluster that the instances will join. At launch, the accelerated AL2023 AMI is available in two variants: NVIDIA x86 and Neuron x86.

AL2023_x86_64_NEURON: 'amazon-linux-2023/x86_64/neuron'
AL2023_x86_64_NVIDIA: 'amazon-linux-2023/x86_64/nvidia'

Run the following AWS Command Line Interface (AWS CLI) command to retrieve the appropriate accelerated AL2023 AMI ID, replacing the Region and Kubernetes version as appropriate. You must be authenticated in the AWS CLI as an AWS Identity and Access Management (IAM) principal that has the ssm:GetParameter permission to retrieve the Amazon EKS-optimized AMI metadata.

aws ssm get-parameter \
   --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/neuron/recommended/image_id \
   --region us-west-2 \
   --query "Parameter.Value" \
   --output text
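
If you look up both variants or several Kubernetes versions, it can help to build the parameter name in one place. The following is a minimal sketch that reuses the path layout from the command above; the function name is hypothetical, and it only assembles the string without calling AWS.

```shell
# Hypothetical helper: build the SSM parameter name for the recommended
# accelerated AL2023 AMI. The path layout is copied from the command above.
eks_al2023_accel_param() {
  k8s_version="$1"   # for example 1.31
  variant="$2"       # neuron or nvidia
  case "$variant" in
    neuron|nvidia) ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  printf '/aws/service/eks/optimized-ami/%s/amazon-linux-2023/x86_64/%s/recommended/image_id\n' \
    "$k8s_version" "$variant"
}

# Example use with the command shown above:
#   aws ssm get-parameter \
#     --name "$(eks_al2023_accel_param 1.31 nvidia)" \
#     --region us-west-2 \
#     --query "Parameter.Value" \
#     --output text
```

Rejecting unknown variants up front avoids sending a request for a parameter that doesn't exist, such as an ARM64 path.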

EKS MNG

You can create a new MNG using the CreateNodeGroup Amazon EKS API and specify the AMI type, either AL2023_x86_64_NEURON or AL2023_x86_64_NVIDIA. The new node group is created with the latest accelerated AL2023 AMI release version. If you want to use a specific AMI release version, then you can set releaseVersion to a release version listed in the amazon-eks-ami GitHub repository releases.
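
From the AWS CLI, the CreateNodeGroup API is exposed as aws eks create-nodegroup. The following is a minimal sketch that prints such a command for review rather than running it. The function name is hypothetical, the role ARN and subnet IDs in the example are placeholders, and a real call may need further options such as --scaling-config.

```shell
# Hypothetical helper: print (without running) an AWS CLI command that creates
# a managed node group with an accelerated AL2023 AMI type.
# Arguments: cluster, node group name, variant (neuron|nvidia), instance type,
# node role ARN, then one or more subnet IDs.
print_create_nodegroup() {
  if [ "$#" -lt 6 ]; then
    echo "usage: print_create_nodegroup CLUSTER NAME VARIANT TYPE ROLE_ARN SUBNET..." >&2
    return 1
  fi
  cluster="$1"; name="$2"; variant="$3"; instance_type="$4"; role_arn="$5"
  shift 5
  case "$variant" in
    neuron) ami_type="AL2023_x86_64_NEURON" ;;
    nvidia) ami_type="AL2023_x86_64_NVIDIA" ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  # Append "--release-version <version>" to the printed command to pin a
  # specific AMI release instead of the latest one.
  echo "aws eks create-nodegroup" \
    "--cluster-name $cluster" \
    "--nodegroup-name $name" \
    "--ami-type $ami_type" \
    "--instance-types $instance_type" \
    "--node-role $role_arn" \
    "--subnets $*"
}
```

For example, `print_create_nodegroup my-cluster nvidia-mng nvidia p5.48xlarge arn:aws:iam::111122223333:role/example-node-role subnet-aaaa subnet-bbbb` prints a single command line that you can inspect before running it.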

Using eksctl, you can create a node group that uses the new accelerated AL2023 AMIs by setting --node-ami-family to AmazonLinux2023. The combination of --node-type and --node-ami-family results in the correct selection of the respective accelerated AMI variant. For example, the following command would create a node group that uses the AL2023_x86_64_NVIDIA AMI type:

eksctl create nodegroup \
  --cluster my-cluster \
  --region region-code \
  --name nvidia-mng \
  --node-ami-family AmazonLinux2023 \
  --node-type p5.48xlarge \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 4

The following command would create a node group that uses the AL2023_x86_64_NEURON AMI type:

eksctl create nodegroup \
  --cluster my-cluster \
  --region region-code \
  --name aws-neuron-mng \
  --node-ami-family AmazonLinux2023 \
  --node-type trn1.32xlarge \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 4

If you are using the Amazon EKS console to create a new MNG, then you can select Amazon Linux 2023 Neuron or NVIDIA from the dropdown menu for AMI type, as shown in the following figure:

AMI type dropdown selection choices for EKS-optimized Amazon Machine Image for nodes highlighting the new accelerated Amazon EKS AL2023 AMIs

If you have an existing MNG, then you can upgrade to AL2023 by either performing an in-place upgrade or a blue/green upgrade depending on how you are using a launch template.

If you’re using a custom AMI with an MNG and you’re specifying the AMI ID, then you can perform an in-place upgrade by updating the AMI ID in the launch template. Before you use this upgrade strategy, make sure that your applications and user data work on AL2023. Refer to the Amazon EKS documentation for updating an MNG for further details.

If you’re using an MNG with either the standard launch template or with a custom launch template that doesn’t specify the AMI ID, then you must upgrade using a blue/green strategy because, at this time, you cannot edit the amiType of an existing MNG. A blue/green upgrade is an alternative strategy that is more involved because a new node group is created with AL2023 as the AMI type. You must make sure that the new node group is carefully configured so that the custom user data from the AL2 node group is compatible with the AL2023 variant. When the new node group is ready, nodes in the old node group can be cordoned and drained so that pods are scheduled on the new node group. For more on custom user data, see Customize managed nodes with launch templates.
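
The cordon-and-drain step of the blue/green upgrade can be sketched as follows. This is a minimal sketch, assuming the old node group's nodes carry the eks.amazonaws.com/nodegroup label that Amazon EKS applies to managed nodes. The function name and the KUBECTL override are hypothetical conveniences, and the drain flags shown are a common choice rather than a requirement.

```shell
# Set KUBECTL=echo to preview the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: cordon and then drain every node in the old node group.
drain_nodegroup() {
  nodegroup="$1"
  selector="eks.amazonaws.com/nodegroup=${nodegroup}"
  # Cordon all nodes first so that evicted pods cannot be rescheduled onto
  # another node of the old group.
  "$KUBECTL" cordon -l "$selector" || return 1
  # Evict pods; DaemonSet pods are skipped and emptyDir data is discarded.
  "$KUBECTL" drain -l "$selector" --ignore-daemonsets --delete-emptydir-data
}
```

Running `KUBECTL=echo; drain_nodegroup my-al2-nodegroup` prints the two kubectl invocations without touching the cluster, which is a cheap way to check the selector before the real run.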

Karpenter

Karpenter users who want to use AL2023 should modify the EC2NodeClass amiSelectorTerms to use AL2023. By default, Drift is enabled in Karpenter. When the amiSelectorTerms field changes, Karpenter detects that the nodes it provisioned are still running the old Amazon EKS-optimized AMI, and then automatically cordons, drains, and replaces those nodes with nodes that use the new AMI.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # nodeClassRef belongs under spec.template.spec in the v1 NodePool API
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    # Required; when coupled with a pod that requests NVIDIA GPUs or AWS Neuron
    # devices, Karpenter will select the correct AL2023 accelerated AMI variant
    - alias: al2023@latest
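
The comment in the EC2NodeClass above refers to a pod that requests accelerator devices. The following is a minimal sketch of such a pod for the NVIDIA case, generated from a shell function so that it can be piped to kubectl. The function name and the image reference are hypothetical, the nvidia.com/gpu resource name assumes that the NVIDIA device plugin is running in the cluster, and the image must bundle its own CUDA user space libraries.

```shell
# Hypothetical helper: print a minimal pod manifest that requests NVIDIA GPUs.
# Arguments: pod name, container image, number of GPUs.
accel_pod_manifest() {
  name="$1"
  image="$2"   # placeholder; supply an image that includes the CUDA runtime
  gpus="$3"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
    - name: ${name}
      image: ${image}
      resources:
        limits:
          nvidia.com/gpu: ${gpus}
EOF
}

# Example use:
#   accel_pod_manifest cuda-test example.com/my-cuda-app:latest 1 | kubectl apply -f -
```

For AWS Neuron devices, the equivalent resource name exposed by the Neuron device plugin is aws.amazon.com/neuron.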

Conclusion

The accelerated Amazon EKS-optimized AL2023 AMIs help you improve the performance and security posture of your applications and are available today for MNG, Karpenter, and self-managed nodes. You can also customize your accelerated Amazon EKS-optimized AL2023 AMIs by using the Packer build steps listed in the amazon-eks-ami GitHub repository. To learn more about using Amazon Linux 2023 with Amazon EKS, see Amazon EKS-optimized Amazon Linux AMIs.