Bottlerocket support for NVIDIA GPUs
Today, we are happy to announce that Bottlerocket, a Linux-based, open-source, container-optimized operating system, now supports NVIDIA GPUs for accelerated computing workloads. You can now use NVIDIA-based Amazon Elastic Compute Cloud (EC2) instance types with Bottlerocket to accelerate your machine learning (ML), artificial intelligence (AI), and similar workloads that require GPU compute devices.
This release includes a new NVIDIA Bottlerocket variant for Amazon Elastic Kubernetes Service (Amazon EKS). The variant comes with GPU drivers pre-installed and configured for the containerd runtime. You don’t have to install or configure the GPU driver, or run the k8s-device-plugin, because all of the libraries and kernel modules are already available inside the image. By including the driver directly in the AMI, you can speed up the provisioning time of a GPU-based EC2 instance, avoid external dependencies, and reduce device and kernel compatibility errors.
The NVIDIA Bottlerocket variant supports self-managed node groups on Amazon EKS and the Karpenter node autoscaler. You can use the provided AMIs with custom provisioning tools or community tools, such as kops, for any Kubernetes cluster running on EC2 instances.
Let’s see how you can make your first Amazon EKS cluster with NVIDIA GPU instances using Bottlerocket for the node operating system.
Create a cluster
We’ll use eksctl—the official Amazon EKS command line interface—to create our example cluster. You need to be using eksctl 0.86.0 or newer to use the new Bottlerocket NVIDIA variant.
In this example, we will create a cluster called “br-gpu” with an Amazon EC2 G4dn node group powered by NVIDIA T4 Tensor Core GPUs. We use the Bottlerocket AMI family, which will automatically use the correct Bottlerocket variant for the GPU instance type. The NVIDIA Bottlerocket variant supports both x86_64 and arm64 instance types. Make sure your containers are built for the correct architecture you’ll be using.
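A cluster like the one described above can be created with a single eksctl command. This is a minimal sketch: the cluster name, instance type, and node count are assumptions for this example, and `--managed=false` is included because the variant supports self-managed node groups.

```shell
# Create an EKS cluster with a self-managed Bottlerocket node group.
# eksctl selects the NVIDIA Bottlerocket variant automatically for
# GPU instance types when the Bottlerocket AMI family is used.
eksctl create cluster \
  --name br-gpu \
  --node-ami-family Bottlerocket \
  --node-type g4dn.xlarge \
  --nodes 1 \
  --managed=false
```

Cluster creation typically takes several minutes while eksctl provisions the control plane and node group.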
Once the cluster is created, you can see the instances in the cluster with:
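The standard kubectl command for listing nodes shows the instances along with their OS image, which confirms the nodes are running Bottlerocket:

```shell
# List cluster nodes; the -o wide flag includes the OS image column,
# which should show the Bottlerocket OS version on each node.
kubectl get nodes -o wide
```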
Deploy a GPU-accelerated workload
Now that we have a node with a GPU attached, we can deploy our first GPU workload and attach the GPU resource to the container.
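A simple test pod can request a GPU through the `nvidia.com/gpu` resource limit. This is a sketch, not the exact manifest from the original walkthrough; the pod name and CUDA base image tag are assumptions.

```shell
# Deploy a pod that requests one GPU and runs nvidia-smi once.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: Never
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:11.4.2-base-ubuntu20.04  # assumed image tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1  # request one GPU from the device plugin
EOF
```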
This pod runs the nvidia-smi command so we can see which NVIDIA GPUs are available inside the pod.
You can see the output with:
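Assuming the pod is named nvidia-smi, its output can be retrieved once the pod has completed:

```shell
# Print the nvidia-smi output captured from the completed pod.
kubectl logs nvidia-smi
```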
You’ll notice that the 470.X driver and an NVIDIA T4 GPU are available to the container.
You are now ready to run any of your GPU-accelerated workloads on Kubernetes with Bottlerocket. If you’re looking for more example workloads, you can check out the NVIDIA GPU-optimized containers in the NVIDIA NGC Catalog on AWS Marketplace.
Delete the cluster
To delete the cluster and all provisioned EC2 instances you can run this command:
eksctl delete cluster --name br-gpu
Now is a great time to use Bottlerocket with your AI/ML workloads. The new Bottlerocket NVIDIA variant helps you run GPU-accelerated workloads quickly and securely. A minimal operating system that includes the required drivers and libraries reduces configuration and compatibility issues. Integrated drivers also provide seamless operating system updates and improve provisioning time.
We plan to support Amazon EKS managed node groups and Amazon Elastic Container Service (Amazon ECS) in a future update. We want to hear your feedback on use cases with Bottlerocket and NVIDIA GPUs. Let us know what workloads you would like to run and how Bottlerocket can help secure them in the Bottlerocket GitHub repo.