AWS Deep Learning Base GPU AMI (Ubuntu 22.04)
Created On: April 29, 2024
Last Updated: February 19, 2025
For help getting started, please see the AWS Deep Learning AMI Developer Guide.
AMI Name format:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) ${YYYYMMDD}
Supported EC2 Instances:
- Please refer to Important changes to DLAMI
- Deep Learning with OSS Nvidia Driver supports G4dn, G5, G6, Gr6, G6e, P4d, P4de, P5, P5e.
The AMI includes the following:
- Supported AWS Service: EC2
- Operating System: Ubuntu 22.04
- Compute Architecture: x86
- Latest available version is installed for the following packages:
- Linux Kernel: 6.8
- FSx Lustre
- Docker
- AWS CLI v2 at /usr/local/bin/aws2 and AWS CLI v1 at /usr/bin/aws
- NVIDIA DCGM
- Nvidia container toolkit:
- Version command: nvidia-container-cli -V
- Nvidia-docker2:
- Version command: nvidia-docker version
- NVIDIA Driver: 550.144.03
- NVIDIA CUDA 12.3-12.6 stack:
- CUDA, NCCL and cuDNN installation directories: /usr/local/cuda-xx.x/
- Example: /usr/local/cuda-12.4/
- Compiled NCCL Version: 2.22.3
- Default CUDA: 12.4
- PATH /usr/local/cuda points to CUDA 12.4
- Updated below env vars:
- LD_LIBRARY_PATH to have /usr/local/cuda-12.4/lib:/usr/local/cuda-12.4/lib64:/usr/local/cuda-12.4:/usr/local/cuda-12.4/targets/sbsa-linux/lib:/usr/local/cuda-12.4/nvvm/lib64:/usr/local/cuda-12.4/extras/CUPTI/lib64
- PATH to have /usr/local/cuda-12.4/bin/:/usr/local/cuda-12.4/include/
- For any different CUDA version, please update LD_LIBRARY_PATH accordingly.
- EFA Installer: 1.38.0
- Nvidia GDRCopy: 2.4
- AWS OFI NCCL: 1.13.2-aws
- Installation path: /opt/amazon/ofi-nccl/ . Path /opt/amazon/ofi-nccl/lib is added to LD_LIBRARY_PATH.
- EBS volume type: gp3
- Python: /usr/bin/python3.10
- NVMe Instance Store Location (On Supported EC2 Instances): /opt/dlami/nvme
- Query AMI-ID with SSM Parameter (example region is us-east-1):
- OSS Nvidia Driver:
- aws ssm get-parameter --name /aws/service/deeplearning/ami/x86_64/base-oss-nvidia-driver-gpu-ubuntu-22.04/latest/ami-id --region us-east-1 --query "Parameter.Value" --output text
- Query AMI-ID with AWSCLI (example region is us-east-1):
- OSS Nvidia Driver:
- aws ec2 describe-images --region us-east-1 --owners amazon --filters 'Name=name,Values=Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text
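The CUDA stack entry above asks you to update LD_LIBRARY_PATH when you use a CUDA version other than the default. One way to do that is sketched below, assuming a hypothetical shell function named use_cuda that you source into your session. It is not shipped with the AMI, it sets only a subset of the directories listed above, and the CUDA_ROOT override is an assumption added so the function can be tried away from a GPU instance.

```shell
# Hypothetical helper (not part of the AMI): point PATH and LD_LIBRARY_PATH
# at a different /usr/local/cuda-<version> toolkit for the current shell.
use_cuda() {
    local version="$1"
    local root="${CUDA_ROOT:-/usr/local}"   # assumption: override for testing only
    local home="${root}/cuda-${version}"
    local cleaned_path cleaned_ld

    if [ ! -d "$home" ]; then
        echo "CUDA ${version} not found at ${home}" >&2
        return 1
    fi

    # Drop entries that belong to any versioned toolkit under $root,
    # then prepend the entries for the requested version.
    cleaned_path=$(printf '%s' "${PATH:-}" | tr ':' '\n' | grep -v "^${root}/cuda-[0-9]" | paste -sd: -)
    cleaned_ld=$(printf '%s' "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | grep -v "^${root}/cuda-[0-9]" | paste -sd: -)

    export PATH="${home}/bin${cleaned_path:+:${cleaned_path}}"
    export LD_LIBRARY_PATH="${home}/lib64:${home}/extras/CUPTI/lib64${cleaned_ld:+:${cleaned_ld}}"
}
```

After sourcing the function, running use_cuda 12.6 would switch the current shell to /usr/local/cuda-12.6, provided that directory exists. Entries that do not start with a versioned CUDA directory, such as /opt/amazon/ofi-nccl/lib, are left in place.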
Notice
NVIDIA Container Toolkit 1.17.4
In Container Toolkit version 1.17.4, the mounting of CUDA compat libraries is disabled. To ensure compatibility with multiple CUDA versions in container workflows, update your LD_LIBRARY_PATH to include your CUDA compatibility libraries, as shown in the “If you use a CUDA compatibility layer” tutorial: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-gpu-drivers.html#collapsible-cuda-compat
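As a rough illustration of the LD_LIBRARY_PATH change described in this notice, the sketch below prepends a compat directory when it exists. The function name add_cuda_compat and the default location /usr/local/cuda/compat are assumptions; check where your container image installs its compat libraries. The sketch does not compare driver versions, which a complete setup may also need to do.

```shell
# Hypothetical helper: prepend a CUDA compat directory to LD_LIBRARY_PATH
# if the directory exists and is not already listed.
add_cuda_compat() {
    # Assumption: default compat location; pass another path as $1 if needed.
    local compat_dir="${1:-/usr/local/cuda/compat}"

    # Nothing to do when the image ships no compat libraries.
    [ -d "$compat_dir" ] || return 0

    case ":${LD_LIBRARY_PATH:-}:" in
        *":${compat_dir}:"*) ;;  # already present, leave the variable alone
        *) export LD_LIBRARY_PATH="${compat_dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
}
```

Calling add_cuda_compat from a container entrypoint script, before the workload starts, is one place this could fit.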
EFA Update from 1.37 to 1.38 (Released on 2025-01-31)
EFA now bundles the AWS OFI NCCL plugin, which is now located in /opt/amazon/ofi-nccl rather than the original /opt/aws-ofi-nccl/. If you set LD_LIBRARY_PATH yourself, make sure it points to the new OFI NCCL location.
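If existing scripts still carry the old plugin location in a search path, a small rewrite like the following sketch may help. The function name migrate_ofi_nccl_path is hypothetical; the two directory names are the ones given in this notice.

```shell
# Hypothetical helper: rewrite entries under the old plugin location
# (/opt/aws-ofi-nccl) to the new one (/opt/amazon/ofi-nccl) in a
# colon-separated search path. All other entries are left untouched.
migrate_ofi_nccl_path() {
    local search_path="$1"
    printf '%s\n' "$search_path" \
        | tr ':' '\n' \
        | sed -e 's|^/opt/aws-ofi-nccl$|/opt/amazon/ofi-nccl|' \
              -e 's|^/opt/aws-ofi-nccl/|/opt/amazon/ofi-nccl/|' \
        | paste -sd: -
}

# Usage (not run here):
#   export LD_LIBRARY_PATH="$(migrate_ofi_nccl_path "$LD_LIBRARY_PATH")"
```

The function only transforms the string it is given, so the result can be inspected before it is exported.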
Multi ENI Support
- Ubuntu 22.04 automatically sets up and configures source routing on multiple NICs via cloud-init on its initial boot. If your workflow includes attaching or detaching ENIs while an instance is stopped, additional configuration must be added to the cloud-init user data to ensure that the NICs are configured properly during these events. A sample cloud config is provided below.
- For more information on how to configure the cloud config for your instances, see the Canonical documentation: https://documentation.ubuntu.com/aws/en/latest/aws-how-to/instances/automatically-setup-multiple-nics/
#cloud-config
# apply network config on every boot and hotplug event
updates:
  network:
    when: ['boot', 'hotplug']
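One way to supply this cloud config at launch is to save it to a file and pass it as user data. The sketch below assumes a hypothetical function name (write_hotplug_user_data) and file name (user-data.yaml); the file contents are the sample cloud config above.

```shell
# Hypothetical helper: write the sample cloud config to a user-data file.
write_hotplug_user_data() {
    local out_file="${1:-user-data.yaml}"   # assumption: file name is arbitrary
    cat > "$out_file" <<'EOF'
#cloud-config
# apply network config on every boot and hotplug event
updates:
  network:
    when: ['boot', 'hotplug']
EOF
}

# Usage (not run here): add the --user-data option to your existing
# run-instances command, alongside its other options.
#   write_hotplug_user_data user-data.yaml
#   aws ec2 run-instances --user-data file://user-data.yaml
```

The quoted heredoc delimiter keeps the shell from expanding anything inside the YAML.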
Support policy
Components of this AMI, such as CUDA versions, may be removed or changed in a future release without prior notice, based on framework support policy, to optimize performance for deep learning containers, or to reduce AMI size. We remove CUDA versions from AMIs if they are not used by any supported framework version.
EC2 Instances with Multiple Network Cards:
- Many instance types that support EFA also have multiple network cards.
- DeviceIndex is unique to each network card, and must be a non-negative integer less than the limit of ENIs per NetworkCard. On P5, the number of ENIs per NetworkCard is 2, meaning that the only valid values for DeviceIndex are 0 and 1.
- For the primary network interface (network card index 0, device index 0), create an EFA (EFA with ENA) interface. You can't use an EFA-only network interface as the primary network interface.
- For each additional network interface, use the next unused network card index, device index 1, and either an EFA (EFA with ENA) or EFA-only network interface, depending on your use case, such as ENA bandwidth requirements or IP address space. For example use cases, see EFA configuration for P5 instances.
- For more information, see the EFA Guide.
P5/P5e Instances:
- P5 and P5e instances contain 32 network interface cards, and can be launched using the following AWS CLI command:
aws ec2 run-instances --region $REGION \
  --instance-type $INSTANCETYPE \
  --image-id $AMI --key-name $KEYNAME \
  --iam-instance-profile "Name=dlami-builder" \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
  --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  ....
  ....
  ....
  "NetworkCardIndex=31,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"
P5en:
- P5en instances contain 16 network interface cards, and can be launched using the following AWS CLI command:
aws ec2 run-instances --region $REGION \
  --instance-type $INSTANCETYPE \
  --image-id $AMI --key-name $KEYNAME \
  --iam-instance-profile "Name=dlami-builder" \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
  --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
  ....
  ....
  ....
  "NetworkCardIndex=15,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"
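Both launch commands follow the same pattern: network card 0 uses device index 0, and every other card uses device index 1. Rather than typing each entry by hand, the specifications could be generated, as in this sketch. The function name efa_interface_specs is hypothetical, and the sketch assumes every card uses InterfaceType=efa with the same security group and subnet, as in the commands above.

```shell
# Hypothetical helper: print one --network-interfaces specification per
# network card. Card 0 gets device index 0 (the primary interface);
# all other cards get device index 1.
efa_interface_specs() {
    local card_count="$1" sg="$2" subnet="$3"
    local card=0 device
    while [ "$card" -lt "$card_count" ]; do
        if [ "$card" -eq 0 ]; then device=0; else device=1; fi
        printf 'NetworkCardIndex=%d,DeviceIndex=%d,Groups=%s,SubnetId=%s,InterfaceType=efa\n' \
            "$card" "$device" "$sg" "$subnet"
        card=$((card + 1))
    done
}

# Usage (not run here; mapfile requires bash 4 or later).
# Use 32 cards for P5/P5e and 16 for P5en.
#   mapfile -t SPECS < <(efa_interface_specs 32 "$SG" "$SUBNET")
#   aws ec2 run-instances --region "$REGION" --instance-type "$INSTANCETYPE" \
#       --image-id "$AMI" --key-name "$KEYNAME" \
#       --network-interfaces "${SPECS[@]}"
```

Printing the specifications first makes it easy to review them before they are passed to the launch command.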
Kernel:
- Kernel version is pinned using the following commands:
- echo linux-aws hold | sudo dpkg --set-selections
- echo linux-headers-aws hold | sudo dpkg --set-selections
- echo linux-image-aws hold | sudo dpkg --set-selections
- We recommend that users avoid updating their kernel version (except to apply a security patch) to ensure compatibility with the installed drivers and package versions. Users who still wish to update can run the following commands to unpin their kernel versions:
- echo linux-aws install | sudo dpkg --set-selections
- echo linux-headers-aws install | sudo dpkg --set-selections
- echo linux-image-aws install | sudo dpkg --set-selections
- For each new version of DLAMI, the latest available compatible kernel is used.
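To confirm whether the kernel packages are currently pinned, the selection state can be read back with dpkg --get-selections. The sketch below assumes a hypothetical filter function named held_packages that reads that output on standard input.

```shell
# Hypothetical helper: read `dpkg --get-selections` output on stdin and
# print the names of packages whose selection state is "hold".
held_packages() {
    awk '$2 == "hold" { print $1 }'
}

# Usage (not run here):
#   dpkg --get-selections linux-aws linux-headers-aws linux-image-aws | held_packages
```

If all three package names are printed, the kernel is still pinned. After the unpin commands above are run, the same check should print nothing.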
Release Date: 2025-02-17
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20250214
Updated
- Updated NVIDIA Container Toolkit from version 1.17.3 to version 1.17.4
- Please see the release notes page here for more information: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.17.4
- In Container Toolkit version 1.17.4, the mounting of CUDA compat libraries is disabled. To ensure compatibility with multiple CUDA versions in container workflows, update your LD_LIBRARY_PATH to include your CUDA compatibility libraries, as shown in the “If you use a CUDA compatibility layer” tutorial: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-gpu-drivers.html#collapsible-cuda-compat
Removed
- Removed user space libraries cuobj and nvdisasm provided by NVIDIA CUDA toolkit to address CVEs present in the NVIDIA CUDA Toolkit Security Bulletin for February 18, 2025
Release Date: 2025-02-07
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20250205
Added
- Added CUDA toolkit version 12.6 in directory /usr/local/cuda-12.6
Removed
- CUDA versions 12.1 and 12.2 have been removed from this DLAMI. Customers who need these versions can install them from NVIDIA.
Release Date: 2025-01-31
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20250131
Updated
- Upgraded EFA version from 1.37.0 to 1.38.0
- EFA now bundles the AWS OFI NCCL plugin, which can now be found in /opt/amazon/ofi-nccl rather than the original /opt/aws-ofi-nccl/. If updating your LD_LIBRARY_PATH variable, please ensure that you modify your OFI NCCL location properly.
- Upgraded Nvidia Container Toolkit from 1.17.3 to 1.17.4
Release Date: 2025-01-17
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20250117
Updated
- Upgraded Nvidia driver from version 550.127.05 to 550.144.03 to address CVEs present in the NVIDIA GPU Display Driver Security Bulletin for January 2025
Release Date: 2024-11-18
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20241115
Added
- Amazon FSx package for Lustre support added.
Fixed
- Due to a change in the Ubuntu kernel to address a defect in the Kernel Address Space Layout Randomization (KASLR) functionality, G4dn and G5 instances are unable to properly initialize CUDA on the OSS Nvidia driver. To mitigate this issue, this DLAMI includes functionality that dynamically loads the proprietary driver for G4dn and G5 instances. Please allow a brief initialization period for this loading to ensure that your instances work properly.
- To check the status and health of this service, you can use the following command:
$ sudo systemctl is-active dynamic_driver_load.service
active
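Because the driver load takes a brief period after boot, a startup script may want to wait for the service before starting GPU work. The sketch below polls the same is-active check until it succeeds or a timeout expires. The function name wait_for_active, the 120-second default, and the IS_ACTIVE_CMD override are assumptions made for illustration.

```shell
# Hypothetical helper: poll until a systemd unit reports "active" or a
# timeout (in seconds) expires. IS_ACTIVE_CMD can be overridden for
# testing; by default the check is `systemctl is-active --quiet`.
wait_for_active() {
    local unit="$1" timeout="${2:-120}" waited=0
    while ! ${IS_ACTIVE_CMD:-systemctl is-active --quiet} "$unit"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for ${unit}" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage (not run here):
#   wait_for_active dynamic_driver_load.service 120 && nvidia-smi
```

The function returns a non-zero status on timeout, so a calling script can decide whether to retry or stop.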
Release Date: 2024-10-23
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20241023
Updated
- Upgraded Nvidia driver from version 550.90.07 to 550.127.05 to address CVEs present in the NVIDIA GPU Display Driver Security Bulletin for October 2024
Release Date: 2024-10-01
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20240930
Updated
- Upgraded Nvidia driver and Fabric Manager from version 535.183.01 to 550.90.07
- Upgraded EFA Version from 1.32.0 to 1.34.0
- Upgraded NCCL to latest version 2.22.3 for all CUDA versions
- CUDA 12.1, 12.2 upgraded from 2.18.5+CUDA12.2
- CUDA 12.3 upgraded from version 2.21.5+CUDA12.4
Added
- Added CUDA toolkit version 12.4 in directory /usr/local/cuda-12.4
- Added support for P5e EC2 Instance.
Release Date: 2024-08-19
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20240816
Added
- Added support for G6e EC2 instance.
Release Date: 2024-06-06
AMI Name: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20240606
Updated
- Updated Nvidia driver version to 535.183.01 from 535.161.08
Release Date: 2024-05-15
AMI Name: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20240513
Removed
- Amazon FSx for Lustre support has been removed in this release due to incompatibility with the latest Ubuntu 22.04 kernel versions. Support for FSx for Lustre will be reinstated once the latest kernel version is supported. Customers who require FSx for Lustre should continue to use the Deep Learning Base GPU AMI (Ubuntu 20.04).
Release Date: 2024-04-29
AMI Name: Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20240429
Added
- Initial release of the Deep Learning Base OSS DLAMI for Ubuntu 22.04