AWS Deep Learning AMI GPU PyTorch 2.5 (Ubuntu 22.04)

Created On: December 04, 2024
Last Updated: February 19, 2025

For help getting started, please see the AWS Deep Learning AMI Developer Guide.

AMI Name format:

Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.${PATCH_VERSION} (Ubuntu 22.04) ${YYYY-MM-DD}

Supported EC2 Instances:

Please refer to Important changes to DLAMI
Deep Learning with OSS Nvidia Driver supports G4dn, G5, G6, Gr6, P4, P4de, P5, P5e, P5en.

The AMI includes the following:

Notice

P5/P5e Instances:

DeviceIndex is unique to each NetworkCard and must be a non-negative integer less than the limit of ENIs per NetworkCard. On P5, the number of ENIs per NetworkCard is 2, so the only valid values for DeviceIndex are 0 and 1. Below is an example EC2 P5 instance launch command using awscli, with NetworkCardIndex running from 0 to 31, DeviceIndex set to 0 for the first interface, and DeviceIndex set to 1 for the remaining 31 interfaces.

aws ec2 run-instances --region $REGION \
--instance-type $INSTANCETYPE \
--image-id $AMI --key-name $KEYNAME \
--iam-instance-profile "Name=dlami-builder" \
--tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
--network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
 "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
 "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
 "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
 "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
 ....
 ....
 ....
 "NetworkCardIndex=31,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"
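
The elided interface lines in the command above follow a regular pattern, so they can be generated rather than typed by hand. The following is a minimal sketch and not part of the original command: the helper name build_p5_network_interfaces and the sg-EXAMPLE / subnet-EXAMPLE placeholders are hypothetical, and it assumes the security group and subnet values contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the DLAMI or awscli).
# Prints one --network-interfaces specification per line for NetworkCardIndex 0-31:
# DeviceIndex=0 on the first network card, DeviceIndex=1 on the remaining 31.
build_p5_network_interfaces() {
  sg="$1"       # security group ID
  subnet="$2"   # subnet ID
  card=0
  while [ "$card" -le 31 ]; do
    if [ "$card" -eq 0 ]; then
      device=0
    else
      device=1
    fi
    printf 'NetworkCardIndex=%s,DeviceIndex=%s,Groups=%s,SubnetId=%s,InterfaceType=efa\n' \
      "$card" "$device" "$sg" "$subnet"
    card=$((card + 1))
  done
}

# Print the generated specifications for inspection.
# The fallback IDs are placeholders, not real resources.
build_p5_network_interfaces "${SG:-sg-EXAMPLE}" "${SUBNET:-subnet-EXAMPLE}"
```

One way to use the output is unquoted command substitution in the launch command, for example --network-interfaces $(build_p5_network_interfaces "$SG" "$SUBNET"), which relies on shell word splitting to turn each printed line into a separate argument.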

Kernel:

Kernel version is pinned using command:
    echo linux-aws hold | sudo dpkg --set-selections
    echo linux-headers-aws hold | sudo dpkg --set-selections
    echo linux-image-aws hold | sudo dpkg --set-selections
We recommend that users avoid updating their kernel version (except to apply a security patch) to ensure compatibility with the installed drivers and package versions. Users who still wish to update can run the following commands to unpin their kernel version:
    echo linux-aws install | sudo dpkg --set-selections
    echo linux-headers-aws install | sudo dpkg --set-selections
    echo linux-image-aws install | sudo dpkg --set-selections
    sudo apt-get upgrade -y
For each new version of the DLAMI, the latest available compatible kernel is used.
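
To confirm whether a kernel package is currently pinned, the selection state can be read back with dpkg --get-selections. The following is a minimal sketch: the is_held helper is hypothetical, and it assumes the package appears in the dpkg output under its plain name, without an architecture suffix.

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg --get-selections` style lines
# ("<package><whitespace><state>") on stdin and succeeds only if the
# named package is listed with the state "hold".
is_held() {
  awk -v pkg="$1" '
    $1 == pkg && $2 == "hold" { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example usage on an instance (commented out because it needs dpkg):
# for pkg in linux-aws linux-headers-aws linux-image-aws; do
#   if dpkg --get-selections | is_held "$pkg"; then
#     echo "$pkg is pinned"
#   else
#     echo "$pkg is not pinned"
#   fi
# done
```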

Release Date: 2025-02-17

AMI Names:

Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Ubuntu 22.04) 20250216

Updated

Updated NVIDIA Container Toolkit from version 1.17.3 to version 1.17.4
- Please see the release notes page here for more information: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.17.4
- In Container Toolkit version 1.17.4, the mounting of CUDA compat libraries is now disabled. To ensure compatibility with multiple CUDA versions in container workflows, please update your LD_LIBRARY_PATH to include your CUDA compatibility libraries, as shown in the “If you use a CUDA compatibility layer” tutorial here: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-gpu-drivers.html#collapsible-cuda-compat
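
As an illustration of the LD_LIBRARY_PATH update described above, the following minimal sketch prepends a compat directory inside a container entrypoint. The prepend_cuda_compat helper and the CUDA_COMPAT_DIR variable are hypothetical, and /usr/local/cuda/compat is an assumed location that may differ in your image. The linked tutorial also compares the host driver version against the compat library version before enabling it; this sketch omits that check.

```shell
#!/bin/sh
# Hypothetical helper: prepend a CUDA compat directory to LD_LIBRARY_PATH,
# but only if that directory exists in the container.
prepend_cuda_compat() {
  compat_dir="$1"
  if [ -d "$compat_dir" ]; then
    # ${VAR:+...} avoids a trailing ":" when LD_LIBRARY_PATH is empty or unset.
    LD_LIBRARY_PATH="${compat_dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    export LD_LIBRARY_PATH
  fi
}

# CUDA_COMPAT_DIR is an assumed override; the default path is an assumption too.
prepend_cuda_compat "${CUDA_COMPAT_DIR:-/usr/local/cuda/compat}"
```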

Removed

Removed the user space libraries cuobj and nvdisasm provided by the NVIDIA CUDA Toolkit to address CVEs listed in the NVIDIA CUDA Toolkit Security Bulletin for February 18, 2025.

Release Date: 2025-01-21

AMI Names:

Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Ubuntu 22.04) 20250119

Updated

Upgraded the Nvidia driver from version 550.127.05 to 550.144.03 to address CVEs listed in the NVIDIA GPU Display Driver Security Bulletin for January 2025.

Release Date: 2024-11-21

AMI Names:

Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Ubuntu 22.04) 20241121

Added

Initial release of the Deep Learning AMI GPU PyTorch 2.5 (Ubuntu 22.04) series, including a conda environment, pytorch, complemented with NVIDIA Driver R550, CUDA=12.4.1, cuDNN=8.9.7, PyTorch NCCL=2.21.5, and EFA=1.37.0.

Fixed

Due to a change in the Ubuntu kernel to address a defect in the Kernel Address Space Layout Randomization (KASLR) functionality, G4dn/G5 instances are unable to properly initialize CUDA on the OSS Nvidia driver. To mitigate this issue, this DLAMI includes functionality that dynamically loads the proprietary driver for G4dn and G5 instances. Please allow a brief initialization period for this loading to ensure that your instances work properly.
- To check the status and health of this service, you can use the following commands:

$ sudo systemctl is-active dynamic_driver_load.service
active
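
Because the driver loads during a brief initialization period, a startup script may want to wait for the service before launching GPU work. The following is a minimal sketch under that assumption: the wait_until helper and the 120-second timeout are hypothetical choices, not values documented for this DLAMI.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command once per second until it succeeds
# or the timeout (in seconds) elapses. Returns 0 on success, 1 on timeout.
wait_until() {
  timeout_s="$1"
  shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example usage on a G4dn or G5 instance (commented out because it needs systemd):
# if wait_until 120 sudo systemctl is-active --quiet dynamic_driver_load.service; then
#   echo "driver service is active"
# else
#   echo "driver service did not become active in time" >&2
# fi
```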
