AWS Deep Learning Base GPU AMI (Ubuntu 20.04)
Release Date: February 01, 2024
Created On: April 18, 2023
Last Updated: February 19, 2025
For help getting started, please see the AWS Deep Learning AMI Developer Guide.
AMI Name format:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) ${YYYY-MM-DD}
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) ${YYYY-MM-DD}
Supported EC2 Instances:
- Please refer to Important changes to DLAMI
- Deep Learning with OSS Nvidia Driver supports G4dn, G5, G6, Gr6, G6e, P4d, P4de, P5, P5e, P5en
- Deep Learning with Proprietary Nvidia Driver supports G3 (G3.16x not supported), P3, P3dn
The AMI includes the following:
- Supported AWS Service: EC2
- Operating System: Ubuntu 20.04
- Compute Architecture: x86
- Latest available version is installed for the following packages:
- Linux Kernel 5.15
- FSx Lustre
- Docker
- AWS CLI v2 at /usr/local/bin/aws2 and AWS CLI v1 at /usr/bin/aws
- NVIDIA DCGM
- Nvidia container toolkit:
- Version command: nvidia-container-cli -V
- Nvidia-docker2: removed in the 2024-05-02 release together with its nvidia-docker command; use the Nvidia container toolkit packages directly.
- NVIDIA Driver:
- OSS Nvidia driver: 550.144.03
- Proprietary Nvidia driver: 550.144.03
- NVIDIA CUDA 11.7 and 12.1-12.4 stacks:
- CUDA, NCCL and cuDNN installation directories: /usr/local/cuda-xx.x/
- Example: /usr/local/cuda-12.1/
- Compiled NCCL Version: 2.22.3+CUDA12.4
- Default CUDA: 12.1
- PATH /usr/local/cuda points to CUDA 12.1
- The following environment variables are updated:
- LD_LIBRARY_PATH to have /usr/local/cuda-12.1/lib:/usr/local/cuda-12.1/lib64:/usr/local/cuda-12.1:/usr/local/cuda-12.1/targets/x86_64-linux/lib
- PATH to have /usr/local/cuda-12.1/bin/:/usr/local/cuda-12.1/include/
- For any other CUDA version, please update LD_LIBRARY_PATH accordingly.
- NCCL Tests Location:
- all_reduce, all_gather and reduce_scatter: /usr/local/cuda-xx.x/efa/test-cuda-xx.x/
- To run the NCCL tests, LD_LIBRARY_PATH must be passed with the following updates:
- Common paths are already added to LD_LIBRARY_PATH:
- /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/lib:/usr/lib
- For any other CUDA version, please update LD_LIBRARY_PATH accordingly.
- EFA Installer: 1.38.0
- Nvidia GDRCopy: 2.4
- AWS OFI NCCL: 1.13.2-aws
- AWS OFI NCCL now supports multiple NCCL versions with a single build
- Installation path: /opt/aws-ofi-nccl/. The path /opt/aws-ofi-nccl/lib is added to LD_LIBRARY_PATH.
- Tests for ring and message_transfer: /opt/aws-ofi-nccl/tests
- EBS volume type: gp3
- Python: /usr/bin/python3.9
- NVMe Instance Store Location (On Supported EC2 Instances): /opt/dlami/nvme
- Query AMI-ID with SSM Parameter (example region is us-east-1):
- OSS Nvidia Driver:
- aws ssm get-parameter --name /aws/service/deeplearning/ami/x86_64/base-oss-nvidia-driver-gpu-ubuntu-20.04/latest/ami-id --region us-east-1 --query "Parameter.Value" --output text
- Proprietary Nvidia Driver:
- aws ssm get-parameter --name /aws/service/deeplearning/ami/x86_64/base-proprietary-nvidia-driver-gpu-ubuntu-20.04/latest/ami-id --region us-east-1 --query "Parameter.Value" --output text
- Query AMI-ID with AWSCLI (example region is us-east-1):
- OSS Nvidia Driver:
- aws ec2 describe-images --region us-east-1 --owners amazon --filters 'Name=name,Values=Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text
- Proprietary Nvidia Driver:
- aws ec2 describe-images --region us-east-1 --owners amazon --filters 'Name=name,Values=Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text
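To switch to a different CUDA version in a shell session, PATH and LD_LIBRARY_PATH can be rebuilt from the /usr/local/cuda-xx.x layout and the common paths listed above. The function below is a minimal sketch under those assumptions; the name use_cuda is hypothetical and is not shipped with the AMI. It overwrites LD_LIBRARY_PATH instead of appending to it, so that libraries from the previously selected CUDA version cannot shadow the new one, and it adds only the bin directory to PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the AMI): point PATH and LD_LIBRARY_PATH
# at /usr/local/cuda-<version>, mirroring the CUDA 12.1 defaults above.
use_cuda() {
    local version="$1"
    local root="/usr/local/cuda-${version}"

    # Warn, but do not fail, if the requested toolkit is not installed here.
    if [ ! -d "$root" ]; then
        echo "warning: $root not found on this host" >&2
    fi

    # CUDA library directories, in the same order the AMI uses for CUDA 12.1.
    local cuda_libs="${root}/lib:${root}/lib64:${root}:${root}/targets/x86_64-linux/lib"
    # Common paths used by EFA, Open MPI and the AWS OFI NCCL plugin.
    local common_libs="/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/lib:/usr/lib"

    # Replace LD_LIBRARY_PATH entirely; prepend the toolkit's bin to PATH.
    export LD_LIBRARY_PATH="${cuda_libs}:${common_libs}"
    export PATH="${root}/bin:${PATH}"
}
```

For example, running use_cuda 12.4 before launching a job selects the CUDA 12.4 stack; the matching NCCL tests would then be under /usr/local/cuda-12.4/efa/test-cuda-12.4/, following the layout listed above.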
Notices
NVIDIA Container Toolkit 1.17.4
In Container Toolkit version 1.17.4, mounting of CUDA compat libraries is disabled. To ensure compatibility with multiple CUDA versions in container workflows, please update your LD_LIBRARY_PATH to include your CUDA compatibility libraries, as shown under the “If you use a CUDA compatibility layer” tutorial here - https://docs.aws.amazon.com/sagemaker/latest/dg/inference-gpu-drivers.html#collapsible-cuda-compa
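A minimal sketch of that LD_LIBRARY_PATH update for a container entrypoint follows. It assumes the compat libraries are in /usr/local/cuda/compat unless another directory is passed in; verify the location for your image. The function name add_cuda_compat is hypothetical. The linked tutorial also compares the host driver version with the compat package before enabling it; this sketch only performs the path update.

```shell
#!/usr/bin/env bash
# Hypothetical helper for use inside a container: prepend a CUDA forward
# compatibility library directory to LD_LIBRARY_PATH.
# Assumption: the default compat location is /usr/local/cuda/compat.
add_cuda_compat() {
    local compat_dir="${1:-/usr/local/cuda/compat}"

    if [ ! -d "$compat_dir" ]; then
        echo "no CUDA compat directory at $compat_dir; LD_LIBRARY_PATH unchanged" >&2
        return 1
    fi

    # Prepend only if the directory is not already listed.
    case ":${LD_LIBRARY_PATH}:" in
        *":${compat_dir}:"*) ;;
        *) export LD_LIBRARY_PATH="${compat_dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
}
```

Calling add_cuda_compat early in the entrypoint, before the application starts, keeps the change local to the container process.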
EFA Updates from 1.37 to 1.38 (Release on 2025-02-04)
EFA now bundles the AWS OFI NCCL plugin, which is now located in /opt/amazon/ofi-nccl instead of the original /opt/aws-ofi-nccl/. If you set LD_LIBRARY_PATH yourself, make sure its OFI NCCL entry points to the new location.
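If a job script already exports an LD_LIBRARY_PATH that contains the old plugin location, the entries can be rewritten in place. This is a minimal sketch with a hypothetical function name; it assumes the directory layout below the prefix (for example lib) is the same in both locations, which should be confirmed with ls /opt/amazon/ofi-nccl on the instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite LD_LIBRARY_PATH entries under the old AWS OFI
# NCCL prefix (/opt/aws-ofi-nccl) to the EFA-bundled prefix (/opt/amazon/ofi-nccl).
# Assumption: the subdirectory layout is identical under both prefixes.
migrate_ofi_nccl_path() {
    local old_prefix="/opt/aws-ofi-nccl"
    local new_prefix="/opt/amazon/ofi-nccl"
    local entry result=""
    local IFS=':'

    # Walk each colon-separated entry and swap the prefix where it matches.
    for entry in $LD_LIBRARY_PATH; do
        case "$entry" in
            "$old_prefix"|"$old_prefix"/*) entry="${new_prefix}${entry#"$old_prefix"}" ;;
        esac
        result="${result:+${result}:}${entry}"
    done
    export LD_LIBRARY_PATH="$result"
}
```

Entries that merely share a similar name (for example /opt/aws-ofi-nccl-old) are left untouched, because the match requires the exact prefix or the prefix followed by a slash.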
Support policy
Components of these AMIs, such as CUDA versions, may be removed or changed in a future release without prior notice, based on the framework support policy, to optimize performance for deep learning containers, or to reduce AMI size. We remove CUDA versions from AMIs if they are not used by any supported framework version.
EC2 Instances with Multiple Network Cards:
- Many instance types that support EFA also have multiple network cards.
- DeviceIndex is unique to each network card and must be a non-negative integer less than the limit of ENIs per network card. On P5, the number of ENIs per network card is 2, which means that the only valid values for DeviceIndex are 0 and 1.
- For the primary network interface (network card index 0, device index 0), create an EFA (EFA with ENA) interface. You can't use an EFA-only network interface as the primary network interface.
- For each additional network interface, use the next unused network card index, device index 1, and either an EFA (EFA with ENA) or EFA-only network interface, depending on your use case (for example, ENA bandwidth requirements or IP address space). For example use cases, see EFA configuration for P5 instances.
- For more information please reference the EFA Guide here.
P5/P5e Instances:
- P5 and P5e instances have 32 network cards and can be launched using the following AWS CLI command:
aws ec2 run-instances --region $REGION \
    --instance-type $INSTANCETYPE \
    --image-id $AMI --key-name $KEYNAME \
    --iam-instance-profile "Name=dlami-builder" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    .... .... ....
    "NetworkCardIndex=31,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"
P5en Instances:
- P5en instances have 16 network cards and can be launched using the following AWS CLI command:
aws ec2 run-instances --region $REGION \
    --instance-type $INSTANCETYPE \
    --image-id $AMI --key-name $KEYNAME \
    --iam-instance-profile "Name=dlami-builder" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$TAG}]" \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa" \
    .... .... ....
    "NetworkCardIndex=15,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa"
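Typing every network interface by hand is error-prone. The rules above (card 0 uses DeviceIndex=0, every other card uses DeviceIndex=1, all interfaces of type efa) can be generated in a loop instead. This is a minimal sketch; the function name efa_interface_specs is hypothetical, and SG and SUBNET are the same placeholders used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one --network-interfaces specification per
# network card. Card 0 gets DeviceIndex=0; all other cards get DeviceIndex=1.
efa_interface_specs() {
    local card_count="$1" sg="$2" subnet="$3"
    local card device_index
    for (( card = 0; card < card_count; card++ )); do
        if (( card == 0 )); then device_index=0; else device_index=1; fi
        printf 'NetworkCardIndex=%d,DeviceIndex=%d,Groups=%s,SubnetId=%s,InterfaceType=efa\n' \
            "$card" "$device_index" "$sg" "$subnet"
    done
}

# Example launch (requires AWS credentials; not run here).
# Use 32 cards for P5/P5e and 16 for P5en:
#   mapfile -t interfaces < <(efa_interface_specs 32 "$SG" "$SUBNET")
#   aws ec2 run-instances --region "$REGION" --instance-type "$INSTANCETYPE" \
#       --image-id "$AMI" --key-name "$KEYNAME" \
#       --network-interfaces "${interfaces[@]}"
```

Passing the array with "${interfaces[@]}" keeps each specification as a separate argument, matching the form of the hand-written commands.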
Kernel:
- The kernel version is pinned using the following commands:
- echo linux-aws hold | sudo dpkg --set-selections
- echo linux-headers-aws hold | sudo dpkg --set-selections
- echo linux-image-aws hold | sudo dpkg --set-selections
- We recommend that you avoid updating the kernel version (unless required for a security patch) to ensure compatibility with the installed drivers and package versions. If you still wish to update, run the following commands to unpin the kernel version:
- echo linux-aws install | sudo dpkg --set-selections
- echo linux-headers-aws install | sudo dpkg --set-selections
- echo linux-image-aws install | sudo dpkg --set-selections
- Each new DLAMI release uses the latest available compatible kernel.
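To confirm whether the pin is active, the selection state can be read back from dpkg (apt-mark showhold reports the same information). The filter below is a minimal sketch with a hypothetical function name; it reads the output of dpkg --get-selections from standard input and prints the held package names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `dpkg --get-selections` output on stdin,
# print the names of packages whose selection state is "hold".
held_packages() {
    awk '$2 == "hold" { print $1 }'
}

# On the instance (not run here); while the pin is active the list
# should include linux-aws, linux-headers-aws and linux-image-aws:
#   dpkg --get-selections 'linux-*aws*' | held_packages
```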
Release Date: 2025-02-17
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20250214
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20250214
Updated
- Updated NVIDIA Container Toolkit from version 1.17.3 to version 1.17.4
- Please see the release notes page here for more information: https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.17.4
- In Container Toolkit version 1.17.4, mounting of CUDA compat libraries is disabled. To ensure compatibility with multiple CUDA versions in container workflows, please update your LD_LIBRARY_PATH to include your CUDA compatibility libraries, as shown under the “If you use a CUDA compatibility layer” tutorial here - https://docs.aws.amazon.com/sagemaker/latest/dg/inference-gpu-drivers.html#collapsible-cuda-compat
Removed
- Removed the user space libraries cuobj and nvdisasm provided by the NVIDIA CUDA toolkit to address CVEs listed in the NVIDIA CUDA Toolkit Security Bulletin for February 18, 2025
Release Date: 2025-02-04
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20250204
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20250204
Updated
- Upgraded EFA version from 1.37.0 to 1.38.0
- EFA now bundles the AWS OFI NCCL plugin, which is now located in /opt/amazon/ofi-nccl instead of the original /opt/aws-ofi-nccl/. If you set LD_LIBRARY_PATH yourself, make sure its OFI NCCL entry points to the new location.
Removed
- The emacs package has been removed from these DLAMIs. Customers can install emacs from GNU Emacs: https://www.gnu.org/software/emacs/download.html.
Release Date: 2025-01-17
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20250117
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20250117
Updated
- Upgraded Nvidia driver from version 550.127.05 to 550.144.03 to address CVEs listed in the NVIDIA GPU Display Driver Security Bulletin for January 2025
Release Date: 2024-12-09
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20241206
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20241206
Updated
- Upgraded Nvidia Container Toolkit from version 1.17.0 to 1.17.3
Release Date: 2024-11-22
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20241122
Added
- Added support for P5en EC2 Instances.
Updated
- Upgraded EFA Installer from version 1.35.0 to 1.37.0
- Upgraded AWS OFI NCCL Plugin from version 1.12.1-aws to 1.13.0-aws
Release Date: 2024-10-26
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20241025
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20241025
Updated
- Upgraded Nvidia driver from version 550.90.07 to 550.127.05 to address CVEs listed in the NVIDIA GPU Display Driver Security Bulletin for October 2024
Release Date: 2024-10-03
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240927
Updated
- Upgraded Nvidia Container Toolkit from version 1.16.1 to 1.16.2
Release Date: 2024-08-27
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240827
Updated
- Upgraded Nvidia driver and Fabric Manager from version 535.183.01 to 550.90.07
- Upgraded EFA Version from 1.32.0 to 1.34.0
- Upgraded NCCL to latest version 2.22.3 for all CUDA versions
- CUDA 11.7: upgraded from NCCL version 2.16.2+CUDA11.7
- CUDA 12.1, 12.2: upgraded from NCCL version 2.18.5+CUDA12.2
- CUDA 12.3: upgraded from NCCL version 2.21.5+CUDA12.4
Added
- Added CUDA toolkit version 12.4 in directory /usr/local/cuda-12.4
- Added support for P5e EC2 Instance.
Removed
- Removed CUDA Toolkit version 11.8 stack present in directory /usr/local/cuda-11.8
Release Date: 2024-08-19
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240816
Added
- Added support for G6e EC2 instance.
Release Date: 2024-06-06
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240606
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240606
Updated
- Updated Nvidia driver version to 535.183.01 from 535.161.08
Release Date: 2024-05-15
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240515
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240515
Added
- Added back the CUDA 11.7 stack at /usr/local/cuda-11.7 (CUDA 11.7, NCCL 2.16.2, cuDNN 8.7.0), because PyTorch 1.13 supports CUDA 11.7
Release Date: 2024-05-02
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240502
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240502
Updated
- Updated EFA version from version 1.30 to version 1.32
- Updated AWS OFI NCCL plugin from version 1.7.4 to version 1.9.1
- Updated Nvidia container toolkit from version 1.13.5 to version 1.15.0
- Version 1.15.0 does NOT include the nvidia-container-runtime and nvidia-docker2 packages. It is recommended to use nvidia-container-toolkit packages directly by following Nvidia container toolkit docs.
Added
- Added the CUDA 12.3 stack with CUDA 12.3, NCCL 2.21.5, cuDNN 8.9.7
Removed
- Removed CUDA11.7, CUDA12.0 stacks present at /usr/local/cuda-11.7 and /usr/local/cuda-12.0 directories
- Removed nvidia-docker2 package and its command nvidia-docker as part of Nvidia container toolkit update from 1.13.5 to 1.15.0 which does NOT include the nvidia-container-runtime and nvidia-docker2 packages.
Release Date: 2024-04-04
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240404
Added
- For OSS Nvidia driver DLAMIs, added support for G6 and Gr6 EC2 instances. Please refer to the EC2 instance selection page for more information.
Release Date: 2024-03-29
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240326
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240326
Updated
- Updated Nvidia driver from 535.104.12 to 535.161.08 in both Proprietary and OSS Nvidia driver DLAMIs.
- Removed G4dn, G5 EC2 instances support from Proprietary Nvidia driver DLAMI.
- The new supported instances for each DLAMI are as follows:
- Deep Learning with Proprietary Nvidia Driver supports G3 (G3.16x not supported), P3, P3dn
- Deep Learning with OSS Nvidia Driver supports G4dn, G5, P4d, P4de, P5.
Release Date: 2024-03-20
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240318
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240318
Added
- Added awscliv2 at /usr/local/bin/aws2, alongside awscliv1 at /usr/bin/aws, on both the Proprietary and OSS Nvidia Driver AMIs
Release Date: 2024-03-14
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240314
Updated
- Added G4dn and G5 support to the OSS Nvidia driver DLAMI. Current support is as follows:
- Deep Learning Base Proprietary Nvidia Driver AMI (Ubuntu 20.04) supports P3, P3dn, G3, G5, G4dn.
- Deep Learning Base OSS Nvidia Driver AMI (Ubuntu 20.04) supports G5, G4dn, P4, P5.
- OSS Nvidia driver DLAMIs are recommended for G5, G4dn, P4 and P5.
Release Date: 2024-02-12
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240208
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240208
Updated
- AWS OFI NCCL plugin is updated from 1.7.3 to 1.7.4
Release Date: 2024-02-01
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20240201
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240201
Security
- Updated runc package version to consume patch for CVE-2024-21626.
Release Date: 2023-12-04
AMI Names:
- Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 20.04) 20231204
- Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20231204
Please refer to Important changes to DLAMI
Added
- AWS Deep Learning AMI (DLAMI) is split into two separate groups:
- DLAMI that uses Nvidia Proprietary Driver (to support P3, P3dn, G3, G5, G4dn).
- DLAMI that uses Nvidia OSS Driver to enable EFA (to support P4, P5).
- Please refer to the public announcement for more information on the DLAMI split.
- AWS CLI queries for these AMIs are listed in these release notes under Query AMI-ID with AWSCLI (example region is us-east-1)
Updated
- EFA updated from 1.26.1 to 1.29.0
- GDRCopy updated from 2.3 to 2.4
Release Date: 2023-10-18
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20231018
Updated
- AWS OFI NCCL Plugin updated from version 1.7.2 to version 1.7.3
- Updated CUDA 12.0-12.1 directories with NCCL version 2.18.5 to match CUDA 12.2
- CUDA12.1 updated as the default CUDA Version
- Updated LD_LIBRARY_PATH to have /usr/local/cuda-12.1/targets/x86_64-linux/lib/:/usr/local/cuda-12.1/lib:/usr/local/cuda-12.1/lib64:/usr/local/cuda-12.1 and PATH to have /usr/local/cuda-12.1/bin/
- For customers looking to change to any different CUDA version, please define the LD_LIBRARY_PATH and PATH variables accordingly.
Release Date: 2023-10-02
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20231002
Updated
- NVIDIA Driver updated from 535.54.03 to 535.104.12
- This latest driver fixes NVML ABI breaking changes found in driver version 535.54.03, as well as the driver regression found in version 535.86.10 that affected CUDA toolkits on P5 instances.
- Updated CUDA 12.2 directories with NCCL 2.18.5
- EFA updated from version 1.24.1 to latest 1.26.1
Added
- Added CUDA12.2 at /usr/local/cuda-12.2
Removed
- Removed support for CUDA 11.5 and CUDA 11.6
Release Date: 2023-09-26
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20230926
Added
- Added net.naming-scheme changes to fix the unpredictable network interface naming issue seen on P5. This change is made by setting net.naming-scheme=v247 in the Linux boot arguments in the file /etc/default/grub
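Changes to /etc/default/grub only reach the kernel after the bootloader configuration is regenerated and the instance reboots, so checking the running kernel's command line in /proc/cmdline confirms what was actually applied. The helper below is a minimal sketch with a hypothetical name; the same check works for the intel_idle.max_cstate and processor.max_cstate arguments described in the 2023-07-19 release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a key=value kernel boot argument.
# The command line defaults to the running kernel's /proc/cmdline.
kernel_arg() {
    local key="$1"
    local cmdline="${2:-$(cat /proc/cmdline)}"
    local -a args
    local arg

    # Split on whitespace without glob expansion.
    read -r -a args <<< "$cmdline"
    for arg in "${args[@]}"; do
        if [[ "$arg" == "${key}="* ]]; then
            printf '%s\n' "${arg#"${key}="}"
            return 0
        fi
    done
    return 1  # key not present
}

# On the instance (not run here); the expected value per these notes is v247:
#   kernel_arg net.naming-scheme
```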
Release Date: 2023-08-30
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20230830
Updated
- Updated aws-ofi-nccl plugin from v1.7.1 to v1.7.2
Release Date: 2023-08-11
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20230811
Added
- This AMI now provides support for Multi-node training functionality on P5 and all the previously-supported EC2 instances.
- For P5 EC2 instances, NCCL 2.18 is recommended and has been added to CUDA 12.0 and CUDA 12.1.
Removed
- Removed support for CUDA11.3 and CUDA11.4.
Release Date: 2023-08-04
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20230804
Updated
- Updated AWS OFI NCCL plugin to v1.7.1
- Made CUDA 11.8 the default, because PyTorch 2.0 supports CUDA 11.8 and CUDA >= 11.8 is recommended for P5 EC2 instances
- Updated LD_LIBRARY_PATH to have /usr/local/cuda-11.8/targets/x86_64-linux/lib/:/usr/local/cuda-11.8/lib:/usr/local/cuda-11.8/lib64:/usr/local/cuda-11.8 and PATH to have /usr/local/cuda-11.8/bin/
- For any other CUDA version, please define LD_LIBRARY_PATH accordingly.
- Updated CUDA 12.0, 12.1 directories with NCCL 2.18.3
Fixed
- Fixed Nvidia Fabric Manager (FM) package load issue mentioned in earlier Release Date 2023-07-19.
Release Date: 2023-07-19
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20230719
Updated
- EFA updated from 1.22.1 to 1.24.1
- Nvidia driver updated from 525.85.12 to 535.54.03
Added
- Added C-state changes that keep the processor out of idle states deeper than C1 by setting the maximum C-state to C1. This change is made by setting `intel_idle.max_cstate=1 processor.max_cstate=1` in the Linux boot arguments in the file /etc/default/grub
- AWS EC2 P5 instance support:
- Added P5 EC2 instance support for single-node workflows. Multi-node support (e.g. for multi-node training) using EFA (Elastic Fabric Adapter) and the AWS OFI NCCL plugin will be added in an upcoming release.
- Please use CUDA>=11.8 for optimal performance.
- Known Issue: The Nvidia Fabric Manager (FM) package takes time to load on P5; customers need to wait 2-3 minutes after launching a P5 instance until FM loads. To check whether FM has started, run sudo systemctl is-active nvidia-fabricmanager; it should return active before you start any workflow. This will be improved in an upcoming release.
Release Date: 2023-05-19
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20230519
Updated
- EFA updated to latest 1.22.1
- Updated NCCL version for CUDA 12.1 to 2.17.1
Added
- Added CUDA12.1 at /usr/local/cuda-12.1
- Added support for NVIDIA Data Center GPU Manager (DCGM) through the datacenter-gpu-manager package
- You can check the status of this service through the following query: sudo systemctl status nvidia-dcgm
- Ephemeral NVMe Instance stores are now automatically mounted to supported EC2 instances and storage can be accessed in the folder /opt/dlami/nvme/
- You can check or modify this service in the following ways:
- Check the status of NVMe service: sudo systemctl status dlami-nvme
- To access or modify the service: /opt/aws/dlami/bin/nvme_ephemeral_drives.sh
- NVMe instance stores provide fast, efficient storage for high-throughput workflows that require high IOPS. Ephemeral NVMe instance stores are included in the cost of the instance, so this service incurs no additional cost. For more information on NVMe storage on Amazon EC2, please reference the public documentation.
- NVMe instance stores will only be mounted on EC2 instances that support them. For information on EC2 instances with NVMe supported instance stores, please see the EC2 Instance Type Summary Table and validate that NVMe is available under the Instance Store column.
- To improve disk performance and reduce first-write penalties, you may initialize the instance stores (note: this process may take hours depending on the EC2 instance type) - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/disk-performance.html
- NOTE: NVMe instance stores are local to the instance and are not network-attached like EBS. Data on these NVMe volumes may be lost when your instance is rebooted or stopped. For more information on instance store volumes, please reference the public EC2 documentation.
Release Date: 2023-04-17
AMI Name: Deep Learning Base GPU AMI (Ubuntu 20.04) 20230414
Updated
- Updated DLAMI name from AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) ${YYYY-MM-DD} to Deep Learning Base GPU AMI (Ubuntu 20.04) ${YYYY-MM-DD}
- Please note that we will continue to support the latest DLAMI under the old AMI name for one month from this release. Customers can update their OS packages with sudo apt-get update && sudo apt-get upgrade to consume security patches.
- Updated AWS OFI NCCL plugin path from /usr/local/cuda-xx.x/efa/ to /opt/aws-ofi-nccl/
- Updated NCCL to a custom Git branch of v2.16.2, co-authored by AWS and the NCCL team, for all CUDA versions. It performs better on AWS infrastructure.
Added
- Added CUDA12.0 at /usr/local/cuda-12.0
- Added AWS FSx
- Added support for Python version 3.9 in /usr/bin/python3.9
- Note that this change does not replace the default system Python; python3 still points to the system Python 3.8.
- Python3.9 can be accessed utilizing the following commands:
- /usr/bin/python3.9
- python3.9
Removed
- Removed CUDA11.0-11.1 from /usr/local/cuda-11.x/ as they are not being used by any supported framework versions based on framework support policy.
Release Date: 2022-05-25
AMI Name: AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) 20220523
Updated
- This release adds support for new EC2 instance p4de.24xlarge.
- Updated aws-efa-installer to version 1.15.2
- Updated aws-ofi-nccl to version 1.3.0-aws, which includes the topology for p4de.24xlarge.
Release Date: 2022-03-25
AMI Name: AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) 20220325
Updated
- Updated EFA version from 1.15.0 to 1.15.1
Release Date: 2022-03-17
AMI Name: AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) 20220323
Added
- First Release