NVIDIA GPU-Optimized AMI
NVIDIA | 24.10.1 | Linux/Unix, Ubuntu 22.04 - 64-bit Amazon Machine Image (AMI)
Install Drivers
You need to first change the username from root to ubuntu in order for the drivers to be installed! I feel this should have been made clearer in the directions!
AMI is not configured as advertised.
None of the advertised utilities are installed in the AMI, nor is CUDA. This is current as of 3/14/24. It appears to be a raw installation of Ubuntu 22.04, by my estimation.
```
root@ip-172-31-38-109:~/cuda-samples/Samples/5_Domain_Specific/nbody# jupyterlab --version
jupyterlab: command not found
root@ip-172-31-38-109:~/cuda-samples/Samples/5_Domain_Specific/nbody# miniconda --version
miniconda: command not found
```
There's been a lot of troubleshooting so far trying to get CUDA installed, so I won't copy-paste my terminal.
Drivers auto-installed on login, not boot
I wanted to use this AMI in my automation to run ML jobs on our platform. What I needed was Ubuntu 22.04, because podman is in the repo, with NVIDIA drivers installed. The downside of this AMI is that the NVIDIA drivers are installed via /home/ubuntu/.bashrc and not via cloud-init. I looked at /var/tmp/nvidia/driver.sh and there was no variable to set to force a driver install at cloud-init. Since my automation runs at the end of cloud-init, this doesn't work.
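A possible workaround, assuming /var/tmp/nvidia/driver.sh is idempotent and can run non-interactively (which I have not verified), would be to call the AMI's own driver script from EC2 user data, so cloud-init installs the drivers at first boot before any automation runs:
```
#!/bin/bash
# Hypothetical user-data sketch, not verified: invoke the AMI's own driver
# install script during cloud-init instead of waiting for the first
# interactive login via /home/ubuntu/.bashrc.
if [ -f /var/tmp/nvidia/driver.sh ]; then
    bash /var/tmp/nvidia/driver.sh
fi
```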
Very good
Older reviews are no longer valid. As of the date of my review, the image is very good: it has all the drivers required to run optimized code on various types of NVIDIA GPUs, and it comes with CUDA 12.1 preinstalled, along with Miniconda and JupyterLab.
The machine is ready to run code on the GPU very easily, with everything you need already in place.
Missing drivers
This should be preconfigured to run NVIDIA GPU Cloud (NGC) containers such as the PyTorch one; however, it fails on launch on AWS (on a p3.2xlarge instance).
After SSHing in, I see this error message:
```
Installing drivers ...
modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.2.0-1011-aws
```
And sure enough, running containers such as PyTorch (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) does not work:
```
~$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.11-py3
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
```
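For anyone hitting the same failure, a quick way to confirm that the driver (rather than Docker) is the problem is to check whether the kernel module ever loaded. These are standard diagnostics, not anything specific to this AMI:
```
# Check whether the NVIDIA kernel module is loaded and the driver responds.
lsmod | grep nvidia      # no output means the module never loaded
nvidia-smi               # fails in the same situation the container error describes
```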
All I need
The AMI has everything I need to run my deep learning tasks. I don't need to configure any low-level stuff.
doesn't include what it claims
Claims to include nvidia-cuda-toolkit, but it doesn't; nvcc --version returns an error message. The other versions are out of date and incompatible.
Outdated and Useless
This AMI is outdated and doesn't have proper packages. This is what NVIDIA suggests on their website (https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html), but they are unable to provide the same on their AMIs.
Great AMI for GPU-Optimized AI Software
Provided me with everything I needed to run NVIDIA Riva Speech AI services on my EC2 instance.
The latest update of the AMI came with the NGC Catalog CLI pre-installed. This made it super easy to work with NVIDIA's GPU-Optimized AI software on AWS right from launch.
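For context, the usual flow with the pre-installed NGC CLI is to configure it with your API key and then pull containers from nvcr.io. A minimal sketch (your own NGC API key is assumed; the PyTorch tag is the one quoted in an earlier review, not a recommendation):
```
# Minimal sketch of using the pre-installed NGC CLI plus Docker on this AMI.
ngc config set                       # prompts for the NGC API key, org, and team
docker login nvcr.io                 # username: $oauthtoken, password: <NGC API key>
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.11-py3
```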
Great AMI.