AWS Deep Learning Containers v3.1 for PyTorch
The AWS Deep Learning Containers for PyTorch include containers for Training and Inference for CPU and GPU, optimized for performance and scale on AWS.
Release Date: March 20, 2020
Created On: March 20, 2020
Last Updated: March 21, 2020
The AWS Deep Learning Containers for PyTorch include containers for Training for CPU and GPU, optimized for performance and scale on AWS. These Docker images have been tested with Amazon SageMaker, EC2, ECS, and EKS and provide stable versions of NVIDIA CUDA, cuDNN, Intel MKL, and other required software components to provide a seamless user experience for deep learning workloads. All software components in these images are scanned for security vulnerabilities and updated or patched in accordance with AWS Security best practices.
Detailed Release Note Changes
Security Advisory
- AWS recommends that customers monitor critical security updates in the AWS Security Bulletin
Highlights of the Release
- Introduced SageMaker Python SDK version 1.50.17 for Python3 Containers
- Updated smdebug version to 0.7.1 for Python3 Containers
- Updated smexperiments version to 0.1.7 for Python3 Containers
Prepackaged Deep Learning Frameworks Included
- PyTorch: PyTorch is a Python package that provides two high-level features: Tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system.
- branch/tag used: v1.4.0
- Justification: Stable and well tested
- Supported with CUDA 10.1 and Intel MKL-DNN
- Horovod: Horovod is a distributed training framework. Its goal is to make it easy to take a single-GPU deep learning program and train it on multiple GPUs. Horovod nodes communicate directly with each other, rather than through a centralized node, and average gradients using the ring-allreduce algorithm.
- branch/tag used: v0.16.4
- Justification: Stable and well tested
- SageMaker Python SDK: The SageMaker Python SDK is an open-source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using popular deep learning frameworks such as Apache MXNet, PyTorch, and TensorFlow.
- branch/tag used: v1.50.17
- Justification: Stable and well tested
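The ring-allreduce pattern mentioned for Horovod can be illustrated with a small pure-Python simulation. This is a sketch for intuition only, not Horovod's implementation (Horovod delegates the collective to libraries such as NCCL or MPI), and the function names `split_chunks` and `ring_allreduce_mean` are hypothetical. Each simulated worker splits its gradient vector into N chunks, passes partial sums to its right-hand neighbor in a reduce-scatter phase, then circulates the fully reduced chunks in an all-gather phase, so no central node is involved.

```python
from typing import List


def split_chunks(vec: List[float], n: int) -> List[List[float]]:
    """Split vec into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(vec), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(vec[start:end]))
        start = end
    return chunks


def ring_allreduce_mean(grads: List[List[float]]) -> List[List[float]]:
    """Simulate ring-allreduce over len(grads) workers; return each worker's averaged vector."""
    n = len(grads)
    data = [split_chunks(g, n) for g in grads]

    # Reduce-scatter: in step s, worker i sends chunk (i - s) % n to its right
    # neighbor, which adds it to its own copy. After n - 1 steps, worker i
    # holds the complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # Collect all sends first so every worker sends what it held at the start of the step.
        sends = [(i, (i - step) % n, list(data[i][(i - step) % n])) for i in range(n)]
        for src, c, payload in sends:
            dst = (src + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], payload)]

    # All-gather: completed chunks travel around the ring and overwrite the
    # stale partial sums, so every worker ends with every reduced chunk.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, list(data[i][(i + 1 - step) % n])) for i in range(n)]
        for src, c, payload in sends:
            data[(src + 1) % n][c] = payload

    # Flatten the chunks and divide by the worker count to get the average.
    return [[x / n for chunk in worker for x in chunk] for worker in data]


if __name__ == "__main__":
    print(ring_allreduce_mean([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
```

Running the example prints `[[2.0, 3.0, 4.0], [2.0, 3.0, 4.0]]`: both simulated workers end with the same averaged gradient. Each worker sends 2(N-1) chunks of roughly 1/N of the vector, which is why per-worker traffic stays close to twice the gradient size as N grows.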
Bill of Materials: List of all components
Note: All underlined packages have been updated in this release.
- Common Packages across all containers:
- torch==1.4.0
- ipython==7.10.1 (training py3)
- ipython==5.8.0 (py2)
- cython==0.29.12
- typing==3.6.4 (py3)
- typing==3.7.4 (py2)
- numpy==1.16.4
- pandas==0.25.0 (py3)
- pandas==0.24.2 (py2)
- pillow==6.2.0
- h5py==2.9.0
- requests==2.22.0
- CPU: Training container
- torchvision==0.5.0+cpu
- sagemaker==1.50.17 (py3 only)
- sagemaker-containers==2.8.1
- sagemaker-pytorch-training==1.2.4
- dgl==0.4.1 (py3 only)
- scipy==1.2.2
- smdebug==0.7.1 (py3 only)
- sagemaker-experiments==0.1.7 (py3 only)
- awscli==1.18.22
- GPU: Training container
- cuda-command-line-tools-10-1
- cuda-cufft-10-1
- cuda-curand-10-1
- cuda-cusolver-10-1
- cuda-cusparse-10-1
- libcudnn7=7.6.3.30-1+cuda10.1
- libnccl2=2.4.8-1+cuda10.1
- libcublas10=10.2.1.243-1
- sagemaker==1.50.17 (py3 only)
- sagemaker-containers==2.8.1
- sagemaker-pytorch-training==1.2.4
- horovod==0.16.4
- fastai==1.0.59 (py3 only)
- openmpi==4.0.1
- dgl==0.4.1 (py3 only)
- scipy==1.2.2
- smdebug==0.7.1 (py3 only)
- sagemaker-experiments==0.1.7 (py3 only)
- awscli==1.18.22
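To confirm that a running container matches the Python package pins above, the installed versions can be compared with the `name==version` entries. The following is a minimal sketch; the helpers `parse_pins`, `installed_version`, and `mismatches` are hypothetical, and it assumes you paste in only the entries relevant to your container variant (for example, the py3 rather than the py2 `ipython` pin). Apt packages such as `libcudnn7=7.6.3.30-1+cuda10.1` use a single `=` and are skipped.

```python
"""Compare installed Python package versions with "name==version" pins (sketch)."""


def parse_pins(lines):
    """Return {name: version} for every "name==version" entry in lines.

    Annotations such as "(py3 only)" and spaces around "==" are ignored.
    Lines without "==" (headings, apt-style "name=version" entries) are
    skipped. A repeated name keeps its last version.
    """
    pins = {}
    for line in lines:
        entry = line.strip().lstrip("-").strip()
        if "==" not in entry:
            continue
        name, _, rest = entry.partition("==")
        fields = rest.split()
        if fields:
            pins[name.strip().lower()] = fields[0]
    return pins


def installed_version(name):
    """Return the installed version of distribution `name`, or None if absent."""
    try:
        # Available on Python 3.8+.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for older interpreters, such as the Python 3.6 in these containers.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for each pin that does not match."""
    result = {}
    for name, pinned in sorted(pins.items()):
        found = lookup(name)
        if found != pinned:
            result[name] = (pinned, found)
    return result


if __name__ == "__main__":
    bom = [
        "- torch==1.4.0",
        "- numpy==1.16.4",
        "- sagemaker==1.50.17 (py3 only)",
    ]
    for name, (pinned, found) in mismatches(parse_pins(bom)).items():
        print("{}: pinned {}, installed {}".format(name, pinned, found))
```

An empty result means every listed pin matched. Passing a custom `lookup` keeps the comparison testable without the packages installed.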
Python 2.7 and Python 3.6 Support
Python 2.7 and Python 3.6 are supported in the PyTorch Training containers.
Python 3.6 is supported in the PyTorch Inference containers.
End of Life Notices
The Python open-source community officially ended support for Python 2 on January 1, 2020. The PyTorch community has also announced that PyTorch 1.4 will be the last release to support Python 2. DLC releases for subsequent PyTorch versions will not include Python 2 containers. Updates to the Python 2 DLCs will be provided for previously published DLC versions only if the open-source community publishes security fixes for those versions. Previous releases of the PyTorch DLC that contain Python 2 will remain available.
CPU Instance Type Support
The containers support CPU instance types.
GPU Instance Type Support
The containers support GPU instance types and contain the following software components for GPU support.
- CUDA 10.1 / cuDNN 7.6.3.30 / NCCL 2.4.8
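To check which CUDA and cuDNN versions PyTorch actually sees inside a GPU container, the framework's own version attributes can be printed. A minimal sketch follows; `decode_cudnn_version` and `report` are hypothetical helpers, and the decoding assumes the cuDNN 7.x integer encoding (MAJOR*1000 + MINOR*100 + PATCH). The import of `torch` is guarded so the script degrades gracefully where PyTorch is absent.

```python
"""Print the CUDA and cuDNN versions visible to PyTorch (sketch)."""


def decode_cudnn_version(code):
    """Split a cuDNN 7.x integer version (MAJOR*1000 + MINOR*100 + PATCH) into parts.

    For example, 7603 -> (7, 6, 3). The build number (the ".30" in 7.6.3.30)
    is not part of this integer.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def report():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("PyTorch:", torch.__version__)
    # CUDA version PyTorch was built against (None on CPU-only builds).
    print("CUDA (build):", torch.version.cuda)
    print("GPU available:", torch.cuda.is_available())
    code = torch.backends.cudnn.version()  # None when cuDNN cannot be loaded
    if code is not None:
        print("cuDNN:", "{}.{}.{}".format(*decode_cudnn_version(code)))


if __name__ == "__main__":
    report()
```

On the GPU training container this is expected to report CUDA 10.1 and cuDNN 7.6.3, consistent with the components listed above.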
AWS Region Support
Available in the following Regions:
Region | Code
US East (Ohio) | us-east-2
US East (N. Virginia) | us-east-1
US West (Oregon) | us-west-2
US West (N. California) | us-west-1
Asia Pacific (Mumbai) | ap-south-1
Asia Pacific (Seoul) | ap-northeast-2
Asia Pacific (Singapore) | ap-southeast-1
Asia Pacific (Sydney) | ap-southeast-2
Asia Pacific (Tokyo) | ap-northeast-1
Asia Pacific (Hong Kong) | ap-east-1
Canada (Central) | ca-central-1
EU (Frankfurt) | eu-central-1
EU (Ireland) | eu-west-1
EU (London) | eu-west-2
EU (Paris) | eu-west-3
EU (Stockholm) | eu-north-1
South America (São Paulo) | sa-east-1
Middle East (Bahrain) | me-south-1
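The Region codes in the table determine the Amazon ECR endpoint from which an image is pulled. A minimal sketch follows: `SUPPORTED_REGIONS` is transcribed from the table, while `ecr_image_uri` is a hypothetical helper and its `account_id`, `repository`, and `tag` arguments are placeholders, because the registry account and image tags are not listed in these notes and must be looked up in the DLC documentation. The URI layout `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` is the standard ECR format.

```python
"""Validate a Region against this release's list and build an ECR image URI (sketch)."""

# Transcribed from the Region table in these release notes.
SUPPORTED_REGIONS = {
    "us-east-2": "US East (Ohio)",
    "us-east-1": "US East (N. Virginia)",
    "us-west-2": "US West (Oregon)",
    "us-west-1": "US West (N. California)",
    "ap-south-1": "Asia Pacific (Mumbai)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
    "ap-east-1": "Asia Pacific (Hong Kong)",
    "ca-central-1": "Canada (Central)",
    "eu-central-1": "EU (Frankfurt)",
    "eu-west-1": "EU (Ireland)",
    "eu-west-2": "EU (London)",
    "eu-west-3": "EU (Paris)",
    "eu-north-1": "EU (Stockholm)",
    "sa-east-1": "South America (São Paulo)",
    "me-south-1": "Middle East (Bahrain)",
}


def ecr_image_uri(account_id, region, repository, tag):
    """Return <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>.

    account_id, repository, and tag are placeholders supplied by the caller;
    they are not specified in these release notes.
    """
    if region not in SUPPORTED_REGIONS:
        raise ValueError("Region {!r} is not listed for this release".format(region))
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repository, tag)


if __name__ == "__main__":
    print(ecr_image_uri("123456789012", "eu-north-1", "example-repo", "example-tag"))
```

Failing fast on an unlisted Region surfaces a misconfiguration before a pull attempt against an endpoint that does not host the image.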
Build and Test
- Built on: c5.18xlarge
- Tested on: c4.8xlarge, c5.18xlarge, g3.16xlarge, m4.16xlarge, p2.16xlarge, p3.16xlarge, p3dn.24xlarge
- Tested with MNIST and ResNet-50/ImageNet workloads on EC2, ECS AMI (Amazon Linux AMI 2.0.20190614), EKS AMI (1.13), and Amazon SageMaker.
Known Issues
- Py2 GPU Training Container Vulnerabilities:
- pycrypto security vulnerability - pycrypto cannot be upgraded beyond v2.6.1 in Python 2 containers
- astropy security vulnerability - astropy cannot be upgraded beyond v2.0.16 in Python 2 containers
- PyTorch requires tuning the OMP_NUM_THREADS environment variable to achieve optimal performance
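One way to set OMP_NUM_THREADS is from the training script itself, before PyTorch is imported. The sketch below assumes a simple heuristic (split the available vCPUs evenly among worker processes) as a starting point for tuning; the helpers `omp_threads_per_process` and `configure_omp` are hypothetical, and the best value for a given model should be found by measurement.

```python
import os


def omp_threads_per_process(cpu_count, num_processes=1):
    """Hypothetical starting heuristic: share the vCPUs evenly among worker processes."""
    if cpu_count < 1 or num_processes < 1:
        raise ValueError("cpu_count and num_processes must be positive")
    return max(1, cpu_count // num_processes)


def configure_omp(num_processes=1):
    """Set OMP_NUM_THREADS unless already exported; return the value in effect."""
    threads = omp_threads_per_process(os.cpu_count() or 1, num_processes)
    # The OpenMP runtime reads this variable when it initializes, so it is
    # safest to call this before importing torch. An existing value wins.
    os.environ.setdefault("OMP_NUM_THREADS", str(threads))
    return int(os.environ["OMP_NUM_THREADS"])


if __name__ == "__main__":
    print("OMP_NUM_THREADS =", configure_omp(num_processes=1))
    # import torch  # import only after the environment variable is set
```

On a 72-vCPU c5.18xlarge running one training process, this heuristic would choose 72 unless a value is already exported; whether the 36 physical cores perform better is worth measuring. Inside a running process, `torch.set_num_threads()` also adjusts PyTorch's intra-op thread count.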