AWS Deep Learning Containers for PyTorch 1.5.0
Release Date: May 06, 2020
Created On: May 06, 2020
Last Updated: May 06, 2020
The AWS Deep Learning Containers are now available with PyTorch 1.5.0, the newly added SageMaker Inference and SageMaker PyTorch Inference toolkits, and the latest version of SageMaker PyTorch Training. You can launch the new versions of the Deep Learning Containers on Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), self-managed Kubernetes on Amazon EC2, and Amazon Elastic Container Service (Amazon ECS). For a complete list of frameworks and versions supported by the AWS Deep Learning Containers, see the release notes below.
The AWS Deep Learning Containers for PyTorch include containers for training on CPU and GPU, optimized for performance and scale on AWS. These Docker images have been tested with Amazon SageMaker, EC2, ECS, and EKS, and provide stable versions of NVIDIA CUDA, cuDNN, Intel MKL, and other required software components to provide a seamless user experience for deep learning workloads. All software components in these images are scanned for security vulnerabilities and updated or patched in accordance with AWS Security best practices.
More details can be found in the marketplace, and a list of available containers can be found in our documentation. Get started quickly with the AWS Deep Learning Containers using the getting-started guides and beginner to advanced level tutorials in our developer guide. You can also subscribe to our discussion forum to get launch announcements and post your questions.
Release Notes
Security Advisory
- AWS recommends that customers monitor critical security updates in the AWS Security Bulletin
Highlights of the Release
- Updated PyTorch to version 1.5.0
- Introduced NVIDIA Apex
Prepackaged Deep Learning Frameworks Included
- PyTorch: PyTorch is a Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system. A short usage sketch follows this list.
- branch/tag used: v1.5.0
- Supported with CUDA 10.1 and Intel MKL-DNN
- Horovod: Horovod is a distributed training framework. The goal of Horovod is to make it easy to take a single-GPU deep learning program and train it on multiple GPUs. Horovod nodes communicate directly with each other instead of going through a centralized node, and they average gradients using the ring-allreduce algorithm. A minimal sketch follows this list.
- branch/tag used: v0.19.1
- SageMaker Python SDK: The SageMaker Python SDK is an open-source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using popular deep learning frameworks such as Apache MXNet, PyTorch, and TensorFlow. You can also train and deploy models with Amazon algorithms, which are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have your own algorithms built into SageMaker-compatible Docker containers, you can train and host models using these as well. A minimal launch sketch appears after the Bill of Materials below.
- branch/tag used: v1.50.17
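As a quick illustration of the tensor-plus-autograd model described for PyTorch above, here is a minimal sketch that runs on CPU or GPU; the shapes and operations are arbitrary placeholders, not part of the container.

```python
import torch

# Pick GPU if the container is running on a GPU instance, otherwise CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensor computation (like NumPy) with optional GPU acceleration
x = torch.randn(64, 3, device=device, requires_grad=True)
w = torch.randn(3, 1, device=device, requires_grad=True)

# Forward pass; autograd records these operations on its tape
loss = (x @ w).tanh().sum()

# Backward pass computes gradients for every tensor with requires_grad=True
loss.backward()
print(w.grad.shape)  # torch.Size([3, 1])
```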
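The Horovod pattern described above can be sketched as follows, assuming one process per GPU; the tiny linear model and synthetic data are placeholders only.

```python
import torch
import horovod.torch as hvd

hvd.init()  # one process per GPU
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Linear(10, 1)
if torch.cuda.is_available():
    model.cuda()

# Scale the learning rate by the number of workers (a common convention)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged with ring-allreduce
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Start every worker from the same initial state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for _ in range(5):
    inputs, targets = torch.randn(32, 10), torch.randn(32, 1)
    if torch.cuda.is_available():
        inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
```

A script like this would typically be launched with `horovodrun -np <num_gpus> python train.py` (the script name is illustrative).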
Bill of Materials: List of all components
- Common Packages across all containers:
- torch==1.5.0
- torchvision==0.6.0
- ipython==7.10.1 (training)
- ipython==7.7.0 (inference)
- ipython==5.8.0
- Cython==0.29.12
- typing==3.6.4
- numpy==1.16.4
- pandas==0.25.0
- pillow==7.1.0
- h5py==2.9.0
- requests==2.22.0
- awscli==1.18.51
- sagemaker==1.50.17
- GPU: Training container
- cuda-command-line-tools-10-1
- cuda-cufft-10-1
- cuda-curand-10-1
- cuda-cusolver-10-1
- cuda-cusparse-10-1
- libcudnn7=7.6.3.30-1+cuda10.1
- libnccl2=2.4.8-1+cuda10.1
- libcublas10=10.2.1.243-1
- horovod==0.19.1
- fastai==1.0.59
- openmpi==4.0.1
- dgl==0.4.3
- scipy==1.2.2
- sagemaker-containers==2.8.6.post0
- sagemaker-experiments==0.1.7
- sagemaker-pytorch-training==1.3.3
- smdebug==0.7.2
- CPU: Training container
- dgl==0.4.3
- scipy==1.2.2
- sagemaker-containers==2.8.6.post0
- sagemaker-experiments==0.1.7
- sagemaker-pytorch-training==1.3.3
- smdebug==0.7.2
- GPU: Inference container
- cuda-command-line-tools-10-1
- cuda-cufft-10-1
- cuda-curand-10-1
- cuda-cusolver-10-1
- cuda-cusparse-10-1
- libcudnn7=7.6.5.32-1+cuda10.1
- libnccl2=2.4.8-1+cuda10.1
- libcublas10=10.2.1.243-1
- mxnet-model-server==1.0.8
- sagemaker-inference==1.2.2
- sagemaker-pytorch-inference==1.4.3.post0
- sagemaker-containers==2.8.6
- CPU: Inference container
- mxnet-model-server==1.0.8
- sagemaker-inference==1.2.2
- sagemaker-pytorch-inference==1.4.3.post0
- sagemaker-containers==2.8.6
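The sagemaker-pytorch-training and sagemaker-pytorch-inference toolkits listed above are driven through the SageMaker Python SDK. The following is a minimal launch sketch assuming SageMaker Python SDK 1.x conventions (the `train_instance_*` parameter names changed in SDK 2.x); the entry point script, IAM role ARN, S3 path, and instance types are placeholders to replace with your own.

```python
from sagemaker.pytorch import PyTorch

# Hypothetical training script and IAM role
estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    framework_version="1.5.0",
    py_version="py3",
    train_instance_count=1,
    train_instance_type="ml.p3.2xlarge",
    hyperparameters={"epochs": 2},
)

# Hypothetical S3 location for the training channel
estimator.fit({"training": "s3://my-bucket/my-training-data"})

# Deploy behind a real-time endpoint, which uses the inference container
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
)
```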
Python 3.6 Support
Python 3.6 is supported in the PyTorch Training and Inference containers.
CPU Instance Type Support
The containers support CPU instance types.
GPU Instance Type support
The containers support GPU instance types and contain the following software components for GPU support:
- CUDA 10.1
- cuDNN 7.6.3.30
- NCCL 2.4.8
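To confirm that these components are visible from inside a running container, a quick check like the following minimal sketch can be used; the version numbers printed should correspond to the list above.

```python
import torch

# Quick sanity check of the GPU software stack shipped in the container
print("PyTorch:", torch.__version__)               # expected 1.5.0
print("CUDA:", torch.version.cuda)                 # expected 10.1
print("cuDNN:", torch.backends.cudnn.version())
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("NCCL:", torch.cuda.nccl.version())
    print("Device:", torch.cuda.get_device_name(0))
```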
AWS Regions support
The containers are available in the following regions:
Region | Code |
---|---|
US East (Ohio) | us-east-2 |
US East (N. Virginia) | us-east-1 |
US West (Oregon) | us-west-2 |
US West (N. California) | us-west-1 |
Asia Pacific (Mumbai) | ap-south-1 |
Asia Pacific (Seoul) | ap-northeast-2 |
Asia Pacific (Singapore) | ap-southeast-1 |
Asia Pacific (Sydney) | ap-southeast-2 |
Asia Pacific (Tokyo) | ap-northeast-1 |
Asia Pacific (Hong Kong) | ap-east-1 |
Canada (Central) | ca-central-1 |
EU (Frankfurt) | eu-central-1 |
EU (Ireland) | eu-west-1 |
EU (London) | eu-west-2 |
EU (Paris) | eu-west-3 |
EU (Stockholm) | eu-north-1 |
South America (São Paulo) | sa-east-1 |
Middle East (Bahrain) | me-south-1 |
China (Beijing) | cn-north-1 |
China (Ningxia) | cn-northwest-1 |
Build and Test
- Built on: c5.18xlarge
- Tested on: c4.8xlarge, c5.18xlarge, g3.16xlarge, m4.16xlarge, p2.16xlarge, p3.16xlarge, p3dn.24xlarge
- Tested with MNIST and ResNet-50/ImageNet workloads on EC2, ECS AMI (Amazon Linux AMI 2.0.20190614), EKS AMI (1.13), and Amazon SageMaker.
Known Issues
- PyTorch requires tuning of the OMP_NUM_THREADS environment variable to achieve optimal performance (see the example below).
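As a starting point, a value around the number of physical cores available to each worker process is a common heuristic; the value below is only an illustration, and the variable should be set before PyTorch initializes its thread pool.

```python
import os

# OMP_NUM_THREADS must be set before importing torch (or in the container
# environment, e.g. via docker run / SageMaker environment settings).
# 16 is an illustrative value, not a recommendation for any specific instance.
os.environ.setdefault("OMP_NUM_THREADS", "16")

import torch
print("intra-op threads:", torch.get_num_threads())
```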