AWS Deep Learning Containers for TensorFlow 2.6


Created On: September 24, 2021
Last Updated: June 09, 2022



The AWS Deep Learning Containers are available today with TensorFlow 2.6.0 support for Training and Inference. You can launch the new versions of the Deep Learning Containers on Amazon SageMaker, Amazon Elastic Kubernetes Service (EKS), self-managed Kubernetes on Amazon Elastic Compute Cloud (EC2), and Amazon Elastic Container Service (ECS). For a complete list of frameworks and versions supported by the AWS Deep Learning Containers, see the release notes below.

The AWS Deep Learning Containers for TensorFlow include containers for Training for CPU and GPU, optimized for performance and scale on AWS. These Docker images have been tested with Amazon SageMaker, EC2, ECS, and EKS, and include stable versions of NVIDIA CUDA, cuDNN, Intel MKL, Horovod, and other required software components for deep learning workloads. All software components in these images are scanned for security vulnerabilities and updated or patched in accordance with AWS Security best practices.

More details can be found in the AWS Marketplace, and a list of available containers can be found in our documentation. Get started quickly with the AWS Deep Learning Containers using the getting-started guides and the beginner-to-advanced tutorials in our developer guide. You can also subscribe to our discussion forum to get launch announcements and post your questions.

Release Notes

Security Advisory

  • TensorFlow has identified security issues in boosted trees and deprecated the feature in TensorFlow 2.8; all related code was removed in TensorFlow 2.9. AWS DLC keeps boosted trees in the TensorFlow 2.6 containers so as not to break users who currently depend on the feature. Users are advised to switch to TensorFlow Decision Forests and should be aware of this security risk when using boosted trees.
  • AWS recommends that customers monitor critical security updates in the AWS Security Bulletin.
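
To gauge exposure to the boosted trees advisory, one option is a simple text search over your training scripts. The sketch below is only a heuristic: it assumes the affected code refers to the estimators by names starting with BoostedTrees (for example tf.estimator.BoostedTreesClassifier), and the function name find_boosted_trees_usage is hypothetical, not part of any AWS or TensorFlow tool.

```python
import re

# Assumption: affected code mentions an identifier beginning with "BoostedTrees",
# such as tf.estimator.BoostedTreesClassifier or BoostedTreesRegressor.
BOOSTED_TREES_PATTERN = re.compile(r"\bBoostedTrees\w*")


def find_boosted_trees_usage(source):
    """Return (line_number, identifier) pairs for each match in the source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in BOOSTED_TREES_PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "import tensorflow as tf\n"
    "est = tf.estimator.BoostedTreesClassifier(feature_columns, 10)\n"
)
for lineno, name in find_boosted_trees_usage(sample):
    print(f"line {lineno}: {name}")
```

A text search like this can miss aliased imports or dynamically constructed names, so treat an empty result as a hint rather than proof.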

Highlights of the Release

For the latest updates, please refer to the aws/deep-learning-containers GitHub repo.

Prepackaged Deep Learning Frameworks Included

  • TensorFlow: TensorFlow is an open source software library for numerical computation using data flow graphs.
    • branch/tag used: v2.6.0
    • Supported with CUDA 11.2 on GPU, and oneDNN on CPU
  • Horovod: Horovod is a distributed training framework. The goal of Horovod is to enable distributed training for large models by efficiently using multiple GPUs. Horovod nodes communicate directly with each other instead of going through a centralized node, and average gradients using the ring-allreduce algorithm.
  • SageMaker Python SDK: The SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using popular deep learning frameworks Apache MXNet and TensorFlow. You can also train and deploy models with Amazon algorithms, which are scalable implementations of core machine learning algorithms, optimized for SageMaker and GPU training. You can also train and deploy your own algorithms built into SageMaker compatible Docker containers.
  • SageMaker Distributed Data Parallel: Amazon SageMaker Distributed Data Parallel (SDP) extends SageMaker’s training capabilities on deep learning models with near-linear scaling efficiency, achieving fast time-to-train with minimal code changes. The SDP package is licensed under the AWS Customer Agreement.
  • SageMaker Distributed Model Parallel: Amazon SageMaker Distributed Model Parallel (SMP) is a model parallelism library for training large deep learning models that were previously difficult to train due to GPU memory limitations. The SMP package is licensed under the AWS Customer Agreement.

Python Support

Python 3.8 is supported in the containers for the installed deep learning frameworks.

CPU Instance Type Support

The containers support CPU instance types. TensorFlow is built with oneDNN library support.

GPU Instance Type Support

The containers support GPU instance types and contain the following software components for GPU support:

  • Training: CUDA 11.2 / cuDNN 8.1.0.77-1+cuda11.2 / NCCL 2.8.4+cuda11.2
  • Inference: CUDA 11.2 / cuDNN 8.1.0.77-1+cuda11.2 / NCCL 2.8.4-1+cuda11.2
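
As a rough way to confirm that a running container matches the components listed above, the following sketch compares a reported build description against the CUDA and cuDNN versions from this note. The key names cuda_version and cudnn_version, and the helper names, are assumptions made for illustration; the comparison treats a shorter reported version such as "8" as matching the listed "8.1.0.77".

```python
# Versions taken from the Training and Inference lines of this release note.
LISTED_VERSIONS = {"cuda_version": "11.2", "cudnn_version": "8.1.0.77"}


def is_component_prefix(reported, listed):
    """True when the dot-separated parts of `reported` lead `listed`."""
    reported_parts = reported.split(".")
    listed_parts = listed.split(".")
    return (
        len(reported_parts) <= len(listed_parts)
        and listed_parts[: len(reported_parts)] == reported_parts
    )


def find_mismatches(build_info, listed=LISTED_VERSIONS):
    """Return human-readable mismatch messages; an empty list means all match."""
    problems = []
    for key, want in listed.items():
        got = build_info.get(key)
        if got is None:
            problems.append(f"{key}: not reported (listed {want})")
        elif not is_component_prefix(str(got), want):
            problems.append(f"{key}: reported {got}, listed {want}")
    return problems


# Example with a hand-written mapping standing in for the real build info.
print(find_mismatches({"cuda_version": "11.2", "cudnn_version": "8"}))
```

Inside a GPU container, the mapping could come from tf.sysconfig.get_build_info(), assuming it exposes those two keys in the installed TensorFlow build; NCCL is not covered by this check.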

AWS Regions Support

Region Code
US East (Ohio) us-east-2
US East (N. Virginia) us-east-1
US West (Oregon) us-west-2
US West (N. California) us-west-1
Asia Pacific (Mumbai) ap-south-1
Asia Pacific (Osaka) ap-northeast-3
Asia Pacific (Seoul) ap-northeast-2
Asia Pacific (Singapore) ap-southeast-1
Asia Pacific (Sydney) ap-southeast-2
Asia Pacific (Tokyo) ap-northeast-1
Canada (Central) ca-central-1
EU (Frankfurt) eu-central-1
EU (Ireland) eu-west-1
EU (London) eu-west-2
EU (Paris) eu-west-3
SA (São Paulo) sa-east-1
EU (Stockholm) eu-north-1
AP East (Hong Kong) ap-east-1
ME South (Bahrain) me-south-1
AF South (Cape Town) af-south-1
EU South (Milan) eu-south-1
China (Beijing) cn-north-1
China (Ningxia) cn-northwest-1
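
When scripting against these Regions, it can help to validate a Region code before building a registry address. The sketch below is illustrative only: the helper name ecr_registry_host is hypothetical, and the registry account ID is not given in this note, so it is passed in as a parameter (the example uses a placeholder). It relies on the standard Amazon ECR hostname pattern, with the amazonaws.com.cn suffix for the China Regions.

```python
# Region codes copied from the table above.
SUPPORTED_REGIONS = frozenset({
    "us-east-2", "us-east-1", "us-west-2", "us-west-1",
    "ap-south-1", "ap-northeast-3", "ap-northeast-2", "ap-southeast-1",
    "ap-southeast-2", "ap-northeast-1", "ca-central-1", "eu-central-1",
    "eu-west-1", "eu-west-2", "eu-west-3", "sa-east-1", "eu-north-1",
    "ap-east-1", "me-south-1", "af-south-1", "eu-south-1",
    "cn-north-1", "cn-northwest-1",
})


def ecr_registry_host(account_id, region):
    """Build an ECR registry hostname for a Region listed in this note.

    `account_id` is an assumption supplied by the caller; look up the
    correct registry account for your Region in the DLC documentation.
    """
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"Region {region!r} is not listed for this release")
    # China Regions use a different DNS suffix.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{suffix}"


# Placeholder account ID, for illustration only.
print(ecr_registry_host("123456789012", "us-west-2"))
```

Rejecting unlisted Regions early gives a clearer error than a failed image pull later in the workflow.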

Build and Test

  • Built on: c5.18xlarge
  • DLC images tested on: c4.8xlarge, c5.18xlarge, m4.16xlarge, p3.16xlarge, p3dn.24xlarge, p4d.24xlarge, g4dn.xlarge
  • SageMaker Distributed Data Parallel and Model Parallel features tested on: ml.p3.16xlarge
  • Tested with MNIST and ResNet50/ImageNet datasets on EC2, ECS AMI (Amazon Linux AMI 2.0.20190614), EKS AMI (1.11-v20190614), and Amazon SageMaker

Known Issues

  • Some security scanning tools may incorrectly flag the OpenSSL version installed on the DLCs because they detect it as below 1.1.1l. This happens because the installed OpenSSL distribution has been patched by Canonical rather than installed from source at v1.1.1l. Please see https://github.com/jeremylong/DependencyCheck/issues/3656 for more details.
  • There is a known reduction in performance on TensorFlow 2.6 when training MaskRCNN models. Please see https://github.com/aws/amazon-sagemaker-examples/issues/2947 for more details.
  • For the TensorFlow framework with Keras, SageMaker Debugger deprecates zero code change support for debugging models built using the tf.keras modules of TensorFlow 2.6 and later. This is due to breaking changes announced in the TensorFlow 2.6.0 release notes. SageMaker Debugger continues to support the zero code change experience for native TensorFlow (which excludes the tf.keras modules).