AWS Machine Learning Blog

Faster training with optimized TensorFlow 1.6 on Amazon EC2 C5 and P3 instances

The AWS Deep Learning AMIs come with the latest pip packages of popular deep learning frameworks pre-installed in separate virtual environments, so developers can quickly get started with training deep learning models. The new versions of the Deep Learning AMIs for Ubuntu and Amazon Linux now come with TensorFlow 1.6, built with advanced optimizations for high-performance training across Amazon EC2 instance families.

Faster training on Amazon EC2 C5 instances

The AMIs come with TensorFlow 1.6 built with Intel’s Advanced Vector Extensions (AVX, AVX2, and AVX-512) to speed up vector and floating-point operations on the Intel Xeon Platinum processors that power Amazon EC2 C5 instances. The AMIs also come fully configured with Intel’s Math Kernel Library for Deep Neural Networks (Intel MKL-DNN), which accelerates on the CPU the math routines used in training deep neural networks. Training a ResNet-50 benchmark with a synthetic ImageNet dataset using our optimized build of TensorFlow 1.6 on a c5.18xlarge instance was 7.4X faster than training with the stock TensorFlow 1.6 binaries. Below are the throughput comparisons for a few popular deep learning benchmarks:

All tests used a batch size of 32.
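
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how training throughput in images per second can be measured. It is not the harness used for the numbers above. The `step_fn` callable, the warm-up count, and the helper names are hypothetical, and the default batch size of 32 simply mirrors the setting stated above.

```python
import time


def images_per_second(step_fn, batch_size=32, num_steps=10, warmup_steps=2):
    """Time `step_fn` and return throughput in images per second.

    `step_fn` is a hypothetical zero-argument callable that runs one
    training step on one batch of `batch_size` images.
    """
    # Warm-up steps are excluded so one-time setup cost does not skew timing.
    for _ in range(warmup_steps):
        step_fn()

    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start

    return batch_size * num_steps / elapsed


def speedup(optimized_ips, baseline_ips):
    """Ratio of two throughputs; 2.0 means twice as many images per second."""
    return optimized_ips / baseline_ips


if __name__ == "__main__":
    # Stand-in workload only; replace it with a real training step.
    def fake_step():
        sum(i * i for i in range(10000))

    print("images/sec:", images_per_second(fake_step))
```

Running the same measurement once with the optimized build and once with the stock binaries, then passing both results to `speedup`, gives a ratio comparable in spirit to the figure quoted above.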

These compute optimizations also benefit training performance on other EC2 compute-optimized instances, including the C3 and C4 instance families, with similar speed-ups over the stock TensorFlow binaries.
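
To see which of these instruction sets a given instance exposes, one option is to read the CPU feature flags that Linux reports. The sketch below assumes an x86 Linux host where `/proc/cpuinfo` contains a `flags` line. The helper names are hypothetical, and `avx512f` (the AVX-512 Foundation flag) is used here as a stand-in for AVX-512 support in general.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Every core reports the same flags, so the first match is enough.
            return set(value.split())
    return set()


def simd_support(flags):
    """Map each instruction set named in the post to whether it is present."""
    wanted = {"AVX": "avx", "AVX2": "avx2", "AVX-512": "avx512f"}
    return {name: flag in flags for name, flag in wanted.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # Not Linux, or /proc is unavailable.
    print(simd_support(cpu_flags(text)))
```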

Optimized for faster training on Amazon EC2 P3 instances

The Deep Learning AMIs also come with an optimized build of TensorFlow 1.6 fully configured with NVIDIA CUDA 9 and cuDNN 7 to take advantage of mixed-precision training on the Volta V100 GPUs that power Amazon EC2 P3 instances. The AMIs come with the latest CUDA libraries and GPU driver:

  • CUDA 9.0
  • cuDNN 7.0.5
  • NCCL 2.1.2
  • NVIDIA GPU Driver 384.111
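
As a quick check of what a running instance reports, the sketch below queries the GPU driver version and GPU name through `nvidia-smi`. It assumes `nvidia-smi` is on the `PATH`, as it normally is when the NVIDIA driver is installed. The helper names are hypothetical, and the function returns an empty list on machines without a GPU.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Parse 'driver_version, name' rows into (driver, name) tuples."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        driver, _, name = line.partition(",")
        rows.append((driver.strip(), name.strip()))
    return rows


def query_gpus():
    """Return (driver, name) per GPU, or [] when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name",
         "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # Decode output as text (works on Python 3.6).
    )
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for driver, name in query_gpus():
        print("GPU:", name, "| driver:", driver)
```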

Our optimized TensorFlow 1.6 binaries, built with Intel’s AVX, SSE, and FMA instruction sets, also benefit deep learning workloads on Amazon EC2 P3 instances that perform significant data pre-processing on the CPU.

Seamless deployment of optimized TensorFlow binaries

The Deep Learning AMIs automatically deploy the high-performance build of TensorFlow optimized for the EC2 instance of your choice when you activate the TensorFlow virtual environment for the first time using one of the following commands:

For Python 2

source activate tensorflow_p27

For Python 3

source activate tensorflow_p36
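
After activating one of these environments, a short script can confirm that TensorFlow is importable and report its version. This is a minimal sketch, and the helper name `tensorflow_version` is hypothetical. In the environments described here the reported version would be expected to be a 1.6 release, but the script does not enforce that.

```python
import importlib.util
import sys


def tensorflow_version():
    """Return the installed TensorFlow version string, or None if absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # Imported lazily so the check works without it.
    return tf.__version__


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    version = tensorflow_version()
    if version is None:
        print("TensorFlow is not importable; is the environment activated?")
    else:
        print("TensorFlow:", version)
```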

Getting started with the Deep Learning AMIs

It’s fast and simple to get started with the AWS Deep Learning AMIs. Our latest AMIs are now available on AWS Marketplace. We’ve also provided many quick tutorials and developer resources to help you accelerate model training.


About the Author

Sumit Thakur is a Senior Product Manager for AWS Deep Learning. He works on products that make it easy for customers to get started with deep learning in the cloud, with a specific focus on making it easy to use engines on the Deep Learning AMIs. In his spare time, he likes connecting with nature and watching sci-fi TV series.