Posted On: Jun 6, 2018
With the addition of Horovod, machine learning developers can further boost the training performance of the optimized build of TensorFlow 1.8 available in the AWS Deep Learning AMIs by scaling training from a single GPU to multiple GPUs on Amazon EC2 P3 instances.
Horovod uses the Message Passing Interface (MPI) model, a popular standard for passing messages and managing communication between nodes in high-performance distributed computing environments. Compared to the standard TensorFlow distributed training model, Horovod’s MPI implementation provides a simpler programming model that enables developers to scale their existing single-GPU training programs with minimal code changes.
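As a rough illustration of what "minimal code changes" can look like, the sketch below marks the Horovod-specific additions to a TensorFlow 1.x training loop. The function name train_with_horovod and its parameters build_loss, total_steps, and base_lr are hypothetical names chosen for this example, and scaling the learning rate by the number of workers is a common convention rather than a requirement. Treat it as a sketch under those assumptions, not as the exact code used in the AMIs.

```python
def train_with_horovod(build_loss, total_steps, base_lr=0.01):
    """Sketch: train the loss returned by build_loss() with one process per GPU.

    build_loss, total_steps, and base_lr are hypothetical parameters for this
    example; build_loss is assumed to construct the model and return a scalar
    loss tensor.
    """
    # Imported inside the function so the sketch can be loaded on a machine
    # that does not have TensorFlow or Horovod installed.
    import tensorflow as tf
    import horovod.tensorflow as hvd

    # Change 1: initialize Horovod (sets up the MPI communication state).
    hvd.init()

    # Change 2: pin this process to a single GPU, selected by its local rank.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    loss = build_loss()
    global_step = tf.train.get_or_create_global_step()

    # Change 3: wrap the optimizer so gradients are averaged across workers.
    # Scaling the learning rate by the worker count is an optional convention.
    opt = tf.train.GradientDescentOptimizer(base_lr * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    train_op = opt.minimize(loss, global_step=global_step)

    hooks = [
        # Change 4: broadcast rank 0's initial variables so all workers
        # start from identical weights.
        hvd.BroadcastGlobalVariablesHook(0),
        # Each worker runs a share of the total steps.
        tf.train.StopAtStepHook(last_step=max(1, total_steps // hvd.size())),
    ]

    with tf.train.MonitoredTrainingSession(config=config, hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)
```

A script built around this function would typically be launched with one MPI process per GPU, for example mpirun -np 8 python train.py on a p3.16xlarge instance with its 8 GPUs, where train.py is a placeholder file name.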
Horovod also uses the NVIDIA Collective Communications Library (NCCL) for optimized implementations of multi-GPU and multi-node communication primitives such as all-reduce to achieve faster performance on P3 instances.
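To make the all-reduce primitive concrete, here is a small pure-Python simulation of one common way to implement it, a ring all-reduce, in which every worker ends up holding the element-wise sum of all workers' vectors. The function ring_allreduce is a hypothetical name for this illustration; it runs in a single process and is not NCCL's actual implementation, which executes on GPUs and may choose other algorithms.

```python
def ring_allreduce(worker_vectors):
    """Simulate a ring all-reduce (sum) over equal-length vectors.

    worker_vectors[r] is the vector held by worker r. Returns a new list in
    which every worker holds the element-wise sum across all workers.
    """
    n = len(worker_vectors)
    if n == 0:
        return []
    length = len(worker_vectors[0])
    if any(len(v) != length for v in worker_vectors):
        raise ValueError("all workers must hold vectors of the same length")

    data = [list(v) for v in worker_vectors]  # work on copies
    # Split each vector into n contiguous chunks; chunk c covers
    # indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [c * length // n for c in range(n + 1)]

    def chunk(c):
        c %= n
        return range(bounds[c], bounds[c + 1])

    # Phase 1 (scatter-reduce): in each step, worker r sends one chunk to its
    # right-hand neighbour, which adds it to its own copy. After n - 1 steps,
    # worker r holds the fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all messages first so the sends happen "simultaneously".
        outgoing = [
            (r - step, [data[r][i] for i in chunk(r - step)]) for r in range(n)
        ]
        for r, (c, values) in enumerate(outgoing):
            dest = (r + 1) % n
            for i, v in zip(chunk(c), values):
                data[dest][i] += v

    # Phase 2 (all-gather): the completed chunks travel around the ring and
    # overwrite the partial sums held by the other workers.
    for step in range(n - 1):
        outgoing = [
            (r + 1 - step, [data[r][i] for i in chunk(r + 1 - step)])
            for r in range(n)
        ]
        for r, (c, values) in enumerate(outgoing):
            dest = (r + 1) % n
            for i, v in zip(chunk(c), values):
                data[dest][i] = v

    return data
```

In data-parallel training, the vectors would be each GPU's gradients, and dividing the summed result by the number of workers gives the averaged gradient that every worker then applies.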
In our tests of Horovod, we trained a ResNet-50 model with the ImageNet dataset using our optimized build of TensorFlow 1.8 and OpenMPI 1.10.7 on a single p3.16xlarge instance 1.2x faster than with the standard TensorFlow distributed training model.
The latest AWS Deep Learning AMIs are now available on the AWS Marketplace. You can get started with the AMIs by using our getting started tutorial or by visiting our developer guide for more tutorials, resources, and release notes. You can also subscribe to our discussion forum to get new launch announcements and post your questions.