Posted On: Nov 26, 2018
Dynamic Training is an open-source deep learning project that allows you to reduce model training cost and time by leveraging the cloud's elasticity and scale. The first reference implementation of Dynamic Training is based on Apache MXNet and is open sourced as Dynamic Training with Apache MXNet.
Traditional distributed training requires a fixed set of hosts that actively participate in the training job throughout the training process. With Dynamic Training, this requirement is relaxed: the number of hosts in the training cluster can increase or decrease while training is in progress, so training jobs can leverage the compute elasticity of the cloud. You can elastically add or remove EC2 Spot or Reserved Instances with no loss in accuracy, significantly reducing training cost. To get started, visit the Dynamic Training with Apache MXNet on AWS GitHub repository.
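The repository above contains the project's actual entry points and launcher. As a rough illustration of the kind of distributed training loop that Dynamic Training extends, here is a minimal sketch using MXNet's standard `dist_sync` key-value store with a toy model and synthetic data; the sharding-by-rank pattern and all names below are illustrative assumptions, not the project's API:

```python
# Minimal sketch of an MXNet parameter-server training loop. Each host runs
# the same script; a launcher supplies the topology via DMLC_* environment
# variables. Dynamic Training extends this setup so that kv.num_workers can
# change mid-job as hosts join or leave.
import numpy as np
import mxnet as mx
from mxnet import gluon, autograd

# Synchronous distributed key-value store (requires a dmlc-style launcher).
kv = mx.kvstore.create('dist_sync')

# Toy model, kept small so the sketch stays self-contained.
net = gluon.nn.Dense(10)
net.initialize(mx.init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.01}, kvstore=kv)
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

# Synthetic data; each worker takes its own shard. Under Dynamic Training
# the worker count, and hence the sharding, may change between epochs.
X = np.random.normal(size=(6400, 20)).astype('float32')
y = np.random.randint(0, 10, size=(6400,)).astype('float32')
shard = slice(kv.rank, None, kv.num_workers)
data = gluon.data.DataLoader(
    gluon.data.ArrayDataset(X[shard], y[shard]),
    batch_size=64, shuffle=True)

for epoch in range(3):
    for xb, yb in data:
        with autograd.record():
            loss = loss_fn(net(xb), yb)
        loss.backward()
        # Gradients are aggregated across all currently active workers
        # through the kvstore before the update is applied.
        trainer.step(xb.shape[0])
```

Because the gradient aggregation goes through the key-value store rather than a fixed peer list, adding or removing a host changes only how the data is sharded and how many gradients are averaged, which is what lets Spot Instances come and go without restarting the job.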