How to run distributed training using Horovod and MXNet on AWS DL Containers and AWS Deep Learning AMIs
Distributed training has become indispensable for training large deep learning models in computer vision (CV) and natural language processing (NLP) applications. Open source frameworks such as Horovod add distributed training support to Apache MXNet, PyTorch, and TensorFlow. Converting your non-distributed Apache MXNet training script to use distributed training with Horovod only […]
