New speed record set for training deep learning models on AWS

fast.ai, a research lab dedicated to making deep learning more accessible, has announced that it trained the ResNet-50 deep learning model on a million images in 18 minutes using 16 Amazon EC2 P3.16xlarge instances, at a total cost of just $40. This new speed record illustrates how you can drastically cut training times for deep learning models, enabling you to bring your innovations to market faster and at a lower cost.

Amazon EC2 P3 instances offer some of the most powerful GPU-accelerated compute infrastructure available for training AI/ML models in the public cloud. A single Amazon EC2 P3 instance with eight NVIDIA V100 GPUs can train the ResNet-50 deep learning model with ImageNet in about three hours. You can further reduce training times by employing optimization techniques to distribute the training workload across multiple Amazon EC2 P3 instances in the cloud, as demonstrated in our recent blog post. Employing similar distributed training techniques and optimizations, fast.ai has trained ResNet-50 on ImageNet using 16 Amazon EC2 P3.16xlarge instances in 18 minutes.
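
To put these figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in this post (16 instances, eight GPUs each, 18 minutes, $40, and about three hours on a single instance). The variable names are illustrative, the per-instance-hour rate is merely what the $40 total implies rather than a published price, and the scaling estimate assumes the single-instance and 16-instance runs are comparable, which may not hold exactly.

```python
# Figures quoted in this post
instances = 16                 # P3.16xlarge instances used by fast.ai
gpus_per_instance = 8          # NVIDIA V100 GPUs per P3.16xlarge
minutes = 18                   # wall-clock training time
total_cost_usd = 40.0          # reported total spend
single_instance_hours = 3.0    # "about three hours" on one instance

# Derived quantities (estimates, not official numbers)
total_gpus = instances * gpus_per_instance
instance_hours = instances * minutes / 60
implied_rate = total_cost_usd / instance_hours    # USD per instance-hour
speedup = single_instance_hours * 60 / minutes    # versus one instance
scaling_efficiency = speedup / instances          # 1.0 would be perfectly linear

print(f"GPUs in total:         {total_gpus}")
print(f"Instance-hours used:   {instance_hours:.1f}")
print(f"Implied rate:          ${implied_rate:.2f} per instance-hour")
print(f"Speedup vs 1 instance: {speedup:.1f}x")
print(f"Scaling efficiency:    {scaling_efficiency:.1%}")
```

Under these assumptions, the run used 128 GPUs for 4.8 instance-hours and finished roughly 10 times faster than the single-instance figure.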

Amazon EC2 P3 instances are available in 14 Regions worldwide, so customers can train where their data resides. They’re available as On-Demand Instances, Reserved Instances, and Spot Instances; Spot Instances are discounted by up to 90% from On-Demand prices. P3 instances support all major AI/ML frameworks, including TensorFlow, MXNet, PyTorch, and Caffe2. You can use a portfolio of Amazon Machine Learning offerings, including Amazon SageMaker and AWS Deep Learning AMIs, along with storage and networking services, to build and train models and then deploy them into production.

To learn more about the latest training results, go to the fast.ai blog post.

About the Author

Geoff Murase is a Senior Product Marketing Manager for AWS EC2 accelerated computing instances, helping customers meet their compute needs by providing access to hardware-based compute accelerators such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs). In his spare time, he enjoys playing basketball and biking with his family.

AWS Machine Learning Blog
