AWS Machine Learning Blog
AWS to offer NVIDIA A100 Tensor Core GPU-based Amazon EC2 instances
Tens of thousands of customers rely on AWS for building machine learning (ML) applications. Customers like Airbnb and Pinterest use AWS to optimize their search recommendations, Lyft and Toyota Research Institute to develop their autonomous vehicle programs, and Capital One and Intuit to build and deploy AI-powered customer assistants.
AWS offers the broadest and deepest portfolio of ML and AI services suitable for every type of customer, ranging from startups to large enterprises, from beginners to expert ML practitioners. Fundamental components of this portfolio are the AWS compute, networking, and storage services that provide powerful and cost-effective infrastructure for ML applications of any scale.
High-performance, low-cost, and highly scalable compute infrastructure for deep learning powered by NVIDIA GPUs
Model training time directly impacts your ability to iterate and improve on the accuracy of your models quickly. AWS leads the industry in providing you access to high-performance and cost-effective Amazon EC2 instances based on NVIDIA® GPUs.
AWS was first in the cloud to offer NVIDIA V100 Tensor Core GPUs via Amazon EC2 P3 instances. AWS also offers the industry’s highest performance model training GPU platform in the cloud via Amazon EC2 P3dn.24xlarge instances. These instances feature eight NVIDIA V100 Tensor Core GPUs with 32 GB of memory each, 96 custom Intel® Xeon® Scalable (Skylake) vCPUs, the industry’s first 100 Gbps per-instance networking bandwidth, and high-performance, low-latency network fabric via Elastic Fabric Adapter (EFA).
These innovations in the underlying infrastructure, coupled with high-performance storage services such as Amazon Simple Storage Service (Amazon S3) and Amazon FSx for Lustre, and optimizations in ML frameworks, help you drastically reduce the time it takes to iterate on your models to increase accuracy and introduce new features. We recently demonstrated the record-setting performance of these NVIDIA GPU instances by training BERT, a model for natural language processing (NLP), across 256 P3dn.24xlarge instances with a total of 2,048 GPUs. By distributing the training job across a large cluster of GPU instances, we cut the training time from multiple days to just over 60 minutes.
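To make the scale of that BERT training cluster concrete, here is a small back-of-the-envelope sketch in Python. It uses only the per-instance figures quoted above for P3dn.24xlarge (eight GPUs, 32 GB of memory per GPU, 100 Gbps of networking). The function and variable names are illustrative, and the aggregate bandwidth is a simple sum of per-instance figures, not a measured throughput.

```python
# Per-instance figures as quoted in this post for P3dn.24xlarge.
# Names below are illustrative and not part of any AWS API.
INSTANCES = 256
GPUS_PER_INSTANCE = 8
GPU_MEMORY_GB = 32
NETWORK_GBPS_PER_INSTANCE = 100


def cluster_totals(instances, gpus_per_instance, gpu_memory_gb, network_gbps):
    """Return simple aggregate figures for a homogeneous GPU cluster."""
    total_gpus = instances * gpus_per_instance
    return {
        "total_gpus": total_gpus,
        "total_gpu_memory_gb": total_gpus * gpu_memory_gb,
        # Sum of per-instance bandwidth; an upper bound, not a benchmark.
        "aggregate_network_gbps": instances * network_gbps,
    }


totals = cluster_totals(
    INSTANCES, GPUS_PER_INSTANCE, GPU_MEMORY_GB, NETWORK_GBPS_PER_INSTANCE
)
print(totals)
```

The GPU count works out to the 2,048 GPUs cited for the BERT run (256 instances with eight GPUs each).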
Not all ML models are the same, and different models benefit from different levels of hardware acceleration. Amazon EC2 G4dn instances with up to eight NVIDIA T4 Tensor Core GPUs are the industry’s most cost-effective GPU instances for ML inference and provide optimal performance for training less complex ML models and for graphics-intensive applications.
NVIDIA A100 Tensor Core GPUs coming to Amazon EC2 instances
As AI model complexity continues to rise, the number of model parameters has gone from 26 million with ResNet-50 just a few years ago to 17 billion today. With newer models, AWS customers are continually looking for higher-performance instances to support faster model training. To increase performance and lower the cost to train models, AWS is pleased to announce our plans to offer EC2 instances based on the new NVIDIA A100 Tensor Core GPUs. For large-scale distributed training, you can expect EC2 instances based on NVIDIA A100 GPUs to build on the capabilities of EC2 P3dn.24xlarge instances and set new performance benchmarks. To learn more about EC2 instances based on NVIDIA A100 GPUs and potentially participate in early access, see here.
About the Author
Geoff Murase is a Senior Product Marketing Manager for AWS EC2 accelerated computing instances, helping customers meet their compute needs by providing access to hardware-based compute accelerators such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs). In his spare time, he enjoys playing basketball and biking with his family.