Deep Learning on GPU Instances

Train deep learning models with the fastest GPU instances in the cloud

You can use Amazon SageMaker to easily train deep learning models on Amazon EC2 P3 instances, the fastest GPU instances in the cloud. With up to 8 NVIDIA V100 Tensor Core GPUs and up to 100 Gbps networking bandwidth per instance, you can iterate faster and run more experiments by reducing training times from days to minutes.
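
The snippet below is a minimal sketch of that workflow using the SageMaker Python SDK (v2); the training script name, S3 path, and framework versions are illustrative assumptions, not fixed requirements.

  import sagemaker
  from sagemaker.pytorch import PyTorch

  # IAM role that SageMaker assumes for the training job.
  # get_execution_role() works inside SageMaker notebooks;
  # elsewhere, pass a role ARN explicitly.
  role = sagemaker.get_execution_role()

  # Train a (hypothetical) train.py script on a single P3 instance
  # with one NVIDIA V100 GPU.
  estimator = PyTorch(
      entry_point="train.py",
      role=role,
      instance_count=1,
      instance_type="ml.p3.2xlarge",
      framework_version="1.8.0",   # illustrative version
      py_version="py36",
  )

  estimator.fit("s3://my-bucket/training-data")  # hypothetical S3 input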

It’s easy to get started with deep learning on GPU instances using Amazon SageMaker. Try this 10-minute tutorial »

Key features of using Amazon EC2 P3 instances with Amazon SageMaker:

  • Train models quickly to iterate fast, test new hypotheses, and accelerate time to market.
  • Distribute training across hundreds of GPUs with a single click for high efficiency and low cost (sketched in the example after this list).
  • Reduce inference cost by up to 75% with Amazon Elastic Inference (also shown below).
  • Minimize the heavy lifting with fully managed training and hosting.
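
As a rough sketch of how the distributed-training and Elastic Inference bullets translate into the SageMaker Python SDK: the entry point, bucket, and versions below are again assumptions, and Elastic Inference framework compatibility varies by version, so check the documentation before relying on these values.

  import sagemaker
  from sagemaker.tensorflow import TensorFlow

  role = sagemaker.get_execution_role()

  # Scale training across 4 x ml.p3.16xlarge = 32 V100 GPUs using
  # Horovod over MPI, with one worker process per GPU.
  estimator = TensorFlow(
      entry_point="train_hvd.py",      # hypothetical Horovod script
      role=role,
      instance_count=4,
      instance_type="ml.p3.16xlarge",
      framework_version="2.3",          # illustrative version
      py_version="py37",
      distribution={"mpi": {"enabled": True, "processes_per_host": 8}},
  )
  estimator.fit("s3://my-bucket/training-data")

  # Host the trained model on a CPU instance with an Elastic Inference
  # accelerator attached to reduce inference cost.
  predictor = estimator.deploy(
      initial_instance_count=1,
      instance_type="ml.m5.xlarge",
      accelerator_type="ml.eia2.medium",
  )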
 

Amazon EC2 P3 instances powered by NVIDIA V100 Tensor Core GPUs

From recognizing speech to training virtual assistants and autonomous vehicles, data scientists are taking on increasingly complex challenges with AI. Solving these kinds of problems requires deep learning models that take a long time to train. With 640 Tensor Cores per GPU, the NVIDIA V100 Tensor Core GPUs that power Amazon EC2 P3 instances break the 100 teraFLOPS (TFLOPS) barrier for deep learning performance.

The next generation of NVIDIA NVLink™ connects the V100 GPUs in a multi-GPU P3 instance at up to 300 GB/s to create the world’s most powerful instance. AI models that used to take weeks on previous systems can now be trained in a few days. With this reduction in training time, you can solve a whole new world of problems using AI.

Custom ML environments

If you need to set up your own machine learning environments and workflows for domain-specific performance optimization and integration with custom applications, AWS Deep Learning AMIs (AWS DL AMIs) provide pre-packaged, optimized Amazon Machine Images (AMIs). AWS Deep Learning Containers (AWS DL Containers) make it just as easy to deploy these environments in containers, letting you skip the complicated process of building and optimizing them from scratch.
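
For example, a DL AMI can be launched onto a P3 instance with boto3. A minimal sketch, assuming the AMI ID and key pair below (both placeholders) are replaced with real values from your account and region:

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Launch a single-GPU P3 instance from a Deep Learning AMI.
  # ImageId and KeyName are placeholders: look up the current DL AMI ID
  # for your region and use one of your own EC2 key pairs.
  response = ec2.run_instances(
      ImageId="ami-0123456789abcdef0",  # hypothetical DL AMI ID
      InstanceType="p3.2xlarge",        # 1x NVIDIA V100 GPU
      MinCount=1,
      MaxCount=1,
      KeyName="my-key-pair",            # hypothetical key pair name
  )
  print(response["Instances"][0]["InstanceId"])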

AWS DL AMIs and AWS DL Containers are best suited for the following use cases:

  • Making framework and infrastructure optimizations specific to your domain.
  • Integrating with custom or in-house tools.
  • Building new machine learning frameworks, libraries, and interfaces.
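
When the target environment is a DL Container rather than an AMI, the SageMaker Python SDK can resolve the container's image URI directly. A sketch, assuming SDK v2; the framework version values are illustrative, so check the AWS DL Containers release notes for images actually available in your region:

  from sagemaker import image_uris

  # Look up the Deep Learning Container image for TensorFlow training
  # on a GPU instance. Version values are illustrative assumptions.
  image_uri = image_uris.retrieve(
      framework="tensorflow",
      region="us-east-1",
      version="2.3",
      py_version="py37",
      instance_type="ml.p3.2xlarge",
      image_scope="training",
  )
  print(image_uri)  # an ECR URI you can pull or pass to an Estimator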

Get started with these 10-minute tutorials: AWS DL AMI | AWS DL Containers

Amazon SageMaker and AWS DL AMIs support TensorFlow, PyTorch, Apache MXNet, Chainer, Scikit-learn, SparkML, Horovod, Keras, and Gluon. AWS DL Containers support TensorFlow and Apache MXNet, with PyTorch coming soon.

Start building in the console

Get started with GPU instances on Amazon SageMaker
