Getting Started with PyTorch on AWS
AWS Deep Learning AMIs
The AWS Deep Learning AMIs (DLAMIs) equip machine
learning (ML) practitioners and researchers with the infrastructure and
tools to accelerate deep learning in the cloud at scale. You can quickly
launch Amazon Elastic Compute Cloud (EC2) instances preinstalled with
PyTorch to train sophisticated, custom artificial intelligence (AI)
models, experiment with new algorithms, or learn new skills and
techniques.
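For example, with the AWS SDK for Python (Boto3) you can launch a DLAMI-backed instance in a few lines. This is a minimal sketch: the AMI ID, key pair, and instance type are placeholders you would replace with a current PyTorch DLAMI ID for your Region and your own settings.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder values: look up the current PyTorch DLAMI ID for your
# Region and substitute your own key pair and instance type.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical DLAMI ID
    InstanceType="g4dn.xlarge",       # a GPU instance type suited to PyTorch
    KeyName="my-key-pair",            # hypothetical key pair name
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```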
DLAMIs come preconfigured with NVIDIA CUDA and the NVIDIA CUDA Deep
Neural Network library (cuDNN). DLAMIs also support Habana Gaudi–based
Amazon EC2 DL1 instances and AWS Inferentia-powered Amazon EC2 Inf1
instances with the AWS Neuron libraries.
To begin building PyTorch models using DLAMIs, review the DLAMI tutorial.
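Once the instance is running, you can verify that the preinstalled PyTorch build sees the GPU stack. The check below uses only standard PyTorch APIs, nothing DLAMI-specific:

```python
import torch

# Confirm the preinstalled PyTorch build and its GPU support.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("cuDNN version:", torch.backends.cudnn.version())
    print("GPU:", torch.cuda.get_device_name(0))
```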
AWS Deep Learning Containers
AWS Deep Learning Containers are Docker images
preinstalled with PyTorch that make it easier to deploy custom ML
environments quickly, without having to build and optimize your
environments from scratch. Deep Learning Containers provide optimized
environments and are available in Amazon Elastic Container Registry (ECR).
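If you use the SageMaker Python SDK, you can look up the ECR URI of a prebuilt PyTorch Deep Learning Container programmatically. A minimal sketch; the framework version, Python version, and instance type below are illustrative:

```python
from sagemaker import image_uris

# Retrieve the ECR URI of a prebuilt PyTorch training container.
# The version, py_version, and instance_type values are examples;
# pick ones that match your framework and hardware requirements.
uri = image_uris.retrieve(
    framework="pytorch",
    region="us-east-1",
    version="2.0",
    py_version="py310",
    instance_type="ml.g4dn.xlarge",
    image_scope="training",  # use "inference" for serving images
)
print(uri)
```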
Amazon SageMaker provides containers for its built-in
algorithms and prebuilt Docker images for PyTorch. If you want to
extend a prebuilt SageMaker algorithm or model Docker image, you can
modify the SageMaker image. If you want to adapt a preexisting
PyTorch container image to work with SageMaker, you can modify the
Docker container to use either the SageMaker training toolkit or the
SageMaker inference toolkit.
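Once an adapted image is pushed to ECR, you can point a generic SageMaker estimator at it. A hedged sketch, assuming an image URI, an IAM execution role, and an S3 input location of your own (all placeholders here):

```python
from sagemaker.estimator import Estimator

# The image URI, role ARN, and S3 path are placeholders for your own
# adapted PyTorch image, SageMaker execution role, and training data.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-pytorch:latest",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
estimator.fit("s3://my-bucket/training-data/")
```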
To get started with PyTorch on AWS Deep Learning Containers, use the following resources:
- Deep Learning Containers for Amazon EC2 using PyTorch: Training | Inference
- Deep Learning Containers for Amazon Elastic Container Service (ECS) using PyTorch: Training | Inference
- Deep Learning Containers for Amazon Elastic Kubernetes Service (EKS) using PyTorch: Training | Distributed Training | Inference
- Deep Learning Containers for Amazon SageMaker using PyTorch: Using Docker containers with SageMaker
Amazon SageMaker
You can use Amazon SageMaker to train and deploy a model with custom
PyTorch code. The Amazon SageMaker Python SDK, with its PyTorch
estimators and models, and the SageMaker open-source PyTorch containers
simplify the process of writing and running a PyTorch script. SageMaker
removes the heavy lifting from each step of the ML lifecycle to make it
easier to develop high-quality models. Use the SageMaker distributed
training libraries with PyTorch to train large models more quickly by
automatically splitting deep learning models and training datasets
across AWS GPU instances through data parallelism or model parallelism.
To get started with PyTorch on SageMaker, see the Amazon SageMaker
Developer Guide.
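As an illustration of the typical workflow, here is a minimal sketch using the SageMaker Python SDK's PyTorch estimator. The training script, role ARN, versions, and instance settings are assumptions you would replace; the distribution argument shows how the SageMaker data parallelism library is turned on:

```python
from sagemaker.pytorch import PyTorch

# train.py is a hypothetical user-provided PyTorch training script;
# the role ARN, versions, and S3 path are placeholders.
estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    framework_version="2.0",
    py_version="py310",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",
    # Enable the SageMaker data parallelism library across instances/GPUs.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit("s3://my-bucket/training-data/")
```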
Amazon EC2 Inf1 instances and AWS Inferentia
Amazon EC2 Inf1 instances are built from the ground up to support machine learning inference applications. Inf1 instances feature up to 16 AWS Inferentia chips, high-performance machine learning inference chips designed and built by AWS. Inf1 instances deliver up to 3x higher throughput and up to 40% lower cost per inference than Amazon EC2 G4 instances, which were already the lowest-cost instances for machine learning inference available in the cloud. Using Inf1 instances, you can run large-scale machine learning inference with PyTorch models at the lowest cost in the cloud. To get started, see our tutorial on running PyTorch models on Inf1.
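Before a PyTorch model can run on Inferentia, it is compiled with the AWS Neuron SDK. A minimal sketch using the torch-neuron package for Inf1; the model choice and input shape are illustrative:

```python
import torch
import torch_neuron  # AWS Neuron package for Inf1; registers torch.neuron
from torchvision import models

# Compile a pretrained model for Inferentia. The model and the
# example input shape are illustrative choices.
model = models.resnet50(pretrained=True)
model.eval()

example = torch.zeros(1, 3, 224, 224)
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled artifact for deployment on an Inf1 instance.
model_neuron.save("resnet50_neuron.pt")
```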