PyTorch on AWS Features

PyTorch is an open-source deep learning framework that makes it easier to develop machine learning (ML) models and deploy them to production. You can use PyTorch on AWS to build, train, and deploy state-of-the-art deep learning models. PyTorch on AWS offers high-performance compute, storage, and networking services; open-source contributions to PyTorch, such as TorchElastic and TorchServe; and optimizations such as the Amazon S3 plugin for PyTorch. You can get started using AWS Deep Learning AMIs (DLAMIs), AWS Deep Learning Containers for containerized applications, or Amazon SageMaker for fully managed infrastructure, tools, and workflows.

Key product features

TorchServe

TorchServe is an open-source tool that makes it easier to deploy trained PyTorch models efficiently at scale. TorchServe delivers lightweight, low-latency serving for high-performance inference. It also provides default handlers for the most common applications, such as object detection and text classification, so you don’t have to write custom code to deploy your models. With features such as multi-model serving, model versioning for A/B testing, metrics for monitoring, and RESTful endpoints for application integration, you can quickly take your models from research to production. TorchServe runs in a range of ML environments, including Amazon SageMaker, Kubernetes, Amazon Elastic Kubernetes Service (EKS), and Amazon Elastic Compute Cloud (EC2). To get started with TorchServe, see the documentation and our blog post.
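
As a rough illustration of the RESTful endpoints and model versioning mentioned above, the following minimal sketch sends an inference request to a running TorchServe instance using only the Python standard library. It assumes TorchServe is reachable at its default inference address, http://localhost:8080, where each registered model is served at /predictions/<model_name> and a specific version at /predictions/<model_name>/<version>. The helper names prediction_url and predict are hypothetical, not part of TorchServe.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: TorchServe runs locally with its default inference address.
INFERENCE_BASE = "http://localhost:8080"


def prediction_url(model_name: str, version: Optional[str] = None,
                   base: str = INFERENCE_BASE) -> str:
    """Build the inference URL; an optional version segment pins one model version."""
    path = "/predictions/" + urllib.parse.quote(model_name, safe="")
    if version is not None:
        path += "/" + urllib.parse.quote(version, safe="")
    return base.rstrip("/") + path


def predict(model_name: str, payload: bytes, version: Optional[str] = None,
            base: str = INFERENCE_BASE, timeout: float = 10.0) -> bytes:
    """POST raw input bytes (e.g. an image file) and return the raw response body."""
    request = urllib.request.Request(
        prediction_url(model_name, version, base), data=payload, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For an A/B comparison, you might call predict("my_model", data, version="1.0") and predict("my_model", data, version="2.0") on the same input, where "my_model" and the version strings are placeholders. Registering models and versions is done separately through TorchServe's management API, which listens on port 8081 by default.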

TorchElastic Controller for Kubernetes

TorchElastic is a library for training large-scale deep learning models where it is critical to dynamically scale compute resources based on availability. Elastic and fault-tolerant training with TorchElastic can help you take ML models to production more quickly and adopt state-of-the-art approaches to model exploration as architectures continue to increase in size and complexity.

The TorchElastic Controller for Kubernetes is a native Kubernetes implementation for TorchElastic that automatically manages the lifecycle of the pods and services required for TorchElastic training. It allows you to start training jobs with a portion of the requested compute resources and dynamically scale as more resources become available, without having to stop and restart the jobs. In addition, jobs can recover from nodes that are replaced because of node failures or reclamation.
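
In current PyTorch releases, TorchElastic ships as torch.distributed.elastic, and its launcher is the torchrun command. The sketch below does not show the controller's Kubernetes manifest. It only illustrates, on the launcher side, the idea of an elastic node range: a job declared as min:max nodes can begin once at least the minimum has joined and can grow as capacity appears. The function elastic_launch_cmd, the rendezvous endpoint, and the job id are hypothetical placeholders.

```python
import shlex
from typing import List


def elastic_launch_cmd(script: str, min_nodes: int, max_nodes: int,
                       nproc_per_node: int, rdzv_endpoint: str, job_id: str,
                       max_restarts: int = 3) -> List[str]:
    """Build a torchrun command for an elastic job that can run on min..max nodes."""
    if not 1 <= min_nodes <= max_nodes:
        raise ValueError("require 1 <= min_nodes <= max_nodes")
    return [
        "torchrun",
        # Elastic range: the job can start with min_nodes and scale up to max_nodes.
        f"--nnodes={min_nodes}:{max_nodes}",
        f"--nproc_per_node={nproc_per_node}",
        # Maximum number of worker-group restarts before the job is marked failed.
        f"--max_restarts={max_restarts}",
        # Every node meets at the same rendezvous endpoint under a shared job id.
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={rdzv_endpoint}",
        f"--rdzv_id={job_id}",
        script,
    ]


if __name__ == "__main__":
    # Hypothetical values: substitute your own endpoint, job id, and script.
    cmd = elastic_launch_cmd("train.py", 2, 4, 8, "head-node:29400", "job-42")
    print(shlex.join(cmd))
```

Because workers may be restarted when nodes fail or join, a training script launched this way is typically written to reload its latest checkpoint on startup so it can resume rather than start over.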

PyTorch support in the AWS Neuron SDK

The AWS Neuron SDK is integrated with PyTorch, giving developers a familiar environment for running machine learning inference on Amazon EC2 Inf1 instances, which are powered by AWS Inferentia chips. The Neuron SDK lets PyTorch models execute on Inf1 instances and applies data parallelism to them, enabling dynamic batching and parallelized inference for higher throughput.
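
The following sketch does not call the Neuron SDK. It is a plain-Python illustration of the idea behind data-parallel inference with dynamic batching: an incoming batch of any size is split along its first dimension into one contiguous chunk per core, the chunks run concurrently, and the outputs are reassembled in input order. In the torch-neuron package for Inf1, this role is played by torch.neuron.DataParallel. Here a thread pool stands in for the NeuronCores, model_fn is a hypothetical stand-in for a compiled model, and the default of four cores is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def split_batch(batch: Sequence[X], num_cores: int) -> List[List[X]]:
    """Split a batch along its first dimension into near-equal, contiguous chunks."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    base, extra = divmod(len(batch), num_cores)
    chunks: List[List[X]] = []
    start = 0
    for core in range(num_cores):
        # The first `extra` cores each take one additional item.
        size = base + (1 if core < extra else 0)
        if size:  # skip idle cores when the batch is smaller than num_cores
            chunks.append(list(batch[start:start + size]))
        start += size
    return chunks


def parallel_infer(model_fn: Callable[[List[X]], List[Y]],
                   batch: Sequence[X], num_cores: int = 4) -> List[Y]:
    """Run model_fn on each chunk concurrently and reassemble outputs in input order."""
    chunks = split_batch(batch, num_cores)
    if not chunks:
        return []
    # Threads stand in for NeuronCores; pool.map returns results in submission order.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        per_chunk = list(pool.map(model_fn, chunks))
    return [y for outputs in per_chunk for y in outputs]
```

For example, parallel_infer(lambda xs: [x * 2 for x in xs], list(range(10))) splits the ten inputs into chunks of 3, 3, 2, and 2 and returns [0, 2, 4, ..., 18] in the original order, regardless of which chunk finishes first.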

Amazon S3 plugin

The Amazon S3 plugin for PyTorch is an open-source library for streaming data from Amazon Simple Storage Service (Amazon S3) into PyTorch. Because the plugin is available in the PyTorch Deep Learning Containers, you can read data from S3 buckets directly through PyTorch APIs without first downloading it to local storage.
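
The sketch below does not use the plugin's own API. It illustrates the underlying streaming idea in plain Python: records are read from a file-like object in fixed-size chunks, so the full object never has to be written to disk or held in memory, and a record split across two chunks is carried over until its separator arrives. The function names and the newline-delimited record format are assumptions for illustration.

```python
from typing import BinaryIO, Iterator


def stream_chunks(body: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield successive chunks from a file-like stream without reading it all at once."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # empty bytes signals the end of the stream
            return
        yield chunk


def iter_records(body: BinaryIO, chunk_size: int = 1 << 20,
                 sep: bytes = b"\n") -> Iterator[bytes]:
    """Yield sep-delimited records, carrying partial records across chunk boundaries."""
    pending = b""
    for chunk in stream_chunks(body, chunk_size):
        pending += chunk
        # Everything before the last separator is complete; the tail may be partial.
        *complete, pending = pending.split(sep)
        yield from complete
    if pending:  # final record without a trailing separator
        yield pending
```

With boto3, for instance, s3.get_object(Bucket="my-bucket", Key="train/data.jsonl")["Body"] returns a streaming body that supports read(amt), so it could be passed to iter_records directly. The bucket and key here are placeholders. In a training pipeline, such a generator would typically be wrapped in a torch.utils.data.IterableDataset.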

Learn how to get started with PyTorch on AWS

Visit the getting started page.
