Posted On: Nov 22, 2023

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3. PyTorch is an open source machine learning framework widely used by AWS customers to build and train machine learning models. The Amazon S3 Connector for PyTorch automatically optimizes S3 read and list requests to improve data loading and checkpoint performance for your training workloads. Saving model checkpoints during machine learning training is up to 40% faster with the Amazon S3 Connector for PyTorch than saving to Amazon EC2 instance storage.

The Amazon S3 Connector for PyTorch delivers a new implementation of PyTorch's dataset primitive that you can use to load training data from Amazon S3. It supports both map-style datasets for random data access patterns and iterable-style datasets for sequential data access patterns. The Amazon S3 Connector for PyTorch also includes a checkpointing interface to save and load checkpoints directly to Amazon S3, without first saving them to local storage and then writing custom code to upload them to Amazon S3.
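
To illustrate the difference between the two dataset styles, here is a minimal sketch in plain Python. It does not use the connector or PyTorch. FAKE_BUCKET, MapStyleObjects, and IterableStyleObjects are hypothetical names made up for this example, and an in-memory dictionary stands in for an S3 bucket. The sketch only shows the access patterns: a map-style dataset knows its length and supports lookup by index, while an iterable-style dataset streams objects in order.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for a bucket: object key -> object bytes.
# A real training job would read these objects from Amazon S3 instead.
FAKE_BUCKET: Dict[str, bytes] = {
    "train/0001.txt": b"first sample",
    "train/0002.txt": b"second sample",
    "train/0003.txt": b"third sample",
    "val/0001.txt": b"held-out sample",
}


class MapStyleObjects:
    """Map-style access: list keys up front, then allow len() and lookup by index."""

    def __init__(self, store: Dict[str, bytes], prefix: str) -> None:
        self._store = store
        # Listing happens once, so any index can be resolved to a key later.
        self._keys: List[str] = sorted(k for k in store if k.startswith(prefix))

    def __len__(self) -> int:
        return len(self._keys)

    def __getitem__(self, index: int) -> Tuple[str, bytes]:
        key = self._keys[index]
        return key, self._store[key]


class IterableStyleObjects:
    """Iterable-style access: stream objects in key order, with no indexing."""

    def __init__(self, store: Dict[str, bytes], prefix: str) -> None:
        self._store = store
        self._prefix = prefix

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        for key in sorted(self._store):
            if key.startswith(self._prefix):
                yield key, self._store[key]


random_access = MapStyleObjects(FAKE_BUCKET, "train/")
streaming = IterableStyleObjects(FAKE_BUCKET, "train/")

print(len(random_access))             # 3
print(random_access[2][0])            # train/0003.txt
print([key for key, _ in streaming])  # keys under "train/", in sorted order
```

Random access suits workloads that shuffle samples by index, while sequential streaming suits workloads that read through a large set of objects once per pass.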

The Amazon S3 Connector for PyTorch is an open source project. To get started, visit the GitHub page.