Amazon SageMaker Now Supports Pipe Mode for Datasets in CSV Format

Posted on: Nov 5, 2018

The built-in algorithms that come with Amazon SageMaker now support Pipe Mode for datasets in CSV format. This speeds up the streaming of data from Amazon Simple Storage Service (Amazon S3) into SageMaker by up to 40% while training machine learning (ML) models. With this enhancement, the performance benefits of Pipe Mode now extend to training datasets in CSV format, in addition to the protobuf recordIO format, for which Pipe Mode support was released earlier this year.
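As an illustration of preparing CSV training data, the sketch below serializes (label, features) pairs as header-less CSV with the label in the first column. This layout is an assumption based on what built-in algorithms such as XGBoost and Linear Learner expect for CSV input; check the documentation of the specific algorithm you use. The function name `to_builtin_csv` is hypothetical.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable, Sequence


def to_builtin_csv(rows: Iterable[tuple[float, Sequence[float]]]) -> str:
    """Serialize (label, features) pairs as header-less CSV, label first.

    Assumption: the target built-in algorithm (e.g. XGBoost or Linear Learner)
    reads the label from the first column and expects no header row.
    """
    buf = io.StringIO()
    # Use "\n" explicitly; csv.writer defaults to "\r\n" line endings.
    writer = csv.writer(buf, lineterminator="\n")
    for label, features in rows:
        writer.writerow([label, *features])
    return buf.getvalue()


sample = [(1, [0.5, 2.0]), (0, [1.5, -3.25])]
print(to_builtin_csv(sample), end="")
```

The resulting text can be uploaded to an S3 prefix and used as a training channel with content type `text/csv`.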

Amazon SageMaker supports two methods of transferring training data: File Mode and Pipe Mode. With File Mode, the training data is first downloaded to an encrypted Amazon EBS volume attached to the training instance, and training starts only after the download completes. With Pipe Mode, the data is streamed directly to the training algorithm while it runs. This results in faster training jobs and smaller disk space requirements, reducing the overall cost of training ML models on Amazon SageMaker.
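To show where the choice between the two modes is made, here is a minimal sketch that builds the input-related fields of a low-level `CreateTrainingJob` request for a CSV channel. It assumes the request is later sent with the boto3 `sagemaker` client; the helper name, the channel name `train`, the S3 URI, and the image placeholder are hypothetical, and a real request also needs fields such as `TrainingJobName`, `RoleArn`, `OutputDataConfig`, `ResourceConfig`, and `StoppingCondition`.

```python
from __future__ import annotations


def training_job_input_config(s3_uri: str, pipe: bool = True) -> dict:
    """Build the input-mode-related part of a CreateTrainingJob request.

    Hypothetical helper: only the fields relevant to File vs. Pipe Mode and
    CSV input are filled in; the remaining required fields are omitted.
    """
    mode = "Pipe" if pipe else "File"
    return {
        "AlgorithmSpecification": {
            # Placeholder: substitute the built-in algorithm image URI for your region.
            "TrainingImage": "<built-in-algorithm-image-uri>",
            # "Pipe" streams data to the algorithm; "File" downloads it to the EBS volume first.
            "TrainingInputMode": mode,
        },
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                # CSV input is declared through the channel content type.
                "ContentType": "text/csv",
                "CompressionType": "None",
            }
        ],
    }


request = training_job_input_config("s3://my-example-bucket/train/")
print(request["AlgorithmSpecification"]["TrainingInputMode"])
```

After merging in the other required fields, the dictionary could be passed as `boto3.client("sagemaker").create_training_job(**request)`. Switching `pipe` to `False` changes only the input mode, which makes it easy to compare File Mode and Pipe Mode on the same dataset.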

Support for CSV format with Pipe Mode is available in all AWS regions where Amazon SageMaker is available today. You can read additional details in this blog post.