Posted On: Oct 10, 2018

Amazon SageMaker now includes an improved Pipe Mode implementation that streams data faster from Amazon Simple Storage Service (S3) into SageMaker while training machine learning (ML) models. The latest implementation of Pipe Mode provides up to 9 times better data streaming throughput than File Mode.

Amazon SageMaker supports two methods of transferring training data: File Mode and Pipe Mode. With File Mode, the training data is first downloaded to an encrypted EBS volume attached to the training instance, and training starts only after the download completes. With Pipe Mode, the data is streamed directly to the training algorithm while it is running. As a result, training jobs finish sooner and need less disk space, which reduces the overall cost of training ML models on SageMaker.
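To make the streaming behavior concrete, here is a minimal sketch of how a custom training container might consume a Pipe Mode channel. It assumes the container contract described in the SageMaker documentation, where each epoch of a channel is exposed as a named pipe at /opt/ml/input/data/<channel>_<epoch>. The function name stream_epoch, the channel name "training", and the chunk size are illustrative choices, not part of any SageMaker API.

```python
import os


def stream_epoch(channel, epoch, data_dir="/opt/ml/input/data", chunk_size=1 << 20):
    """Yield raw byte chunks for one epoch of a Pipe Mode channel.

    Assumption: the channel is exposed as a FIFO named "<channel>_<epoch>"
    under data_dir. The data is read sequentially and never written to disk.
    """
    path = os.path.join(data_dir, f"{channel}_{epoch}")
    with open(path, "rb") as pipe:
        while True:
            chunk = pipe.read(chunk_size)
            if not chunk:  # an empty read means the stream for this epoch has ended
                break
            yield chunk
```

A training loop could call stream_epoch("training", 0) for the first pass, then stream_epoch("training", 1) for the next, and parse records from the chunks as they arrive. Because a pipe can only be read forward, any shuffling or random access has to be handled in the algorithm itself.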

Depending on your requirements and your environment, you can choose the mode that suits your use case. For example, if your training dataset is small enough to fit in memory and you need to run multiple epochs, it might be easier to use File Mode and load the entire dataset into memory. If your algorithm is I/O bound, using Pipe Mode will result in higher throughput as well as a smaller required disk volume.
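The mode is selected per training job. The sketch below builds the relevant fragment of a CreateTrainingJob request, where AlgorithmSpecification.TrainingInputMode accepts "File" or "Pipe". The helper name training_input_settings, the bucket URI, and the channel name are hypothetical. The returned dictionary is deliberately partial: a real request also needs fields such as the training image, role, output location, and resource configuration.

```python
def training_input_settings(s3_uri, input_mode="Pipe", channel="training"):
    """Return the input-related part of a CreateTrainingJob request.

    Hypothetical helper for illustration; it builds a dictionary only
    and does not call AWS.
    """
    if input_mode not in ("File", "Pipe"):
        raise ValueError("input_mode must be 'File' or 'Pipe'")
    return {
        # "File" downloads the data before training starts;
        # "Pipe" streams it while the algorithm is running.
        "AlgorithmSpecification": {"TrainingInputMode": input_mode},
        "InputDataConfig": [
            {
                "ChannelName": channel,
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
    }


# Placeholder bucket name, for illustration only.
settings = training_input_settings("s3://example-bucket/train/", input_mode="Pipe")
```

Switching a job between the two modes then comes down to changing a single string, which makes it straightforward to compare both modes on the same dataset.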

The latest implementation of Pipe Mode is supported in all AWS regions where Amazon SageMaker is available. Visit the Amazon SageMaker documentation for more details.