Pipe Input Mode is now Supported in Amazon SageMaker Algorithms

Posted on: May 24, 2018

You can now run training jobs with the built-in Amazon SageMaker algorithms up to 35% faster by using Pipe input mode. In Pipe input mode, your training job streams data directly from Amazon Simple Storage Service (Amazon S3) to the algorithm container on the training instances, which shortens job start times and increases throughput. For example, in benchmarks start times were reduced by up to 10 minutes on a 78 GB file, and throughput was up to twice as fast in some cases.
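A minimal sketch, assuming you launch jobs through the CreateTrainingJob API with boto3, of where the input mode is selected: the `TrainingInputMode` field of `AlgorithmSpecification`. The helper name `build_training_request`, the `train` channel name, the instance type, the volume size, and all S3 and IAM values are hypothetical placeholders, and the algorithm-specific `HyperParameters` a real built-in algorithm needs are omitted.

```python
from typing import Any, Dict


def build_training_request(
    job_name: str,
    training_image: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    input_mode: str = "Pipe",
) -> Dict[str, Any]:
    """Build keyword arguments for SageMaker's CreateTrainingJob API (hypothetical helper)."""
    if input_mode not in ("Pipe", "File"):
        raise ValueError(f"unsupported input mode: {input_mode!r}")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": training_image,
            # "Pipe" streams the channel from S3; "File" copies it to the EBS volume first.
            "TrainingInputMode": input_mode,
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",  # hypothetical channel name
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                # recordIO-protobuf, the format recommended for the built-in algorithms.
                "ContentType": "application/x-recordio-protobuf",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",  # hypothetical instance choice
            "InstanceCount": 1,
            # Hypothetical size; in Pipe mode the dataset is streamed rather than
            # stored here, so the volume mainly needs room for model artifacts.
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```

You would pass the result to `boto3.client("sagemaker").create_training_job(**request)`. Passing `input_mode="File"` is the main change needed to compare the two modes on the same data, though File mode may also need a larger volume to hold the full dataset.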

Most Amazon SageMaker algorithms perform best when the training data is in the optimized protobuf recordIO format. Using this format also lets you take advantage of Pipe input mode with the algorithms that support it. Before Pipe input mode was available, all of your data was loaded from Amazon S3 to the Amazon Elastic Block Store (Amazon EBS) volumes attached to your training instances using File input mode, which required enough disk space to store both your final model artifacts and your full training dataset. File input mode is still preferred when the algorithm requires multiple epochs and the training dataset is small enough to fit in memory, but Pipe input mode works better with large datasets.
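To make "recordIO" concrete, here is a hedged sketch of the record framing, modeled on the MXNet-style recordIO that the SageMaker Python SDK writes: each record is a 4-byte magic number (`0xCED7230A`), a 4-byte length, the payload, then zero padding to a 4-byte boundary. In practice each payload is a serialized protobuf `Record` message (the SDK's `sagemaker.amazon.common` module has helpers such as `write_numpy_to_dense_tensor` for this); here payloads are treated as opaque bytes. The function names are hypothetical, little-endian byte order is an assumption matching typical x86 hosts, and multi-part records are rejected rather than reassembled.

```python
import io
import struct
from typing import BinaryIO, Iterator

_KMAGIC = 0xCED7230A  # recordIO magic number that starts every record
_LENGTH_MASK = (1 << 29) - 1  # lower 29 bits hold the length; upper 3 are a continuation flag


def write_record(f: BinaryIO, payload: bytes) -> None:
    """Write one recordIO-framed record (hypothetical helper)."""
    length = len(payload)
    if length > _LENGTH_MASK:
        raise ValueError("payload too large for a single record")
    f.write(struct.pack("<II", _KMAGIC, length))
    f.write(payload)
    f.write(b"\x00" * ((-length) % 4))  # pad to a 4-byte boundary


def read_records(f: BinaryIO) -> Iterator[bytes]:
    """Yield record payloads from a recordIO stream (hypothetical helper)."""
    while True:
        header = f.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated header")
        magic, lrecord = struct.unpack("<II", header)
        if magic != _KMAGIC:
            raise ValueError("bad magic number")
        if lrecord >> 29:
            raise ValueError("multi-part records are not handled in this sketch")
        payload = f.read(lrecord)
        if len(payload) < lrecord:
            raise ValueError("truncated record")
        f.read((-lrecord) % 4)  # skip padding
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for p in (b"abc", b"12345678"):
        write_record(buf, p)
    buf.seek(0)
    print(list(read_records(buf)))
```

Because each record is self-delimiting, a reader can consume the data sequentially as it arrives, which is the property a streamed (Pipe mode) input relies on.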

Pipe input mode is available in Amazon SageMaker today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) AWS Regions. Visit the documentation for more information on using Pipe input mode with select Amazon SageMaker algorithms, and read the blog post to learn how to use Pipe input mode and to review benchmarks comparing it with File input mode.