Posted On: Jul 17, 2018
Amazon SageMaker now supports Pipe Input Mode for the built-in TensorFlow containers. Pipe Input Mode enables data to stream directly from Amazon Simple Storage Service (Amazon S3) to the TensorFlow container on the training instance, using the TensorFlow dataset construct.
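In practice, the training script consumes the streamed channel through a dataset object provided by the sagemaker-tensorflow extension shipped in the built-in containers. The sketch below is a minimal illustration, not the definitive API: it assumes a channel named "training", TFRecord-encoded data, and a hypothetical feature spec; exact parameter names may vary by container version.

```python
import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset  # ships with the built-in TensorFlow containers


def input_fn():
    # Stream records for the "training" channel directly from Amazon S3
    # instead of reading files copied onto local disk.
    ds = PipeModeDataset(channel='training', record_format='TFRecord')

    def parse(record):
        # Hypothetical feature spec; adjust to match your own TFRecords.
        features = {
            'data': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64),
        }
        return tf.parse_single_example(record, features)

    return ds.map(parse).batch(64).repeat()
```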
This capability provides faster start times for training jobs, better throughput, and lower disk space usage, lowering model training costs even further on Amazon SageMaker. For example, in internal benchmarks conducted earlier this year when we launched Pipe Input Mode for Amazon SageMaker's built-in algorithms, start times were reduced by up to 87% on a 78 GB training dataset, and throughput was up to twice as fast in some benchmarks, resulting in up to a 35% reduction in total training time.
Prior to Pipe Input Mode, File Input Mode loaded data from Amazon S3 onto the Amazon Elastic Block Store (Amazon EBS) volumes attached to the training instances, which required enough disk space to store both the model artifacts and the full training dataset. File Input Mode can still be useful for training jobs that run multiple epochs over datasets that fit completely in memory. Together, the two input modes cover a spectrum of use cases, from small experimental training jobs to petabyte-scale distributed training jobs, and you choose between them when configuring the training job, as sketched below.
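The following is a minimal sketch of launching a Pipe Input Mode training job with the SageMaker Python SDK. The role ARN, S3 paths, script name, and instance settings are placeholders, and depending on your SDK version the instance parameters may be named `instance_count`/`instance_type` instead.

```python
from sagemaker.tensorflow import TensorFlow

# Configure the built-in TensorFlow container to stream data from S3
# rather than copying it to the attached EBS volume first.
estimator = TensorFlow(
    entry_point='train.py',                                  # your training script
    role='arn:aws:iam::123456789012:role/SageMakerRole',     # placeholder role ARN
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    input_mode='Pipe',                                       # 'File' is the default
)

# The 'training' channel name matches the channel read by PipeModeDataset
# in the training script.
estimator.fit({'training': 's3://my-bucket/path/to/tfrecords/'})
```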
Pipe Input Mode for TensorFlow containers in Amazon SageMaker is now available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Seoul), and Asia Pacific (Sydney) AWS Regions. Visit the Amazon SageMaker documentation for more details.