Posted On: Oct 7, 2021
Amazon SageMaker now supports Fast File Mode for accessing data in training jobs. This enables high-performance data access by streaming directly from Amazon S3, with no code changes required relative to the existing File Mode. For example, training a K-Means clustering model on a 100 GB dataset took 28 minutes with File Mode but only 5 minutes with Fast File Mode (an 82% decrease).
Training machine learning models often requires large amounts of data. Efficiently accessing that data helps improve model training performance. Until now, SageMaker offered two modes for reading data directly from Amazon S3: File Mode and Pipe Mode. File Mode downloads training data to an encrypted Amazon EBS volume attached to the training instance. This download needs to finish before model training starts. Pipe Mode streams the data directly to the training algorithm, which can lead to better performance, but requires code changes.
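To make the File Mode access pattern concrete, here is a minimal sketch of how training code might read its input as ordinary local files. It assumes the channel is exposed under SageMaker's conventional /opt/ml/input/data/<channel> directory. The channel name "training" and both helper functions are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Assumption: SageMaker's conventional channel directory layout.
# "training" is a hypothetical channel name used for illustration.
DEFAULT_CHANNEL_DIR = "/opt/ml/input/data/training"


def list_training_files(channel_dir=DEFAULT_CHANNEL_DIR):
    """Return the sorted paths of all regular files under the channel directory."""
    root = Path(channel_dir)
    return sorted(p for p in root.rglob("*") if p.is_file())


def total_bytes(paths):
    """Sum the on-disk sizes of the given files."""
    return sum(p.stat().st_size for p in paths)


# Inside a training script, usage might look like:
#   files = list_training_files()
#   print(f"{len(files)} files, {total_bytes(files)} bytes")
```

Because the code only touches the local file system, it does not depend on how the files arrived there. Pipe Mode, by contrast, requires the code to consume a stream, which is why it needs code changes.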
Fast File Mode combines the ease of use of the existing File Mode with the performance of Pipe Mode. It provides convenient access to data as if it were downloaded locally, while offering the performance benefit of streaming the data directly from Amazon S3. As a result, training can start without waiting for the entire dataset to be downloaded to the training instances. Fast File Mode is available at no additional charge.
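As an illustration of how the mode might be selected, the sketch below builds one input channel for a training job request with the input mode set to "FastFile". The field names follow the Channel structure of the CreateTrainingJob API as best understood here. The helper function, the channel name, and the bucket URI are hypothetical, and the exact request shape should be checked against the current API reference.

```python
# Input mode values accepted for a training input channel.
VALID_INPUT_MODES = {"File", "Pipe", "FastFile"}


def build_channel(channel_name, s3_uri, input_mode="FastFile"):
    """Build one InputDataConfig entry for a training job request.

    Hypothetical helper for illustration; it only assembles a dictionary
    and makes no AWS calls.
    """
    if input_mode not in VALID_INPUT_MODES:
        raise ValueError(f"unsupported input mode: {input_mode!r}")
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "InputMode": input_mode,
    }


# Hypothetical bucket and prefix, for illustration only.
channel = build_channel("training", "s3://example-bucket/kmeans/train/")
```

Switching an existing File Mode job over would then amount to changing the single input mode value, leaving the training code itself untouched.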