AWS Storage Blog

New on the Machine Learning blog: Speed up training on Amazon SageMaker using Amazon FSx for Lustre and Amazon EFS file systems

Deploying analytics applications and machine learning models requires storage that can scale in capacity and performance to handle workload demands with high throughput and low-latency file operations.

A common use case we’re seeing centers around data science teams doing some form of analytics (e.g., machine learning, genomics). AWS offers two scalable, durable, highly available file storage solutions for big data and analytics workloads. Amazon EFS is a cloud-native, shared NFS storage solution for Linux-based applications, as well as ML frameworks and shared notebook systems. Customers like Faculty are leveraging EFS to scale their analytics workloads and are seeing increased agility to deliver insights faster.

Amazon FSx for Lustre is a high-performance file system for processing Amazon S3 or on-premises data. It provides sub-millisecond access to your data and lets you read and write at up to hundreds of gigabytes per second of throughput and millions of IOPS. Amazon FSx for Lustre works natively with Amazon S3, making it easy for you to process cloud data sets for compute-intensive workloads. Conductor Technologies uses FSx for Lustre for its cloud rendering platform, bringing simplicity and scale as well as lower TCO to its VFX and animation studio customers.

This week, we’re excited about the Amazon SageMaker team’s announcement that customers can now speed up machine learning training jobs by accessing data from Amazon EFS and Amazon FSx for Lustre file systems, helping them inform decision making and improve their customer experiences.

Check out their blog post to learn more about AWS file storage solutions for machine learning workloads.