Amazon FSx for Lustre now supports Elastic Fabric Adapter and NVIDIA GPUDirect Storage
Amazon FSx for Lustre, a service that provides high-performance, cost-effective, and scalable file storage for compute workloads, now supports Elastic Fabric Adapter (EFA) and NVIDIA GPUDirect Storage (GDS). With this launch, FSx for Lustre provides the fastest storage performance for GPU instances in the cloud, delivering up to 1200 Gbps of throughput per client instance, up to 12x higher than previous FSx for Lustre systems, so you can complete machine learning training jobs faster and reduce workload costs.
EFA improves workload performance in two ways: it uses the AWS Scalable Reliable Datagram (SRD) protocol to increase network throughput utilization, and it bypasses the operating system during data transfer. For applications powered by high-performance computing instances such as Trn1 and Hpc7a, you can use EFA to achieve higher throughput per client instance. GDS support builds on EFA to further improve performance by enabling direct data transfer between the file system and GPU memory. This direct path eliminates memory copies and CPU involvement in data transfer operations. With EFA and GDS support combined, applications using P5 GPU instances and NVIDIA Compute Unified Device Architecture (CUDA) can achieve up to 12x higher throughput (up to 1200 Gbps) per client instance.
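To put the headline figures in more familiar units, the short Python sketch below converts the stated 1200 Gbps peak into gigabytes per second and estimates a best-case read time for a dataset. It rests on assumptions that are not part of the announcement: the 10 TB dataset size is hypothetical, "12x higher" is read as 12 times the earlier per-client figure, decimal units are used (1 TB = 1000 GB), and the throughput is treated as sustained for the whole transfer, which real workloads may not reach.

```python
# Back-of-the-envelope arithmetic for the throughput figures quoted above.
# All results are ideal upper bounds, not measured performance.

PEAK_GBPS = 1200.0   # per-client peak stated in the announcement
SPEEDUP = 12.0       # "up to 12x" stated in the announcement
DATASET_TB = 10.0    # hypothetical dataset size, chosen only for illustration


def gbps_to_gigabytes_per_sec(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def implied_baseline_gbps(peak_gbps: float, speedup: float) -> float:
    """Earlier per-client throughput implied by the peak and the speedup factor."""
    return peak_gbps / speedup


def ideal_read_seconds(dataset_terabytes: float, gbps: float) -> float:
    """Best-case time to read a dataset once at a sustained rate (1 TB = 1000 GB)."""
    gigabytes = dataset_terabytes * 1000.0
    return gigabytes / gbps_to_gigabytes_per_sec(gbps)


if __name__ == "__main__":
    baseline = implied_baseline_gbps(PEAK_GBPS, SPEEDUP)
    print(f"Peak: {gbps_to_gigabytes_per_sec(PEAK_GBPS):.1f} GB/s per client")
    print(f"Implied earlier figure: {baseline:.1f} Gbps per client")
    print(f"Ideal read of {DATASET_TB:.0f} TB at peak: "
          f"{ideal_read_seconds(DATASET_TB, PEAK_GBPS):.1f} s")
    print(f"Ideal read of {DATASET_TB:.0f} TB at implied earlier figure: "
          f"{ideal_read_seconds(DATASET_TB, baseline):.1f} s")
```

Under these assumptions, 1200 Gbps corresponds to 150 GB/s per client, and the implied earlier figure is about 100 Gbps. Actual throughput depends on the file system configuration, instance type, and workload access pattern.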
EFA and GDS support is available at no additional cost on new FSx for Lustre Persistent-2 file systems in all commercial AWS Regions where Persistent-2 file systems are available. For more information about this new feature, see the Amazon FSx for Lustre documentation and the AWS News Blog post "Amazon FSx for Lustre increases throughput to GPU instances by up to 12x".