Posted On: Aug 2, 2019
Starting today, AWS Batch lets you expose host devices to your AWS Batch jobs, including the Elastic Fabric Adapter (EFA). This enables you to run highly performant distributed HPC and machine-learning workloads using AWS Batch’s managed instance provisioning and scheduling.
EFA is a network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communication at scale on AWS. Its custom-built operating system (OS) bypass hardware interface enhances the performance of inter-instance communication, which is critical to scaling these applications. With EFA, High Performance Computing (HPC) applications using the Message Passing Interface (MPI) and Machine Learning (ML) applications using the NVIDIA Collective Communications Library (NCCL) can scale to thousands of CPUs or GPUs. As a result, you get the application performance of on-premises HPC clusters with the on-demand elasticity and flexibility of the AWS cloud.
AWS Batch is a cloud-native scheduler that manages instance provisioning and job scheduling. It automatically provisions instances according to job specifications, with the appropriate placement group, networking configuration, and any user-specified file system. For the instances it launches, Batch also sets up the EFA interconnect automatically; you request this through a single API parameter.
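As an illustration of exposing a host device to a job, the sketch below builds the container section of a job definition in Python. It assumes the device mapping is expressed through the `linuxParameters.devices` field of the job definition's container properties, and that the EFA device appears on the host at `/dev/infiniband/uverbs0`. The image name, command, and resource values are hypothetical placeholders, not values from this announcement; check the documentation for the fields your setup actually requires.

```python
import json


def host_device(path, permissions=("READ", "WRITE", "MKNOD")):
    """Describe one host device to map into the container at the same path."""
    return {
        "hostPath": path,
        "containerPath": path,
        "permissions": list(permissions),
    }


def build_container_properties(image, command, devices):
    """Assemble container properties that expose the given host devices."""
    return {
        "image": image,
        "command": command,
        # Placeholder resource values; size these for the real workload.
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},
            {"type": "MEMORY", "value": "8192"},
        ],
        "linuxParameters": {"devices": devices},
    }


# Assumption: the EFA device is visible on the host at this path.
EFA_DEVICE_PATH = "/dev/infiniband/uverbs0"

container_properties = build_container_properties(
    image="example.com/my-mpi-app:latest",  # hypothetical image
    command=["/usr/local/bin/run-job.sh"],  # hypothetical entry point
    devices=[host_device(EFA_DEVICE_PATH)],
)

print(json.dumps(container_properties, indent=2))
```

A dictionary like this could then be passed as the `containerProperties` argument when registering a job definition, for example with the `register_job_definition` call of the boto3 Batch client. The sketch stops short of making that call so that it runs without AWS credentials.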
To learn more about using EFA and exposing host devices to AWS Batch jobs, please visit the documentation.