Elastic Fabric Adapter

Run HPC and ML applications at scale

Elastic Fabric Adapter (EFA) is a network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communication at scale on AWS. Its custom-built operating system (OS) bypass hardware interface enhances the performance of inter-instance communication, which is critical to scaling these applications. With EFA, High Performance Computing (HPC) applications using the Message Passing Interface (MPI) and Machine Learning (ML) applications using the NVIDIA Collective Communications Library (NCCL) can scale to thousands of CPUs or GPUs. As a result, you get the application performance of on-premises HPC clusters with the on-demand elasticity and flexibility of the AWS cloud.

EFA is available as an optional EC2 networking feature that you can enable on any supported EC2 instance at no additional cost. Plus, it works with the most commonly used interfaces, APIs, and libraries for inter-node communication, so you can migrate your HPC applications to AWS with little or no modification.

Benefits

Faster results

EFA’s OS bypass networking mechanism provides a low-latency, low-jitter channel for inter-instance communication. This enables your tightly coupled HPC or distributed machine learning applications to scale to thousands of cores, so they run faster.

Flexible configuration

You can enable EFA support on a growing list of EC2 instances and get the flexibility to choose the right compute configuration for your workload. Simply change your cluster configuration as your needs change and enable EFA support on your new compute instances. No prior reservation or upfront planning is needed.

Seamless migration

EFA uses the libfabric interface and libfabric APIs for communication. Because almost all HPC programming models support this interface, you can migrate your existing HPC applications to the cloud with little or no modification.
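One way to confirm that an instance exposes EFA through libfabric is to ask the `fi_info` utility that ships with libfabric for the `efa` provider. The sketch below is a minimal illustration, not an official check: the helper names are hypothetical, it assumes `fi_info` is on the `PATH`, and it assumes the output contains `provider: <name>` lines.

```python
import shutil
import subprocess


def providers_from_fi_info(output: str) -> set[str]:
    """Collect provider names from fi_info-style text ('provider: <name>' lines)."""
    names = set()
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "provider":
            names.add(value.strip())
    return names


def efa_provider_visible() -> bool:
    """Return True if the local libfabric install reports an 'efa' provider."""
    fi_info = shutil.which("fi_info")
    if fi_info is None:
        return False  # libfabric utilities are not installed on this machine
    # 'fi_info -p efa' restricts the listing to the efa provider.
    result = subprocess.run([fi_info, "-p", "efa"], capture_output=True, text=True)
    return result.returncode == 0 and "efa" in providers_from_fi_info(result.stdout)


if __name__ == "__main__":
    print("EFA provider visible:", efa_provider_visible())
```

On a machine without libfabric or without an EFA device, the check simply reports `False`.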

EFA Performance

EFA provides a 4X improvement in scaling over the Elastic Network Adapter (ENA) for a standard CFD simulation, as shown in the chart above.

Solver for this benchmark provided by Metacomp Technologies.

AWS customer CFD Direct maintains the popular OpenFOAM platform for Computational Fluid Dynamics and also produces CFD Direct From the Cloud (CFDDFC), an AWS Marketplace offering that makes it easy for you to run OpenFOAM on AWS. They have been testing and benchmarking EFA and recently shared their measurements in a blog post titled OpenFOAM HPC with AWS EFA. In the post, they report on a simulation of the external aerodynamics around a car. This simulation scales super-linearly to over 200 cores, gradually declining to linear scaling at 1000 cores (about 100K simulation cells per core).
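To make the scaling terms concrete, the sketch below computes parallel efficiency (speedup divided by the increase in core count) and the approximate problem size implied by the figures in the text. The function names are hypothetical, and the timings in the example are invented for illustration only; they are not CFD Direct's measurements.

```python
def parallel_efficiency(base_cores: int, base_time: float,
                        cores: int, time: float) -> float:
    """Speedup relative to the baseline run, divided by the core-count ratio.

    1.0 means linear scaling; values above 1.0 mean super-linear scaling.
    """
    speedup = base_time / time
    return speedup / (cores / base_cores)


def total_cells(cores: int, cells_per_core: int) -> int:
    """Approximate mesh size implied by a per-core cell count."""
    return cores * cells_per_core


if __name__ == "__main__":
    # From the text: about 100K cells per core at 1000 cores,
    # which implies a mesh of roughly 100 million cells.
    print(total_cells(1000, 100_000))

    # Hypothetical timings (seconds): doubling the cores more than halves the
    # run time, so the efficiency comes out above 1.0 (super-linear).
    print(parallel_efficiency(base_cores=36, base_time=1000.0,
                              cores=72, time=480.0))
```

An efficiency that settles at 1.0 as the core count grows corresponds to the linear scaling the post describes at 1000 cores.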
 

How it works

[Diagram: how Elastic Fabric Adapter works]

Use cases

Computational Fluid Dynamics

Advances in Computational Fluid Dynamics (CFD) algorithms enable engineers to simulate increasingly complex flow phenomena, and HPC helps reduce turnaround times. With EFA, design engineers can now scale out their simulation jobs to experiment with more tunable parameters, leading to faster, more accurate results.

Weather modeling

Complex weather models require high memory bandwidth, fast interconnects, and robust parallel file systems to deliver accurate results. The finer the grid spacing of the model, the more accurate the results, and the more computational resources the model requires. EFA offers a fast interconnect that allows weather modeling applications to take advantage of the virtually unlimited scaling capabilities of the AWS cloud and get more accurate predictions in less time.

Machine Learning

The training of deep learning models can be significantly accelerated with distributed computing on GPUs. Leading deep learning frameworks such as Caffe, Caffe2, Chainer, MXNet, TensorFlow, and PyTorch have already integrated NCCL to take advantage of its multi-GPU collectives for cross-node communication. EFA is optimized for NCCL on AWS, improving the throughput and scalability of training these models, which leads to faster results.
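As a toy illustration of what a collective such as all-reduce computes, the sketch below sums per-worker gradient vectors element-wise and hands every worker the same result. It is plain Python with a hypothetical function name, not NCCL's implementation or API; NCCL performs the equivalent exchange across GPUs and nodes over the network.

```python
def all_reduce_sum(worker_values: list[list[float]]) -> list[list[float]]:
    """Return, for every worker, the element-wise sum over all workers' vectors."""
    # zip(*...) groups the i-th element of every worker's vector together.
    summed = [sum(column) for column in zip(*worker_values)]
    # After an all-reduce, each worker holds its own copy of the same result.
    return [list(summed) for _ in worker_values]


if __name__ == "__main__":
    gradients = [[1.0, 2.0], [3.0, 4.0]]  # two workers, two parameters each
    print(all_reduce_sum(gradients))  # [[4.0, 6.0], [4.0, 6.0]]
```

In data-parallel training, frameworks typically divide the summed gradients by the number of workers before applying the update.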

Resources

Now Available – Elastic Fabric Adapter (EFA) for Tightly-Coupled HPC Workloads
April 29th, 2019
 
AWS re:Invent 2018: Scaling HPC Applications on EC2 w/ Elastic Fabric Adapter
In this re:Invent 2018 talk, we introduce Elastic Fabric Adapter and discuss how EFA enhances inter-instance networking within Amazon EC2.
Deep Dive on OpenMPI and Elastic Fabric Adapter (EFA)
In this tech talk, we'll do a deep dive into OpenMPI and its specific support for Amazon EC2's EFA, and show you how to get the most out of your code and architect your solution for performance.

Getting started with Elastic Fabric Adapter (EFA)

In this tutorial, you create an EFA-enabled AMI and an EFA-enabled security group, and then launch EFA-enabled instances into a cluster placement group using that AMI and security group.
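The pieces named in the tutorial can be sketched as request parameters for the boto3 EC2 client. This is a minimal sketch under stated assumptions, not the tutorial's own code: the helper names and all IDs are hypothetical placeholders, `c5n.18xlarge` is just one example of an EFA-capable instance type, and the security group rule assumes the EFA requirement that the group allow all traffic to and from itself. The functions only build dictionaries, so nothing is created until you pass them to a real client.

```python
def self_referencing_rule(security_group_id: str) -> dict:
    """Permission allowing all traffic between members of the same security group."""
    return {
        "IpProtocol": "-1",  # "-1" means all protocols
        "UserIdGroupPairs": [{"GroupId": security_group_id}],
    }


def placement_group_request(name: str) -> dict:
    """Parameters for a cluster placement group (instances packed close together)."""
    return {"GroupName": name, "Strategy": "cluster"}


def efa_launch_request(ami_id: str, instance_type: str, count: int,
                       subnet_id: str, security_group_id: str,
                       placement_group: str) -> dict:
    """Parameters for launching instances whose primary interface is an EFA."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        "Placement": {"GroupName": placement_group},
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            "Groups": [security_group_id],
            "InterfaceType": "efa",  # request an EFA instead of a standard interface
        }],
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    sg, subnet, ami = "sg-EXAMPLE", "subnet-EXAMPLE", "ami-EXAMPLE"
    rule = self_referencing_rule(sg)
    print(rule)
    print(placement_group_request("efa-demo-pg"))
    print(efa_launch_request(ami, "c5n.18xlarge", 2, subnet, sg, "efa-demo-pg"))
    # With credentials configured, these would be passed to boto3 roughly as:
    #   ec2 = boto3.client("ec2")
    #   ec2.authorize_security_group_ingress(GroupId=sg, IpPermissions=[rule])
    #   ec2.authorize_security_group_egress(GroupId=sg, IpPermissions=[rule])
    #   ec2.create_placement_group(**placement_group_request("efa-demo-pg"))
    #   ec2.run_instances(**efa_launch_request(...))
```

Keeping the request construction separate from the API calls makes it easy to review the configuration before any resources are launched.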
 
Learn more about AWS services for HPC

Learn about all the AWS services you can use to build an HPC solution on AWS

Sign up for a free account

Instantly get access to the AWS Free Tier. 

Get started with HPC on AWS

Build your first HPC cluster on AWS
