Introducing the Instance Topology API for ML and HPC workloads

Posted on: Nov 14, 2023

AWS announces the general availability of the Amazon Elastic Compute Cloud (EC2) Instance Topology API for Machine Learning (ML) and High Performance Computing (HPC) workloads. The Instance Topology API gives customers a unique, per-account hierarchical view of the relative proximity between instances. Customers can describe their instance topology to identify instances that are in a tightly coupled group, and use that information to reduce communication time and, in turn, job completion time.

Customers running distributed parallel workloads, such as training large language models and computational fluid dynamics, are scaling those workloads to thousands of EC2 instances. With the EC2 Instance Topology API, customers can describe topology as a network node set and filter by Availability Zone, group name, instance type, and instance ID. The network node set represents the top-down relationship of instances to one another within a Region. Customers can ingest this topology into their scheduler of choice and use it to allocate instances to jobs on a best-fit basis.
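As a rough illustration of how a scheduler might consume this topology, the Python sketch below groups instances by their nearest shared network node and picks the tightest group that fits a job. It assumes the boto3 `describe_instance_topology` call returns an `Instances` list whose entries carry `InstanceId` and a `NetworkNodes` list ordered top-down, so the last entry is the node closest to the instance. The helper names (`group_by_lowest_node`, `pick_best_fit`, `fetch_topology`) are hypothetical, and the best-fit rule shown is only one simple policy, not a recommended scheduler design.

```python
from collections import defaultdict


def group_by_lowest_node(instances):
    """Group instance IDs by the last entry of NetworkNodes.

    Assumption: NetworkNodes is ordered top-down, so the last entry is the
    network node nearest to the instance.
    """
    groups = defaultdict(list)
    for inst in instances:
        nodes = inst.get("NetworkNodes", [])
        if nodes:
            groups[nodes[-1]].append(inst["InstanceId"])
    return dict(groups)


def pick_best_fit(groups, needed):
    """Return `needed` instance IDs from the smallest group large enough
    to hold the job, or None if no single group is large enough."""
    candidates = [ids for ids in groups.values() if len(ids) >= needed]
    if not candidates:
        return None
    return sorted(min(candidates, key=len))[:needed]


def fetch_topology(region, instance_ids):
    """Collect topology entries for the given instances (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    instances, token = [], None
    while True:
        kwargs = {"InstanceIds": instance_ids}
        if token:
            kwargs["NextToken"] = token
        page = ec2.describe_instance_topology(**kwargs)
        instances.extend(page.get("Instances", []))
        token = page.get("NextToken")
        if not token:
            return instances
```

A scheduler integration could call `fetch_topology` for the instances in a cluster, pass the result to `group_by_lowest_node`, and then use `pick_best_fit` to choose instances for each job. Jobs larger than any single group would need a policy that walks further up the node hierarchy, which this sketch does not attempt.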

The EC2 Instance Topology API is now available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Seoul), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm). It is available on the following instance types: HPC6id, HPC6a, HPC7a, HPC7g, P3dn, P4d, P4de, P5, TRN1, and TRN1n.

To learn more, see the latest EC2 User Guide.
