Containers

Simplify AI infrastructure for AWS Trainium and Elastic Fabric Adapter with Kubernetes Dynamic Resource Allocation

As organizations scale AI workloads in containerized environments, they face the complexity of managing specialized hardware, which creates friction between infrastructure teams focused on stability and machine learning (ML) practitioners focused on model performance. Kubernetes Dynamic Resource Allocation (DRA) provides the foundation to address this complexity. To bring DRA to customers running AI workloads on AWS, we built the Elastic Fabric Adapter (EFA) DRA driver in the upstream DRANET project and the Neuron DRA driver for AWS Trainium. Together, these drivers deliver a unified, topology-aware resource management experience across the full stack of AWS AI infrastructure, from high-performance Remote Direct Memory Access (RDMA) networking with EFA to accelerator management with AWS Trainium.

Why Dynamic Resource Allocation matters for AI workloads

Kubernetes was originally designed for general-purpose compute. The device plugin model it introduced for specialized hardware was a first step, but it relies on rigid, count-based allocation that can’t express the topology and co-location requirements of modern AI workloads. A single misconfiguration, such as placing an accelerator far from its nearest EFA interface or splitting a training job across the wrong non-uniform memory access (NUMA) boundaries, can hurt performance.

DRA evolves device management in Kubernetes with structured, attribute-rich resource descriptions that the Kubernetes scheduler natively understands. Instead of requesting “four accelerators” and “four EFA interfaces” and hoping the scheduler picks devices that sit close to each other, workloads describe the devices they need and how those devices must relate (for example, that the accelerators and EFA interfaces belong to the same topology group), and the scheduler makes placement decisions with that information.
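To make the count-only failure mode concrete, here is a toy Python model. It is a deliberate simplification, not how the kubelet or any specific device plugin chooses devices. It assumes a hypothetical node with eight Neuron devices and eight EFA interfaces split across two topology groups (the group IDs `g0` and `g1` are made up), and an allocator that simply takes the first four free devices of each type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    name: str
    kind: str   # "neuron" or "efa"
    group: str  # hypothetical topology group ID

def count_based(devices, kind, count, taken):
    """Toy count-only allocator: take the first `count` free devices of one kind, ignoring topology."""
    picked = [d for d in devices if d.kind == kind and d.name not in taken][:count]
    taken.update(d.name for d in picked)
    return picked

# Hypothetical node: groups g0 and g1 each hold four Neuron devices and four EFA interfaces.
node = [Device(f"neuron{i}", "neuron", f"g{i // 4}") for i in range(8)]
node += [Device(f"efa{i}", "efa", f"g{i // 4}") for i in range(8)]

taken = {"efa0", "efa1"}  # another workload already holds two EFA interfaces in g0
neurons = count_based(node, "neuron", 4, taken)
efas = count_based(node, "efa", 4, taken)

print(sorted({d.group for d in neurons}))  # ['g0']
print(sorted({d.group for d in efas}))     # ['g0', 'g1']
```

The accelerators all land in `g0`, but half of the EFA interfaces come from `g1`. A request that only carries counts has no way to say that this matters; a constraint on a shared topology attribute does.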

DRA addresses problems that AWS customers face every day: infrastructure teams spending weeks tuning custom schedulers and init containers, ML practitioners waiting on infrastructure changes before they can test new model configurations, and organizations running at lower utilization because safely sharing specialized hardware was too complex.

EFA DRA driver (DRANET): High-performance networking for AI workloads

The EFA DRA driver, built in the upstream DRANET project, brings DRA to high-performance EFA networking on AWS. Together with the Neuron DRA driver, it provides several benefits for distributed AI workloads, including topology-aware placement, resource sharing, Kubernetes-native scheduling, and flexible per-workload configuration.

Topology-aware allocation. The EFA DRA driver publishes PCIe and device group topology information so Kubernetes can place EFA interfaces close to their associated AWS Trainium or NVIDIA GPU devices for lower-latency communication.

EFA interface sharing. Multiple workloads can safely share EFA interfaces on the same node to improve utilization.

Built on upstream standards. The EFA DRA driver was developed with the upstream DRANET community and aligns with emerging Kubernetes AI infrastructure standards.

Neuron DRA driver: Accelerator management for AWS Trainium

The Neuron DRA driver extends these benefits to AWS Trainium accelerator management by handling device allocation and per-workload configuration for Neuron devices.

Kubernetes-native scheduling. The Neuron DRA driver publishes hardware topology and Neuron-EFA locality information directly to Kubernetes for topology-aware scheduling without custom scheduler extensions.

Atomic multi-node allocation. DRA coordinates scheduler and kubelet resource validation before workload startup, eliminating many custom validation scripts and init containers.

Flexible per-workload configuration. Teams can configure settings such as the Logical NeuronCore (LNC) size through ResourceClaimTemplates instead of Amazon EC2 launch templates, so workloads with different configurations can run on the same shared nodes.

Role-based abstraction. Platform teams define reusable infrastructure templates, while ML practitioners consume them through simple size-based configurations without calculating device counts or understanding network topologies.

Better together: Unified accelerator and network management

The real power comes from using the EFA DRA and Neuron DRA drivers together. A single ResourceClaimTemplate can request both Neuron devices and EFA interfaces with automatic device group alignment, so that each accelerator communicates through its closest network interface. This removes the coordination overhead of managing separate device plugins for networking and compute, and replaces it with a single, declarative resource request.

For distributed training across multiple nodes, this means the scheduler places workloads with full awareness of both the accelerator topology and the network fabric. For disaggregated inference (separating prefill from decode across nodes), it means inter-node communication goes through EFA interfaces that are topologically local to the allocated accelerators. Teams can achieve higher throughput without manually validating hardware placement.

For ML platform and operations teams, DRA provides the tools to establish a library of ResourceClaimTemplates that define optimized patterns for common AI workloads. These templates encapsulate the complexity of hardware topology, networking configuration, and resource allocation behind straightforward, size-based selections. Platform teams keep full control over infrastructure policies, cost guardrails, and performance optimization without exposing this complexity to end users.

For ML practitioners, deploying sophisticated AI applications becomes straightforward. Instead of calculating device counts or understanding network topologies, developers reference pre-defined templates in their application manifests, selecting configurations like “disaggregated-inference-large” or “pipeline-parallel-128k” that automatically map to the right hardware configurations. This self-service experience maintains the familiar Kubernetes workflow while abstracting away the underlying infrastructure complexity.
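One way a platform team might keep that self-service mapping in one place is a small catalog lookup, sketched below in Python. The selection names come from the examples above; the template names (`platform-disagg-inference-large`, `platform-pp-128k`) and the `resolve_template` helper are hypothetical and not part of any AWS or Kubernetes API. In a real cluster the catalog is simply the set of ResourceClaimTemplates the platform team publishes; the dictionary only stands in for it.

```python
# Hypothetical catalog maintained by a platform team. Keys are the size-based
# selections practitioners use; values are ResourceClaimTemplate names.
TEMPLATE_CATALOG = {
    "disaggregated-inference-large": "platform-disagg-inference-large",
    "pipeline-parallel-128k": "platform-pp-128k",
}

def resolve_template(selection: str) -> str:
    """Map a size-based selection to a ResourceClaimTemplate name, failing loudly on typos."""
    try:
        return TEMPLATE_CATALOG[selection]
    except KeyError:
        known = ", ".join(sorted(TEMPLATE_CATALOG))
        raise ValueError(f"unknown selection {selection!r}; choose one of: {known}") from None

print(resolve_template("pipeline-parallel-128k"))  # platform-pp-128k
```

Because practitioners only ever see the selection name, the platform team can change device counts or topology constraints inside a template without touching application manifests.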

How it works

The DRA implementation introduces several key components:

  • ResourceClaimTemplates – These define the policies and configurations for different workload patterns. ML operations teams create these templates to encapsulate infrastructure complexity, specifying allocation modes, device requirements, topology constraints, and configuration parameters.
  • ResourceSlices – The EFA and Neuron DRA drivers publish ResourceSlices to advertise the inventory of available EFA and Neuron devices on each node to the Kubernetes scheduler. Each ResourceSlice lists device attributes such as device IDs, driver versions, and topology information, including cross-device topology (such as Neuron-EFA co-location), NUMA affinity, and fabric-aware placement metadata.
  • DeviceClasses – The DRA drivers define DeviceClasses that categorize EFA and Neuron resources by matching device attributes from ResourceSlices, for example by device capabilities, topology, or driver version. Each request in a ResourceClaimTemplate names a DeviceClass, and the Kubernetes scheduler uses these classes to decide which devices can satisfy the request.
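  • ResourceClaims – When a workload is deployed, Kubernetes creates a ResourceClaim from the referenced template. The scheduler evaluates the claim, including its topology constraints, against the published ResourceSlices and allocates all requested devices together or none at all. The DRA driver on the selected node then prepares the allocated devices before the workload starts.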
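The relationship between these objects can be sketched in a few lines of Python. This is a simplified, hypothetical model: real DeviceClasses select devices with CEL expressions, whereas here a Python predicate stands in for the selector and a plain function stands in for the allocation step. The driver names `neuron-driver` and `efa-driver` are made up; the DeviceClass and request names are taken from the example template in this post. Topology constraints, scoring, and node-side device preparation are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Device:
    """One device entry in a ResourceSlice."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class ResourceSlice:
    """Devices that one driver advertises for one node."""
    driver: str
    node: str
    devices: List[Device]

@dataclass
class DeviceClass:
    """Stand-in for a DeviceClass; a Python predicate replaces the CEL selector."""
    name: str
    selector: Callable[[str, Device], bool]

@dataclass
class DeviceRequest:
    """One entry in a claim's device requests."""
    name: str
    device_class: str
    count: int

Allocation = Dict[str, List[Tuple[str, str]]]

def allocate(requests: List[DeviceRequest], classes: List[DeviceClass],
             slices: List[ResourceSlice]) -> Optional[Tuple[str, Allocation]]:
    """Satisfy every request on a single node, all or nothing; return None if no node fits."""
    by_name = {c.name: c for c in classes}
    for node in sorted({s.node for s in slices}):
        used = set()
        result: Allocation = {}
        for req in requests:
            cls = by_name[req.device_class]
            candidates = [
                (s.driver, d.name)
                for s in slices if s.node == node
                for d in s.devices
                if cls.selector(s.driver, d) and (s.driver, d.name) not in used
            ]
            if len(candidates) < req.count:
                break  # this node cannot satisfy the request; try the next node
            picked = candidates[: req.count]
            used.update(picked)
            result[req.name] = picked
        else:
            return node, result  # every request was satisfied on this node
    return None

# Hypothetical inventory: node-a is too small, node-b has four of each device type.
slices = [
    ResourceSlice("neuron-driver", "node-a", [Device(f"neuron{i}") for i in range(2)]),
    ResourceSlice("efa-driver", "node-a", [Device("efa0")]),
    ResourceSlice("neuron-driver", "node-b", [Device(f"neuron{i}") for i in range(4)]),
    ResourceSlice("efa-driver", "node-b", [Device(f"efa{i}") for i in range(4)]),
]
classes = [
    DeviceClass("neuron.aws.com", lambda driver, dev: driver == "neuron-driver"),
    DeviceClass("efa.networking.k8s.aws", lambda driver, dev: driver == "efa-driver"),
]
requests = [
    DeviceRequest("4-neurons", "neuron.aws.com", 4),
    DeviceRequest("4-efas", "efa.networking.k8s.aws", 4),
]
node, result = allocate(requests, classes, slices)
print(node, {name: len(devs) for name, devs in result.items()})  # node-b {'4-neurons': 4, '4-efas': 4}
```

The all-or-nothing loop mirrors the idea that a claim is either fully allocated on one node or not allocated at all, so a workload never starts with half of its devices.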

Getting started

The EFA and Neuron DRA drivers are installed through Helm. These DRA drivers are recommended for new deployments on Amazon Elastic Kubernetes Service (Amazon EKS) clusters running Kubernetes version 1.34 or later with EKS managed node groups or self-managed nodes. The Neuron and EFA device plugins remain supported and are recommended for use with Karpenter and Amazon EKS Auto Mode. You can’t run the DRA drivers on the same nodes as the corresponding device plugins.
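Before installing the drivers, you might check the cluster version in a script. The sketch below assumes you capture the output of `kubectl version -o json`, whose `serverVersion` object carries `major` and `minor` strings; some managed distributions report the minor version with a suffix such as `34+`, so only the leading digits are kept. The function names are hypothetical, and the check covers only the version requirement, not the node-type or device-plugin guidance above.

```python
import json
import re

MIN_K8S = (1, 34)  # minimum Kubernetes version stated for the DRA drivers in this post

def server_version(version_json: str) -> tuple:
    """Extract (major, minor) from `kubectl version -o json` output."""
    info = json.loads(version_json)["serverVersion"]
    # Keep only leading digits so values such as "34+" still parse.
    major = int(re.match(r"\d+", info["major"]).group())
    minor = int(re.match(r"\d+", info["minor"]).group())
    return major, minor

def dra_drivers_supported(version_json: str) -> bool:
    """True if the server version meets the minimum for the DRA drivers."""
    return server_version(version_json) >= MIN_K8S

print(dra_drivers_supported('{"serverVersion": {"major": "1", "minor": "34+"}}'))  # True
print(dra_drivers_supported('{"serverVersion": {"major": "1", "minor": "33"}}'))   # False
```

Tuple comparison keeps the check correct across major versions, for example `(2, 0)` compares as newer than `(1, 34)`.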

Sample ResourceClaimTemplates are available in the AWS Neuron documentation and the Amazon EKS User Guide.

The following ResourceClaimTemplate requests four EFA interfaces aligned with four Neuron devices using the matchAttribute constraint. The scheduler makes sure that the allocated EFA interfaces and Neuron devices belong to the same connected device topology group:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: aligned-efa-neuron
spec:
  spec:
    devices:
      requests:
      - name: 4-neurons
        exactly:
          deviceClassName: neuron.aws.com
          count: 4
      - name: 4-efas
        exactly:
          deviceClassName: efa.networking.k8s.aws
          count: 4
      constraints:
      - requests: ["4-neurons", "4-efas"]
        matchAttribute: "resource.aws.com/devicegroup4_id"
```

The matchAttribute constraint tells the scheduler that every device allocated to the listed requests must have the same value for the named attribute. Because the devicegroup4_id attribute identifies a group of four connected Neuron devices, the four Neuron devices come from a single group and the four allocated EFA interfaces are topologically local to those specific devices.
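The rule that matchAttribute expresses can be illustrated with a small Python sketch. It is not the scheduler's implementation (which, among other things, does not simply try groups in sorted order); it only shows the constraint: every device picked for the listed requests must report the same value for the attribute, and a device without the attribute can never qualify. The inventory is hypothetical (groups `g0` and `g1`, with `efa0` and `efa1` already in use); the attribute key and request names are the ones from the template above.

```python
from collections import defaultdict

ATTR = "resource.aws.com/devicegroup4_id"  # attribute key from the template above

def allocate_aligned(devices, requests, attr=ATTR):
    """Return {request_name: [device names]} where every picked device shares one value of `attr`.

    `devices` lists free devices as dicts with "name", "kind", and "attributes".
    `requests` lists (request_name, kind, count) tuples; each request is assumed to ask
    for a different kind, so requests never compete for the same device.
    Returns None if no single attribute value can satisfy all requests.
    """
    groups = defaultdict(list)
    for dev in devices:
        value = dev["attributes"].get(attr)
        if value is not None:  # a device without the attribute can never satisfy the constraint
            groups[value].append(dev)
    for value in sorted(groups):
        allocation = {}
        for req_name, kind, count in requests:
            matches = [d["name"] for d in groups[value] if d["kind"] == kind]
            if len(matches) < count:
                break  # this group is too small; try the next one
            allocation[req_name] = matches[:count]
        else:
            return allocation
    return None

# Hypothetical node: two groups of four Neuron devices; efa0 and efa1 are already in use.
devices = [{"name": f"neuron{i}", "kind": "neuron", "attributes": {ATTR: f"g{i // 4}"}} for i in range(8)]
devices += [{"name": f"efa{i}", "kind": "efa", "attributes": {ATTR: f"g{i // 4}"}} for i in range(2, 8)]

print(allocate_aligned(devices, [("4-neurons", "neuron", 4), ("4-efas", "efa", 4)]))
# {'4-neurons': ['neuron4', 'neuron5', 'neuron6', 'neuron7'], '4-efas': ['efa4', 'efa5', 'efa6', 'efa7']}
```

Group `g0` cannot supply four free EFA interfaces, so the sketch moves on to `g1` and returns accelerators and interfaces that all share one group, rather than mixing groups to satisfy the counts.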

A Pod references this template to get topology-aligned resources:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-inference-worker
spec:
  containers:
  - name: worker
    image: my-inference-image
    resources:
      claims:
      - name: neuron-efa
  resourceClaims:
  - name: neuron-efa
    resourceClaimTemplateName: aligned-efa-neuron
```

The Pod references the template by name. The scheduler handles all topology validation and device placement. No custom scheduler extensions, no init containers, no manual PCIe or device group topology calculations.
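If you generate manifests programmatically, the rule to get right is that each container's `resources.claims[].name` must match an entry in the Pod's `spec.resourceClaims`. The minimal Python sketch below builds the Pod shown above as a plain dictionary and checks that rule. The `build_pod` and `check_claim_refs` helpers are hypothetical; you would still submit the result with kubectl or a Kubernetes client library.

```python
import json

def check_claim_refs(pod):
    """Raise if a container references a claim that the Pod does not declare."""
    declared = {c["name"] for c in pod["spec"].get("resourceClaims", [])}
    for container in pod["spec"]["containers"]:
        for ref in container.get("resources", {}).get("claims", []):
            if ref["name"] not in declared:
                raise ValueError(
                    f"container {container['name']!r} references undeclared claim {ref['name']!r}")

def build_pod(name, image, template, claim_name="neuron-efa"):
    """Return a Pod manifest (as a dict) that consumes one ResourceClaimTemplate."""
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "worker",
                "image": image,
                "resources": {"claims": [{"name": claim_name}]},
            }],
            "resourceClaims": [{"name": claim_name, "resourceClaimTemplateName": template}],
        },
    }
    check_claim_refs(pod)
    return pod

pod = build_pod("neuron-inference-worker", "my-inference-image", "aligned-efa-neuron")
print(json.dumps(pod["spec"]["resourceClaims"]))
# [{"name": "neuron-efa", "resourceClaimTemplateName": "aligned-efa-neuron"}]

# A container that references a claim name the Pod never declares is rejected.
broken = {"spec": {"containers": [{"name": "worker", "resources": {"claims": [{"name": "typo"}]}}],
                   "resourceClaims": []}}
try:
    check_claim_refs(broken)
except ValueError as err:
    print(err)  # container 'worker' references undeclared claim 'typo'
```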

Conclusion

Together, the EFA and Neuron DRA drivers represent a fundamental shift in how you can manage AI accelerators and high-performance networking for distributed AI workloads on AWS. The Neuron DRA driver delivers topology-aware accelerator allocation with role-based abstraction, while the EFA DRA driver provides topology-aware network interface placement and sharing. With clear boundaries between ML platform teams and ML practitioners, organizations can accelerate their AI initiatives while maintaining infrastructure governance.

Whether you’re serving large language models (LLMs) with disaggregated inference, processing extreme context lengths with pipeline parallelism, or managing diverse AI workloads more efficiently, the EFA and Neuron DRA drivers provide the capabilities that modern AI operations demand.

To get started, see the following resources.


About the authors

Yahav Biran

Yahav Biran is a Principal Solutions Architect at AWS, focusing on large-scale AI workloads. He contributes to open-source projects and publishes in AWS blogs and academic journals, including the AWS compute and AI blogs and the Journal of Systems Engineering. He frequently delivers technical presentations and collaborates with customers to design cloud applications. Yahav holds a Ph.D. in Systems Engineering from Colorado State University.

Maen Suleiman

Maen Suleiman is a Principal Technical Product Manager at AWS, leading the Neuron PM team that defines product strategy for the AWS Neuron Software Stack, the software powering Trainium and Inferentia. Neuron provides a comprehensive platform for ML training and inference with native support for popular frameworks like PyTorch and vLLM. Before AWS, Maen held product and engineering leadership roles at Marvell Semiconductor, leading embedded processor SDK strategy, Linux kernel upstreaming, and software-defined networking initiatives.