Containers
Utilizing NVIDIA Multi-Instance GPU (MIG) in Amazon EC2 P4d Instances on Amazon Elastic Kubernetes Service (EKS)
In November 2020, AWS released Amazon EC2 P4d instances, which deliver the highest performance for machine learning (ML) training and high performance computing (HPC) applications in the cloud. Each instance comes with the following characteristics:
- Eight NVIDIA A100 Tensor Core GPUs
- 96 vCPUs
- 1 TB of RAM
- 400 Gbps Elastic Fabric Adapter (EFA) with support for GPUDirect RDMA
One of the primary benefits of AWS is elasticity: you can scale workloads according to demand, where increased compute utilization triggers additional scale. With P4d instances, you can now also reshape compute resources by partitioning each NVIDIA GPU into additional slices for various workloads, a capability called Multi-Instance GPU (MIG).
With MIG, you can partition a GPU into isolated instances, each with dedicated streaming multiprocessors and its own memory profile. This lets you dispatch multiple diverse workloads (which do not require the memory footprint of a whole GPU) onto the same GPU without performance interference.
Scheduling workloads onto these slices concurrently, while elastically scaling the nodes through Amazon EC2 Auto Scaling, lets you reshape compute as it scales. With MIG, EC2 P4d instances can serve scalable, mixed-topology workloads. This post walks through an example of running an ML inference workload with and without MIG on Amazon Elastic Kubernetes Service (EKS).
MIG Profiles
Different MIG profiles exist for each GPU in the P4d instance. Recall that each p4d.24xlarge comes with eight NVIDIA A100 GPUs, and each A100 can be partitioned into up to seven 5 GB slices. This means a single node can expose up to 56 accelerators. By spreading requests across all 56 GPU slices, you can run many diverse workloads per node. The following table shows the available profiles per A100 GPU.
As an added feature, you can mix multiple profiles per GPU for further reshaping and scheduling. For the rest of this post, I refer to each MIG profile by its MIG profile ID (the third column in the preceding table) for simplicity.
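If you want to inspect the profiles available on your own hardware, you can list them directly with `nvidia-smi` once MIG mode is enabled. The exact IDs and memory sizes depend on the GPU; on the A100-40GB, profile ID 19 corresponds to the 1g.5gb slice used later in this post.

```shell
# List the GPU instance profiles supported on GPU 0.
# Requires MIG mode to be enabled first (nvidia-smi -i 0 -mig 1).
nvidia-smi mig -i 0 -lgip

# List any GPU instances that have already been created on GPU 0.
nvidia-smi mig -i 0 -lgi
```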
Deployment on EKS
Amazon EKS supports the EC2 P4d instance, so with some configuration changes it is possible to deploy a self-managed node group on which to schedule jobs. In the example here, I use Argo Workflows on top of EKS with MIG to show how you can quickly run DAG workflows that use MIG slices in the backend. The configuration changes can be found in the aws-samples/aws-efa-nccl-baseami-pipeline GitHub repository. The repository requires Packer; if you build the components of the Packer script and save an Amazon Machine Image (AMI), these changes are available by default.
Step 1. Start an EKS cluster with the following command:
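A minimal `eksctl` invocation looks like the following; the cluster name and region are placeholders, so substitute your own:

```shell
# Create the control plane only; the P4d node group is added in the next step.
eksctl create cluster \
  --name mig-demo \
  --region us-west-2 \
  --without-nodegroup
```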
Step 2. Next, create a managed node group with a p4d node
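A sketch with `eksctl` (the cluster and node group names are placeholders; in practice you would point this step at the custom AMI built earlier, for example through a launch template):

```shell
# Add a managed node group with a single p4d.24xlarge node.
eksctl create nodegroup \
  --cluster mig-demo \
  --region us-west-2 \
  --name p4d-mig \
  --node-type p4d.24xlarge \
  --nodes 1
```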
It is important to note that MIG is disabled by default when launching a P4d instance. A systemd service was created to enable MIG and set up a default partition scheme. The following code is the systemd service; this unit file starts before the nvidia-fabricmanager service unit in the systemd chain.
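A unit along these lines would do it. The service name is illustrative; the ordering directive is what matters, so that the partitions exist before nvidia-fabricmanager starts:

```ini
# /etc/systemd/system/mig-setup.service (illustrative name)
[Unit]
Description=Enable NVIDIA MIG and create the default partition scheme
Before=nvidia-fabricmanager.service

[Service]
Type=oneshot
EnvironmentFile=/etc/default/mig
ExecStart=/opt/mig/create_mig.sh
RemainAfterExit=true

[Install]
WantedBy=multi-user.target
```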
The environment file /etc/default/mig defines the $MIG_PARTITION that is used in the script /opt/mig/create_mig.sh.
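A sketch of what such a script might look like, assuming `$MIG_PARTITION` holds a comma-separated list of profile IDs. The loop and flag usage below are my reconstruction, not the exact script from the AMI pipeline:

```shell
#!/bin/bash
# /opt/mig/create_mig.sh (illustrative reconstruction)
set -euo pipefail

# Enable MIG mode on all eight A100 GPUs.
nvidia-smi -mig 1

# Create the GPU instances described by MIG_PARTITION on each GPU,
# e.g. "19,19,19,19,19,19,19", along with matching compute instances (-C).
for gpu in $(seq 0 7); do
  nvidia-smi mig -i "$gpu" -cgi "$MIG_PARTITION" -C
done
```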
This is set by user data in the AWS Launch Template (LT). In the launch template, you can iterate over versions to create LTs with different MIG partition profiles. The following example creates seven slices of the 5 GB A100 profile.
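In user data, that can be as simple as writing the environment file before the service runs. For the seven-slice layout, a hedged sketch:

```shell
#!/bin/bash
# EC2 user data (illustrative): select seven 5 GB slices per GPU.
# Profile ID 19 is the 1g.5gb profile on the A100-40GB.
cat <<'EOF' > /etc/default/mig
MIG_PARTITION=19,19,19,19,19,19,19
EOF
```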
Step 3. Once the EKS cluster is running and the nodegroup is created with the nodes in Ready state, you can install the NVIDIA MIG-K8s plugin through Helm.
Now, you verify the repo and that the latest version of the nvidia-device-plugin and gpu-feature-discovery plugins are available.
You can set a MIG strategy to “MIXED” which allows you to address each MIG GPU slice. Install the plugins:
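The Helm steps look like the following. The chart repositories are NVIDIA's public ones; the release names and namespace are placeholders:

```shell
# Add NVIDIA's Helm repositories and refresh the index.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery
helm repo update

# Install both plugins with the mixed MIG strategy so each
# MIG slice is addressable as its own extended resource.
helm install nvidia-device-plugin nvdp/nvidia-device-plugin \
  --namespace kube-system \
  --set migStrategy=mixed
helm install gpu-feature-discovery nvgfd/gpu-feature-discovery \
  --namespace kube-system \
  --set migStrategy=mixed
```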
Step 4. After a few minutes, kubectl describe node should report the 56 GPU slices available for allocation.
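With the mixed strategy, each 1g.5gb slice shows up as the extended resource `nvidia.com/mig-1g.5gb`, so a quick check looks like this (the node name is a placeholder):

```shell
# Expect Capacity and Allocatable to include: nvidia.com/mig-1g.5gb: 56
kubectl describe node <p4d-node-name> | grep -i mig
```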
Step 5. Argo Deployment and Testing
With the base cluster in place, you can deploy Argo Workflows and run through a few tests. Argo Workflows is a Kubernetes-native workflow engine purpose-built for orchestrating parallel jobs. In this example, I use the Idealo/superresolution workload, an ML inference example that performs GAN-based image upscaling.
After deploying the Argo Workflows Kubernetes controller, you can submit the example workflow below. This directed acyclic graph (DAG) generates a loop that launches a variable number of ML upscaling jobs and schedules each one on a single MIG slice.
This workflow includes a resource request that tells the Kubernetes scheduler to place each pod onto an instance that can fulfill it (that is, the P4d) and to allocate a single 5 GB MIG slice to each super-resolution pod.
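A hedged sketch of such a workflow follows; the container image and the sequence count are placeholders, but the `nvidia.com/mig-1g.5gb` resource request is the key line:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: super-resolution-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: upscale
            template: upscale-image
            withSequence:
              count: "56"          # one job per MIG slice
    - name: upscale-image
      container:
        image: example/super-resolution:latest   # hypothetical image
        resources:
          limits:
            nvidia.com/mig-1g.5gb: 1   # one 5 GB MIG slice per pod
```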
Step 6. Submit the job.
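Assuming the Argo CLI is installed and the workflow is saved as `workflow.yaml` (a placeholder name):

```shell
argo submit workflow.yaml -n argo --watch
```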
The loop expands to the number of members in the range. Using kubectl describe node, you can see that all 56 GPU slices are allocated, as shown in the following code block:
Check the Argo logs for the status of the workflow.
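With the Argo CLI, the most recently submitted workflow can be followed directly:

```shell
# Follow the logs of the most recent workflow in the argo namespace.
argo logs @latest -n argo --follow
```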
Including workflow and job overhead, all 56 jobs complete in about 1 minute 27 seconds. Compare this to whole-GPU allocation through the nvidia-k8s-plugin, which processes the same workflow in about four minutes: with whole-GPU allocation, each full GPU is blocked from scheduling further jobs until all eight running jobs complete, regardless of whether the GPUs are fully utilized. This highlights one of the benefits of MIG.
Cleanup
To clean up the deployment, use eksctl to delete the cluster.
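Using the same placeholder names as before:

```shell
eksctl delete cluster --name mig-demo --region us-west-2
```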
Conclusion
With NVIDIA Multi-Instance GPU (MIG) on P4d instances and Amazon Elastic Kubernetes Service, it is now possible to execute large-scale, disparate inference workloads handling multiple requests from a single endpoint. With MIG on P4d, you can have up to 56 individual accelerators per instance, improving utilization in a multiuser and/or multi-request architecture. We are excited to see what our customers build with MIG on P4d.