AWS HPC Blog
How to use Capacity Blocks for ML with AWS Batch
Capacity Blocks for ML (CBML) are a powerful feature that allows you to reserve highly sought-after GPU-based EC2 instances for a future date to support your short-duration machine learning (ML) workloads. Since the reservations are "for a future date," you need a mechanism to launch the instances you have paid for and place jobs onto them at that specific time. This is where AWS Batch comes in. With an always-on queue ready to accept jobs, and the ability to scale instances into your capacity block reservation at the right time, AWS Batch provides everything you need to maximize your CBML reservations.
Let’s see how this works in practice.
How AWS Batch compute environments leverage Capacity Blocks for ML
AWS Batch compute environments are able to leverage an Amazon EC2 launch template to specify instance parameters such as the AMI ID, security groups, and – most relevant to this conversation – capacity block reservation details.
There are two items that you will need to define in the launch template data: (1) set InstanceMarketOptions.MarketType to "capacity-block"; and (2) provide the capacity reservation ID.
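Here's a minimal sketch of such a launch template using boto3; the template name and reservation ID are placeholder values you would replace with your own:

```python
import boto3

ec2 = boto3.client("ec2")

# Launch template targeting a specific Capacity Block reservation.
# The "capacity-block" market type tells EC2 to draw instances from the
# reservation rather than from regular On-Demand capacity.
ec2.create_launch_template(
    LaunchTemplateName="cbml-launch-template",  # placeholder name
    LaunchTemplateData={
        "InstanceMarketOptions": {"MarketType": "capacity-block"},
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {
                "CapacityReservationId": "cr-0123456789abcdef0"  # your CBML ID
            }
        },
    },
)
```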
If your goal is to leverage a set of tightly coupled GPU instances for large model training or refinement, you’ll want to set other instance parameters as well, including the instance type of the reservation, an AMI ID, security groups, and network device settings for Elastic Fabric Adapter (EFA). We’ve provided a full example as an AWS Samples repository on GitHub, aws-samples/aws-batch-capacity-block-reservations.
Keep in mind that if you deploy this example into your own account, you will incur the purchase cost of the capacity reservation! Make sure you understand the costs involved prior to deploying the example in your own account.
The high-level steps to integrate a CBML with AWS Batch are straightforward:
- Find and purchase the Capacity Block for ML that meets your requirements in terms of instance type, duration, and quantity.
- Create an EC2 launch template, as described above, that defines at least the capacity block reservation details, plus optional settings such as instance type, security groups, and EFA networking.
- Create an AWS Batch compute environment (CE) that uses the CBML launch template. There are a couple of other requirements for the CE, such as an instance type and Availability Zone that match the CBML reservation. The full list of considerations is discussed in the next section, “Considerations for compute environments”.
- Create or update an AWS Batch job queue and associate it with the CE you created in the previous step (see the sketch after this list).
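As a minimal boto3 sketch of steps 3 and 4, assuming placeholder names, ARNs, subnet, and vCPU count (the instance type and subnet must match your reservation):

```python
import boto3

batch = boto3.client("batch")

# Step 3: a managed compute environment that uses the CBML launch template.
# BEST_FIT is the allocation strategy required for capacity blocks.
batch.create_compute_environment(
    computeEnvironmentName="cbml-ce",
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",
        "allocationStrategy": "BEST_FIT",
        "minvCpus": 0,
        "maxvCpus": 192,                          # >= total vCPUs in the reservation
        "instanceTypes": ["p5.48xlarge"],         # must match the CBML instance type
        "subnets": ["subnet-0123456789abcdef0"],  # AZ must match the CBML
        "instanceRole": "arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",
        "launchTemplate": {"launchTemplateId": "lt-0123456789abcdef0"},
    },
)

# Step 4: a job queue that feeds the compute environment.
# (In practice, wait for the CE to reach a VALID status before this call.)
batch.create_job_queue(
    jobQueueName="cbml-queue",
    state="ENABLED",
    priority=1,
    computeEnvironmentOrder=[{"order": 1, "computeEnvironment": "cbml-ce"}],
)
```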
Once the job queue is set up, you can submit jobs before the Capacity Block is active. We've provided an example job definition in the sample repository that runs an NVIDIA Collective Communication Library (NCCL) bandwidth test across multiple GPUs on the same instance. Customers have told us they like having Batch handle scaling the CBML and scheduling the jobs, since their researchers can prepare work ahead of time and maximize the utilization of the reservation.
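Submitting a job ahead of the reservation window can be as simple as the following sketch; the job name, queue, and job definition names are placeholders (the sample repository contains the actual NCCL test definition):

```python
import boto3

batch = boto3.client("batch")

# Jobs submitted before the Capacity Block is active simply wait in the
# queue; Batch launches the reserved instances once the window opens.
batch.submit_job(
    jobName="nccl-bandwidth-test",     # placeholder
    jobQueue="cbml-queue",             # the queue attached to the CBML CE
    jobDefinition="nccl-test-jobdef",  # placeholder job definition name
)
```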
Considerations for compute environments
There are a few important considerations to keep in mind when using Capacity Blocks for ML with AWS Batch.
- The CBML’s Availability Zone and instance type need to be copied over to the launch template that your Batch compute environment uses.
- The compute environment will transition to an invalid status once the CBML expires. The status reason message will include: “Your Capacity Block reservation ends at, and the expiration process has started.”
- The compute environment needs to use the BEST_FIT allocation strategy. This allocation strategy does not allow for updating key compute environment settings, such as the launch template ID and instance types.
For these reasons, you should consider a CBML compute environment as single-use, meaning that once the capacity block reservation expires, you will need to disable and delete the compute environment. To leverage another capacity reservation, you can create a new compute environment and attach it to the job queue in place of the one with the expired reservation.
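For example, swapping a fresh compute environment in for an expired one could look like this sketch (names are placeholders, and in practice you should wait for each update to complete before making the next call):

```python
import boto3

batch = boto3.client("batch")

# Point the job queue at a new compute environment created for the next
# reservation, replacing the one whose reservation expired.
batch.update_job_queue(
    jobQueue="cbml-queue",
    computeEnvironmentOrder=[{"order": 1, "computeEnvironment": "cbml-ce-next"}],
)

# Retire the expired compute environment: disable it, then delete it.
batch.update_compute_environment(computeEnvironment="cbml-ce", state="DISABLED")
batch.delete_compute_environment(computeEnvironment="cbml-ce")
```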
How scaling works with Capacity Blocks for ML and AWS Batch
One final detail to cover is how scaling works in practice for the compute environment, specifically how the computeResources.minvCpus setting is handled before, during, and after a capacity reservation.
If minvCpus is zero: As jobs are submitted, Batch will continuously attempt to allocate hosts, but EC2 instances will not launch until the scheduled CBML time window begins. Once the time window arrives, instances start launching and jobs are placed on those instances. Once there are no more jobs in the queue, Batch will terminate the instances. If more jobs come into the queue during the reservation time window, Batch will relaunch instances until that window has passed.
If minvCpus is greater than zero: Batch will continuously attempt to allocate hosts, irrespective of the number of jobs in the job queue. As before, the EC2 instance(s) will not launch until the scheduled CBML reservation time window starts.
AWS Batch multi-node parallel (MNP) jobs, which can span one or more EC2 instances, don't share the same scaling group as regular single-instance jobs. For a new compute environment, Batch assumes that regular single-instance jobs will be placed on it. Practically, this means that any instance that was started due to minvCpus or a regular job will not be reused for an MNP job. Once an MNP job comes to the head of the queue, Batch will try to launch new instances for it. Since large GPU instances can take some time to scale, we recommend the following settings for compute environments with CBML:
- Always set maxvCpus greater than or equal to the total number of vCPUs of your CBML.
- If you have a good reason to set minvCpus greater than zero, for example unusually long instance launch times or container pulls, set it equal to the aggregate number of vCPUs of your capacity reservation.
- The one caveat to the above is if you expect to launch MNP jobs from the queue. In this case, Batch may first try to launch instances for regular jobs, then immediately cycle them for the instances requested by the MNP job. If you find that this causes unnecessary churn, set minvCpus to zero and let Batch handle the scaling of instances once the CBML time window is active to handle a mix of regular and MNP jobs (a sketch of adjusting these values follows this list).
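As a small illustration, and assuming the same placeholder names as earlier with a single p5.48xlarge reservation (192 vCPUs), adjusting the vCPU bounds on an existing compute environment could look like this:

```python
import boto3

batch = boto3.client("batch")

# The vCPU bounds can be updated even on a BEST_FIT compute environment.
batch.update_compute_environment(
    computeEnvironment="cbml-ce",
    computeResources={
        "minvCpus": 0,    # scale to zero when the queue is empty
        "maxvCpus": 192,  # >= aggregate vCPUs of the capacity reservation
    },
)
```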
What about On-Demand Capacity Reservations?
Besides Capacity Blocks for ML, there is another type of capacity reservation called an On-Demand Capacity Reservation (ODCR). ODCRs are recommended when you have mission-critical and time-critical workloads and want to guarantee capacity. They work with non-GPU instances. Within the context of AWS Batch, you should account for a couple of differences between an ODCR and a CBML.
First, ODCRs can belong to a capacity reservation group, and the launch template for your Batch compute environment can reference a capacity reservation group ARN instead of a specific capacity reservation ID. Since you can add and remove ODCRs in a capacity reservation group, you do not have to create a new compute environment for each ODCR. This is in contrast to CBML reservations, which cannot be used in a capacity reservation group, so a new compute environment needs to be created for each CBML reservation.
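For instance, an ODCR launch template can target a capacity reservation group instead of a single reservation; this sketch assumes a placeholder resource group ARN:

```python
import boto3

ec2 = boto3.client("ec2")

# ODCR launch template: target a capacity reservation *group*, so that
# reservations can be added or removed without touching the template.
# Note there is no "capacity-block" market type here; that is CBML-specific.
ec2.create_launch_template(
    LaunchTemplateName="odcr-launch-template",  # placeholder name
    LaunchTemplateData={
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {
                "CapacityReservationResourceGroupArn": (
                    "arn:aws:resource-groups:us-east-2:111122223333:group/my-odcr-group"
                )
            }
        }
    },
)
```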
Second, instances for an ODCR are pulled from an entire Availability Zone, and are not guaranteed to be closely co-located. If you are launching tightly coupled jobs that need low network latency and high network throughput, you should create the capacity reservations in a cluster placement group.
Finally, Batch does not support ODCRs in AWS Outposts or Local Zones.
Conclusion
Using AWS Batch to manage scaling your Capacity Blocks for ML and On-Demand Capacity Reservations allows you to focus on developing your workflows and analyses instead of low-level infrastructure management. By following the steps described, you can unlock the power of Capacity Blocks to provide predictable capacity for your AWS Batch workloads, ensuring that your critical ML and other batch processing tasks complete efficiently and cost-effectively. Try it out yourself using the example repository and let us know what you think by sending a message to ask-hpc@amazon.com.