AWS HPC Blog

How to use rate-limited resources in AWS Batch jobs with resource aware scheduling

Batch processing usually has some aspect that limits how big an analysis can get or how fast it can complete. Often the limiting factor is tied to your compute needs, such as the number of concurrent CPUs or the speed of the underlying I/O system that feeds data to your jobs. While AWS provides many ways to overcome these challenges (larger instances, different storage services), there are times when your jobs will be rate-limited by something outside of your control, like the number of licenses that your application has access to, or a rate-limited third-party API or database. In these cases, you want to keep the number of concurrently running jobs at or below what the resource can handle so that jobs don’t idle waiting on the resource or, even worse, fail for lack of it.

AWS Batch just released resource aware scheduling, which allows you to define consumable resources for use in your jobs. Consumable resources represent any limitation that spans running jobs, such as application licenses or concurrent database connections. To model those scarce resources, you create a consumable resource and specify how many of that resource you have in total. You then define, in the job definition, how many of the consumable resource each submitted job needs. Batch automatically factors in the available consumable resources when it makes scheduling decisions: it only starts a job when enough of the consumable resource is available for that job.

Prior to resource aware scheduling, you would have needed to create your own mechanism for restricting how many jobs run at a given time. With the release of this new feature, you can let Batch handle all of the logic for allocating resources and scheduling jobs to make maximal use of the compute that Batch scales.

How it works

Let’s take a closer look at one of our examples — limiting the number of jobs that can run based on the number of licenses available for your application. First, we define a consumable resource called “foobar-licenses” in the AWS Batch management console and let Batch know that we have 10 licenses to use in jobs (Figure 1).

Figure 1 – The AWS Batch management console Create consumable resource form showing the creation of the “foobar-licenses” replenishable consumable resource with a count of 10.

Consumable resources can be either replenishable or non-replenishable. Replenishable resources are added back to the available count once the job completes. Non-replenishable resources are not added back to the available count when a job completes. Since licenses are reusable, we’ll define this consumable resource as replenishable. I’ll talk more about non-replenishable resources later in this post.
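The same resource could also be created programmatically rather than in the console. The sketch below only builds the request for the CreateConsumableResource API. The parameter names (`consumableResourceName`, `totalQuantity`, `resourceType`) and the boto3 method `create_consumable_resource` are assumptions written from our reading of the Batch API, so confirm them against the current API reference. The helper name `consumable_resource_request` is hypothetical.

```python
def consumable_resource_request(name, total, replenishable=True):
    """Build keyword arguments for a CreateConsumableResource request.

    Parameter names are assumptions; verify them in the AWS Batch API reference.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    return {
        "consumableResourceName": name,
        "totalQuantity": total,
        # Licenses are reusable, so the default here is replenishable.
        "resourceType": "REPLENISHABLE" if replenishable else "NON_REPLENISHABLE",
    }


request = consumable_resource_request("foobar-licenses", 10)
print(request)

# With AWS credentials configured, the request could be sent like this
# (assumed method name, not executed here):
#   import boto3
#   boto3.client("batch").create_consumable_resource(**request)
```

Keeping the request construction separate from the API call makes it easy to inspect or unit test the parameters without touching an AWS account.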

Next, I’ll create a job definition that requires a single “foobar-licenses” consumable resource (Figure 2). Each example job will run for 5 minutes.

Figure 2 – The AWS Batch management console Create job definition form showing how to set the job to utilize one count of the “foobar-licenses” consumable resource.
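For readers who prefer the API to the console form, here is a minimal sketch of an equivalent job definition request. It assumes that RegisterJobDefinition accepts a top-level `consumableResourceProperties` field containing a `consumableResourceList`; please verify that shape in the API reference. The job definition name, container image, and vCPU and memory values are placeholders chosen for illustration, and `sleep 300` stands in for a five-minute job.

```python
def job_definition_request(name, image, resource_name, quantity=1):
    """Build keyword arguments for a RegisterJobDefinition request.

    The consumableResourceProperties shape is an assumption to verify.
    """
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            "command": ["sleep", "300"],  # stand-in for a 5 minute job
            "resourceRequirements": [
                {"type": "VCPU", "value": "1"},       # placeholder size
                {"type": "MEMORY", "value": "2048"},  # placeholder size, MiB
            ],
        },
        # Each job reserves `quantity` units of the named consumable resource.
        "consumableResourceProperties": {
            "consumableResourceList": [
                {"consumableResource": resource_name, "quantity": quantity}
            ]
        },
    }


job_def = job_definition_request(
    name="foobar-job",  # hypothetical name
    image="public.ecr.aws/amazonlinux/amazonlinux:latest",  # placeholder image
    resource_name="foobar-licenses",
)
print(job_def["consumableResourceProperties"])

# Assumed call, not executed here:
#   import boto3
#   boto3.client("batch").register_job_definition(**job_def)
```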

Finally, I submitted an array job of size 50 to the job queue. The following console screenshots were taken a few minutes after that array job submission. Figure 3 shows the consumable resource’s details page, with the counts for resources in use and those still available. The figure also shows the job search form you can use to view the jobs that have a reservation on the consumable resource, even across job queues.
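The array submission described above could be expressed as a SubmitJob request along these lines. This is a sketch: the job name, job queue name, and job definition name are hypothetical, and the helper `array_job_request` is ours. The `arrayProperties` size field and its allowed range of 2 to 10,000 follow the documented array job limits.

```python
def array_job_request(job_name, job_queue, job_definition, size):
    """Build keyword arguments for a SubmitJob request for an array job."""
    if not 2 <= size <= 10_000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # One parent job with `size` child jobs, each needing its own license.
        "arrayProperties": {"size": size},
    }


submit = array_job_request(
    job_name="foobar-array",     # hypothetical name
    job_queue="my-job-queue",    # hypothetical queue
    job_definition="foobar-job", # hypothetical job definition
    size=50,
)
print(submit)

# With AWS credentials configured (not executed here):
#   import boto3
#   boto3.client("batch").submit_job(**submit)
```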

Figure 3 – The Consumable resource details page for “foobar-licenses” showing 10 of 10 are in use, and a jobs search box to view the jobs that have a reservation on the consumable resource, even across job queues.

As confirmation that only jobs with an allocated “foobar-licenses” resource ran, you can check the array job’s details page (Figure 4) to see the status of child jobs and verify that only ten jobs run at the same time.

Figure 4 – The array job child job status summary showing that 20 jobs are still pending, 10 are running, and 20 completed.

Once all of the jobs complete, you should see the available consumable resource count go back to 10.
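To build intuition for this walkthrough, here is a small simulation of the scheduling rule: a job starts only when a license is free, and a finished job returns its license. It is a simplified model and not how Batch is implemented. It assumes every job takes exactly 5 minutes, ignores instance startup and scheduling delays, and admits jobs first-in, first-out. The function names are hypothetical.

```python
import heapq


def simulate(num_jobs, total_units, job_minutes, units_per_job=1):
    """Return (start, end) minutes for each job under a replenishable limit."""
    if units_per_job > total_units:
        raise ValueError("a job can never start if it needs more units than exist")
    available = total_units
    running = []  # min-heap of end times for jobs that hold units
    schedule = []
    clock = 0
    for _ in range(num_jobs):
        # Advance time to the next completion until enough units are free.
        while available < units_per_job:
            clock = heapq.heappop(running)
            available += units_per_job  # replenishable: units come back
        available -= units_per_job
        end = clock + job_minutes
        heapq.heappush(running, end)
        schedule.append((clock, end))
    return schedule


def snapshot(schedule, minute):
    """Count pending, running, and completed jobs at a given minute."""
    return {
        "pending": sum(1 for start, _ in schedule if start > minute),
        "running": sum(1 for start, end in schedule if start <= minute < end),
        "completed": sum(1 for _, end in schedule if end <= minute),
    }


schedule = simulate(num_jobs=50, total_units=10, job_minutes=5)
peak = max(snapshot(schedule, m)["running"] for m in range(25))
print(snapshot(schedule, minute=12))  # {'pending': 20, 'running': 10, 'completed': 20}
print(peak)                           # 10
```

In this idealized model the 50 jobs run in five waves of ten, and a snapshot taken during the third wave (minute 12 is one such moment) shows the same counts as Figure 4. Real runs will drift from these exact times because jobs do not start and stop in lockstep.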

Non-replenishable consumable resources

Replenishable consumable resources seem pretty handy, but what about non-replenishable resources? As we mentioned, the count of non-replenishable resources does not go back up once a job completes. You can, however, update the total count of the resource to a new value using the AWS Batch console or the UpdateConsumableResource API call. A good use case for non-replenishable consumable resources would be accessing an external service with a time-based usage model such as “100 calls to our inference service per day”. For this example, you would create an “inferences-per-day” non-replenishable consumable resource with a limit of 100, and then reset the counter at the beginning of each day.
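The daily reset in this example could be scripted. The sketch below builds an UpdateConsumableResource request that sets the count back to 100. The parameter names (`consumableResource`, `operation`, `quantity`) and the `SET` operation value are assumptions to check against the API reference, and the helper name `daily_reset_request` is hypothetical. How you trigger it each morning (a scheduled rule, a cron job) is left open.

```python
def daily_reset_request(resource_name, daily_limit):
    """Build keyword arguments for an UpdateConsumableResource request.

    Uses an assumed "SET" operation to overwrite the total with daily_limit.
    """
    if daily_limit < 0:
        raise ValueError("daily_limit must be non-negative")
    return {
        "consumableResource": resource_name,
        "operation": "SET",  # assumed value; overwrite rather than add or remove
        "quantity": daily_limit,
    }


reset = daily_reset_request("inferences-per-day", 100)
print(reset)

# Assumed call, not executed here:
#   import boto3
#   boto3.client("batch").update_consumable_resource(**reset)
```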

By the way, you can also increase or decrease the number of a replenishable consumable resource. This is handy if you ever need to reallocate resources between AWS Regions that use the same underlying resource, or if you need to temporarily increase the number of a resource for an urgent workload and then scale back down to your normal amount. Decreasing the total number of a consumable resource below what is in use does not affect the running jobs that have already been allocated a resource. When those jobs finish, the released resource will not be added to the available count if that would exceed the current total amount. Additionally, the number of concurrently running jobs will scale back to the new total as jobs complete.
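The behavior when the total is lowered below the in-use count can be easier to follow with a toy model. The class below is our own illustration of the rules described above, not Batch code: running jobs keep their units, and the available count is whatever remains under the current total, never less than zero.

```python
class ReplenishablePool:
    """Toy model of a replenishable consumable resource (illustrative only)."""

    def __init__(self, total):
        self.total = total
        self.in_use = 0

    @property
    def available(self):
        # Units on offer to new jobs; zero while in_use exceeds the total.
        return max(0, self.total - self.in_use)

    def acquire(self, units=1):
        """Reserve units for a job; return False if not enough are available."""
        if units > self.available:
            return False
        self.in_use += units
        return True

    def release(self, units=1):
        """A job finished; its units return only as far as the total allows."""
        self.in_use -= units

    def set_total(self, total):
        """Change the total; jobs that already hold units are unaffected."""
        self.total = total


pool = ReplenishablePool(total=10)
for _ in range(10):
    pool.acquire()

pool.set_total(6)                   # lower the total while 10 are in use
print(pool.in_use, pool.available)  # 10 0

pool.release(3)                     # three jobs finish
print(pool.in_use, pool.available)  # 7 0

pool.release(2)                     # two more finish
print(pool.in_use, pool.available)  # 5 1
```

Nothing becomes available until the in-use count drops below the new total of 6, which is why concurrency settles at the new limit as jobs drain.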

What about my license server?

AWS Batch consumable resources do not have access to, or coordinate with, outside resources, and they can only meter usage by Batch jobs within the AWS Region they are defined in. This means that you will need to provision a subset of licenses within your license management framework for use by Batch as a consumable resource. If you ever need to reallocate licenses, you can increase or decrease the Regional consumable resource counts as appropriate.

Conclusion

Resource aware scheduling simplifies how you manage complex workloads that depend on limited resources beyond just compute power. By taking a more holistic approach to resource management, AWS Batch allows you to handle enterprise-scale workload orchestration efficiently and cost-effectively, streamlining your operations and reducing unnecessary expenses associated with failed jobs and underutilized resources. To try out resource aware scheduling, visit the AWS Batch management console or read the documentation!

Angel Pizarro

Angel is a Principal Developer Advocate for HPC and scientific computing. His background is in bioinformatics application development and building system architectures for scalable computing in genomics and other high throughput life science domains.

Nikita Buldakov

Nikita Buldakov is a Senior Technical Product Manager for AWS Batch, and works in the Advanced Computing and Simulation org at AWS. His background is in applied mathematics as well as strategic consulting in technology, transportation, and industrial domains. At AWS Batch Nikita focuses on Autonomous Vehicles, Advanced Driver Assistance Systems, and Robotics workloads.