Posted On: Mar 14, 2024

Starting today, AWS Batch supports a Batch Job Queue Blocked CloudWatch Event for jobs stuck in the RUNNABLE state. Customers can use EventBridge to automate actions on these stuck jobs. Additionally, customers can configure the jobStateTimeLimitActions parameter of the CreateJobQueue and UpdateJobQueue APIs to terminate a stuck job, unblocking the jobs behind it in the queue.
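As a rough illustration of the two pieces described above, the sketch below builds an UpdateJobQueue request that sets jobStateTimeLimitActions and an EventBridge event pattern that matches the new event. The queue name, the 900-second limit, the helper function names, and the choice of the CANCEL action are assumptions made for the example. The reason string and the 600 to 86400 second range reflect our reading of the API reference and should be checked against it before use. The boto3 calls are shown only as comments so the sketch runs without AWS credentials.

```python
import json

# Hypothetical queue name, used only for illustration.
JOB_QUEUE_NAME = "example-queue"


def build_time_limit_action(reason: str, max_time_seconds: int = 600) -> dict:
    """Build one jobStateTimeLimitActions entry for a job stuck in RUNNABLE."""
    # Assumed bounds (10 minutes to 24 hours); verify against the API reference.
    if not 600 <= max_time_seconds <= 86400:
        raise ValueError("maxTimeSeconds is expected to be between 600 and 86400")
    return {
        "reason": reason,
        "state": "RUNNABLE",
        "maxTimeSeconds": max_time_seconds,
        "action": "CANCEL",  # assumed action value for this sketch
    }


def build_blocked_queue_event_pattern() -> dict:
    """Build an EventBridge event pattern matching Batch Job Queue Blocked events."""
    return {
        "source": ["aws.batch"],
        "detail-type": ["Batch Job Queue Blocked"],
    }


# Request body for UpdateJobQueue: act on the head job after 15 minutes.
update_request = {
    "jobQueue": JOB_QUEUE_NAME,
    "jobStateTimeLimitActions": [
        build_time_limit_action("CAPACITY:INSUFFICIENT_INSTANCE_CAPACITY", 900),
    ],
}

event_pattern = build_blocked_queue_event_pattern()

print(json.dumps(update_request, indent=2))
print(json.dumps(event_pattern, indent=2))

# With boto3 installed and credentials configured, the payloads could be sent as:
#   import boto3
#   boto3.client("batch").update_job_queue(**update_request)
#   boto3.client("events").put_rule(
#       Name="batch-job-queue-blocked",  # hypothetical rule name
#       EventPattern=json.dumps(event_pattern),
#   )
```

A rule created this way still needs a target (for example an SNS topic or a Lambda function) before any automated action takes place.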

The job at the head of the queue can be stuck in the RUNNABLE state for a variety of reasons, such as misconfiguration, access permissions, or capacity issues. Previously, customers did not have visibility into why these jobs were stuck. Now, customers receive a CloudWatch Event for each job that is stuck in RUNNABLE, along with the reason behind it. The reason is also provided in the statusReason field of the DescribeJobs API response.
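For customers who prefer to poll rather than subscribe to events, a minimal sketch of reading the reason from a DescribeJobs response might look like the following. It relies on the jobs, jobId, status, and statusReason fields of the DescribeJobs response. The helper name, the sample response, the job IDs, and the reason text are all made up for illustration and are not real service output.

```python
def runnable_block_reasons(describe_jobs_response: dict) -> dict:
    """Map job ID to statusReason for every RUNNABLE job that reports a reason."""
    reasons = {}
    for job in describe_jobs_response.get("jobs", []):
        if job.get("status") == "RUNNABLE" and job.get("statusReason"):
            reasons[job["jobId"]] = job["statusReason"]
    return reasons


# Hand-written sample shaped like a DescribeJobs response (illustrative values only).
sample_response = {
    "jobs": [
        {
            "jobId": "job-1",
            "status": "RUNNABLE",
            "statusReason": "example reason text for a blocked job",
        },
        {"jobId": "job-2", "status": "RUNNABLE"},
        {"jobId": "job-3", "status": "SUCCEEDED", "statusReason": "finished"},
    ]
}

for job_id, reason in runnable_block_reasons(sample_response).items():
    print(f"{job_id}: {reason}")

# With boto3 installed and credentials configured, a live response could come from:
#   import boto3
#   response = boto3.client("batch").describe_jobs(jobs=["<job-id>"])
#   print(runnable_block_reasons(response))
```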

Support for this feature is now generally available in AWS Regions where AWS Batch is available. Please visit our blog to learn more about the reasons covered by these alerts and how to set them up.