AWS Batch Documentation

With AWS Batch, you package the code for your batch jobs, specify their dependencies, and submit your batch job using the AWS Management Console, CLIs, or SDKs. AWS Batch allows you to specify execution parameters and job dependencies and facilitates integration with a range of batch computing workflow engines and languages (e.g., Pegasus WMS, Luigi, Nextflow, Metaflow, Apache Airflow, and AWS Step Functions). AWS Batch is designed to provision and scale Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Fargate compute resources, with an option to use On-Demand or Spot Instances based on the requirements of your jobs. AWS Batch provides default job queues and compute environment definitions to help you get started.
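As an illustration of SDK submission, here is a minimal sketch that assumes the AWS SDK for Python (boto3). It only builds the keyword arguments for `submit_job`, so it runs without credentials. The helper `build_submit_request`, the queue name, and the job definition name are hypothetical placeholders.

```python
def build_submit_request(job_name, job_queue, job_definition, command, env=None):
    """Return keyword arguments for boto3's batch.submit_job (hypothetical helper)."""
    overrides = {"command": list(command)}
    if env:
        # Environment variables are passed as a list of name/value pairs.
        overrides["environment"] = [
            {"name": key, "value": value} for key, value in env.items()
        ]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": overrides,
    }


request = build_submit_request(
    "nightly-report",          # hypothetical job name
    "my-job-queue",            # hypothetical job queue
    "report-generator:1",      # hypothetical job definition (name:revision)
    ["python", "report.py"],
    env={"STAGE": "prod"},
)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   job_id = boto3.client("batch").submit_job(**request)["jobId"]
```

Building the request separately from the API call keeps it easy to inspect and test before anything is sent to AWS.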

AWS Batch on Amazon EKS

AWS Batch can run your batch jobs on your existing Amazon EKS clusters. You specify the vCPU, memory, and GPU requirements your containers need and then submit them to a queue attached to an Amazon EKS cluster-enabled compute environment. AWS Batch is designed to manage both the scaling of Amazon EKS nodes and the placement of pods within nodes. Additionally, AWS Batch is designed to manage queueing, dependency tracking, job retries, prioritization, and pod submission, and to support Amazon Elastic Compute Cloud (EC2) On-Demand and Spot Instances. AWS Batch runs its workloads in a distinct namespace within your Amazon EKS cluster. AWS Batch can manage capacity for you, including maintaining a warm pool of nodes, capping capacity at a set number of vCPUs, scaling nodes, and running jobs either in a single cluster or across multiple clusters.
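Here is a minimal sketch, assuming boto3, of the keyword arguments for `create_compute_environment` that attach AWS Batch to an existing EKS cluster. `minvCpus` keeps some capacity running (a warm pool) and `maxvCpus` caps it. The cluster ARN, namespace, subnet, security group, instance profile, and the `m5` instance family are all hypothetical values.

```python
def build_eks_compute_environment(name, cluster_arn, namespace, subnets,
                                  security_group_ids, instance_role,
                                  min_vcpus=0, max_vcpus=256):
    """Return kwargs for boto3's batch.create_compute_environment (EKS variant)."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "eksConfiguration": {
            "eksClusterArn": cluster_arn,
            # AWS Batch places its pods in this namespace of the cluster.
            "kubernetesNamespace": namespace,
        },
        "computeResources": {
            "type": "EC2",                        # use "SPOT" for Spot Instances
            "allocationStrategy": "BEST_FIT_PROGRESSIVE",
            "minvCpus": min_vcpus,                # > 0 keeps nodes running (a warm pool)
            "maxvCpus": max_vcpus,                # upper limit on total vCPUs
            "instanceTypes": ["m5"],              # hypothetical instance family
            "subnets": list(subnets),
            "securityGroupIds": list(security_group_ids),
            "instanceRole": instance_role,
        },
    }


eks_ce = build_eks_compute_environment(
    "eks-batch-ce",
    "arn:aws:eks:us-east-1:123456789012:cluster/demo-cluster",     # hypothetical
    "aws-batch",                                                   # hypothetical namespace
    ["subnet-0123456789abcdef0"],
    ["sg-0123456789abcdef0"],
    "arn:aws:iam::123456789012:instance-profile/eks-batch-node",   # hypothetical
    min_vcpus=16,
)
# import boto3; boto3.client("batch").create_compute_environment(**eks_ce)
```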

Compute resource provisioning and scaling

When using Fargate or Fargate Spot with AWS Batch, you create a few AWS Batch resources (a compute environment (CE), a job queue, and a job definition), and you get a queue, scheduler, and compute architecture without managing compute infrastructure.
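The three resources can be sketched as boto3-style request dictionaries. This minimal example assumes the `create_compute_environment`, `create_job_queue`, and `register_job_definition` calls. Every name, subnet, security group, and role ARN below is a hypothetical placeholder.

```python
SUBNETS = ["subnet-0123456789abcdef0"]          # hypothetical
SECURITY_GROUPS = ["sg-0123456789abcdef0"]      # hypothetical

compute_environment = {
    "computeEnvironmentName": "fargate-ce",
    "type": "MANAGED",
    "computeResources": {
        "type": "FARGATE",          # or "FARGATE_SPOT"
        "maxvCpus": 64,             # upper bound on vCPUs across running jobs
        "subnets": SUBNETS,
        "securityGroupIds": SECURITY_GROUPS,
    },
}

job_queue = {
    "jobQueueName": "fargate-queue",
    "priority": 1,
    "computeEnvironmentOrder": [
        {"order": 1, "computeEnvironment": "fargate-ce"},   # name or ARN
    ],
}

job_definition = {
    "jobDefinitionName": "hello-fargate",
    "type": "container",
    "platformCapabilities": ["FARGATE"],
    "containerProperties": {
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["echo", "hello"],
        # Role that lets Fargate pull the image and write logs (hypothetical ARN).
        "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
        "resourceRequirements": [
            {"type": "VCPU", "value": "0.25"},
            {"type": "MEMORY", "value": "512"},
        ],
        "networkConfiguration": {"assignPublicIp": "ENABLED"},
    },
}
# With boto3: batch.create_compute_environment(**compute_environment),
# batch.create_job_queue(**job_queue), batch.register_job_definition(**job_definition)
```

Public IP assignment is enabled here on the assumption that the subnet has no NAT gateway. In a private subnet with NAT, it would usually be disabled.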

If you want to run jobs on EC2 instances, AWS Batch is designed to provide Managed Compute Environments that provision and scale compute resources based on the volume and resource requirements of submitted jobs. You can configure AWS Batch Managed Compute Environments with requirements such as the type of EC2 instances, VPC subnet configurations, the min/max/desired vCPUs across all instances, and the maximum price you are willing to pay for Spot Instances as a percentage of the On-Demand Instance price.
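Here is a minimal sketch, assuming boto3's `create_compute_environment`, of a Managed Compute Environment that sets the options listed above. `bidPercentage` expresses the Spot price ceiling as a percentage of the On-Demand price. The helper name, instance families, subnet, security group, and `ecsInstanceRole` profile are hypothetical choices.

```python
def build_managed_ec2_environment(name, subnets, security_group_ids,
                                  instance_role, use_spot=False,
                                  bid_percentage=None):
    """Return kwargs for batch.create_compute_environment (EC2 or Spot)."""
    resources = {
        "type": "SPOT" if use_spot else "EC2",
        "allocationStrategy": (
            "SPOT_CAPACITY_OPTIMIZED" if use_spot else "BEST_FIT_PROGRESSIVE"
        ),
        "minvCpus": 0,            # scale down to zero when no jobs are waiting
        "desiredvCpus": 0,
        "maxvCpus": 512,          # ceiling across all instances
        "instanceTypes": ["c5", "m5", "r5"],   # hypothetical instance families
        "subnets": list(subnets),
        "securityGroupIds": list(security_group_ids),
        "instanceRole": instance_role,
    }
    if use_spot and bid_percentage is not None:
        # Maximum Spot price as a percentage of the On-Demand price.
        resources["bidPercentage"] = bid_percentage
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "computeResources": resources,
    }


spot_ce = build_managed_ec2_environment(
    "spot-ce", ["subnet-0123456789abcdef0"], ["sg-0123456789abcdef0"],
    "ecsInstanceRole", use_spot=True, bid_percentage=60)
on_demand_ce = build_managed_ec2_environment(
    "on-demand-ce", ["subnet-0123456789abcdef0"], ["sg-0123456789abcdef0"],
    "ecsInstanceRole")
```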

Alternatively, you can provision and manage compute resources within AWS Batch Unmanaged Compute Environments if your EC2 instances need configurations (e.g., larger EBS volumes or a different operating system) that AWS Batch Managed Compute Environments do not provide. You provision EC2 instances that include the Amazon ECS agent and run supported versions of Linux and Docker. AWS Batch can then run batch jobs on the EC2 instances that you provision.
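In an unmanaged environment, you register your own instances with the Amazon ECS cluster that AWS Batch creates for it. This minimal sketch assumes boto3 and the `ecsClusterArn` field returned by `describe_compute_environments`, and it builds EC2 user data that points the ECS agent at that cluster. The environment name and the sample cluster ARN are hypothetical.

```python
unmanaged_ce = {
    "computeEnvironmentName": "custom-ami-ce",   # hypothetical name
    "type": "UNMANAGED",
    "state": "ENABLED",
}


def ecs_agent_user_data(describe_response):
    """Build EC2 user data that registers an instance with the CE's ECS cluster.

    `describe_response` is the dict returned by
    batch.describe_compute_environments(computeEnvironments=["custom-ami-ce"]).
    """
    environment = describe_response["computeEnvironments"][0]
    # The cluster name is the last path segment of the ECS cluster ARN.
    cluster_name = environment["ecsClusterArn"].split("/")[-1]
    return (
        "#!/bin/bash\n"
        f"echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config\n"
    )


# Hypothetical response shape used for illustration.
sample_response = {
    "computeEnvironments": [{
        "computeEnvironmentName": "custom-ami-ce",
        "ecsClusterArn": "arn:aws:ecs:us-east-1:123456789012:cluster/custom-ami-ce_Batch_example",
    }]
}
user_data = ecs_agent_user_data(sample_response)
```

The user data would be supplied when launching your own instances, for example from a custom AMI that already includes the ECS agent.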

AWS Batch with Fargate

Running AWS Batch on Fargate resources gives you a serverless architecture for batch jobs. Fargate is designed to provide each job with the amount of CPU and memory it requests (within the vCPU and memory combinations that Fargate supports), which helps you avoid paying for idle capacity and waiting for EC2 instances to launch.
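Because Fargate accepts only certain vCPU and memory pairs, it helps to round a request up to a supported size. This is a minimal sketch. The `FARGATE_SIZES` table is an assumption copied from the Fargate size table at the time of writing and should be checked against current documentation. Both helper functions are hypothetical.

```python
# Assumed Fargate sizes: vCPU (as the string used in resourceRequirements)
# mapped to allowed memory values in MiB. Verify against current docs.
FARGATE_SIZES = {
    "0.25": [512, 1024, 2048],
    "0.5": list(range(1024, 4096 + 1, 1024)),
    "1": list(range(2048, 8192 + 1, 1024)),
    "2": list(range(4096, 16384 + 1, 1024)),
    "4": list(range(8192, 30720 + 1, 1024)),
    "8": list(range(16384, 61440 + 1, 4096)),
    "16": list(range(32768, 122880 + 1, 8192)),
}


def is_valid_fargate_size(vcpu, memory_mib):
    """True if (vcpu, memory_mib) is one of the assumed Fargate combinations."""
    return memory_mib in FARGATE_SIZES.get(vcpu, [])


def smallest_fargate_size(min_vcpu, min_memory_mib):
    """Return the smallest (vcpu, memory) pair covering the request, or None."""
    for vcpu in sorted(FARGATE_SIZES, key=float):
        if float(vcpu) < min_vcpu:
            continue
        for memory in FARGATE_SIZES[vcpu]:
            if memory >= min_memory_mib:
                return vcpu, memory
    return None


print(smallest_fargate_size(1, 9000))   # ('2', 9216)
```

Searching vCPU first and then memory keeps the vCPU count as low as possible, which matches how the size table is organized.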

If you're a current AWS Batch user, Fargate adds a layer of separation from Amazon EC2 infrastructure. Because you submit Fargate-compatible jobs through AWS Batch itself, you don't need to maintain two different services for workloads that run on Amazon EC2 and on Fargate.

AWS Batch provides a cloud-native scheduler with a managed queue and the ability to specify priority, retries, dependencies, timeouts, and more. AWS Batch helps you manage your submissions to Fargate and the lifecycle of your jobs.
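Retries and timeouts are set per job. This minimal sketch assumes boto3's `submit_job` parameters `retryStrategy` and `timeout`. The helper, queue, and job definition names are hypothetical. The `evaluateOnExit` rules retry only when the host failed and stop retrying otherwise.

```python
def build_submit_with_retries(job_name, job_queue, job_definition,
                              attempts=3, timeout_seconds=3600):
    """Return submit_job kwargs with a retry strategy and a per-attempt timeout."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "retryStrategy": {
            "attempts": attempts,             # AWS Batch accepts 1 to 10 attempts
            "evaluateOnExit": [
                # Retry when the host went away (for example, a Spot interruption).
                {"onStatusReason": "Host EC2*", "action": "RETRY"},
                # Otherwise, stop instead of retrying.
                {"onReason": "*", "action": "EXIT"},
            ],
        },
        # Each attempt is stopped if it runs longer than this (minimum 60 seconds).
        "timeout": {"attemptDurationSeconds": timeout_seconds},
    }


etl_request = build_submit_with_retries("etl", "fargate-queue", "etl-job:3")
```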

Fargate also helps you meet security and compliance requirements (e.g., SOX, PCI) and provides isolation between compute resources.

Support for tightly-coupled HPC workloads

AWS Batch is built to support multi-node parallel jobs, which let you run single jobs that span multiple EC2 instances. This feature lets you use AWS Batch to run workloads such as large-scale, tightly coupled High Performance Computing (HPC) applications or distributed GPU model training. AWS Batch is also designed to support Elastic Fabric Adapter, a network interface for running applications that require high levels of inter-node communication at scale on AWS.
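A multi-node parallel job is declared with a `multinode` job definition. This minimal sketch assumes boto3's `register_job_definition` and `nodeProperties` structure. The image, entrypoint script, and sizes are hypothetical, and Elastic Fabric Adapter setup is omitted. Multi-node parallel jobs run on EC2 resources, not Fargate.

```python
def build_mnp_job_definition(name, image, num_nodes, vcpus_per_node, memory_mib):
    """Return register_job_definition kwargs for a multi-node parallel job."""
    container = {
        "image": image,
        "command": ["/opt/run_mpi.sh"],          # hypothetical entrypoint
        "resourceRequirements": [
            {"type": "VCPU", "value": str(vcpus_per_node)},
            {"type": "MEMORY", "value": str(memory_mib)},
        ],
    }
    return {
        "jobDefinitionName": name,
        "type": "multinode",
        "nodeProperties": {
            "numNodes": num_nodes,
            "mainNode": 0,                       # node index that coordinates the job
            "nodeRangeProperties": [
                # "0:" applies this container to every node from index 0 upward.
                {"targetNodes": "0:", "container": container},
            ],
        },
    }


mpi_job = build_mnp_job_definition(
    "mpi-solver", "123456789012.dkr.ecr.us-east-1.amazonaws.com/mpi:latest",
    num_nodes=4, vcpus_per_node=36, memory_mib=65536)
```

Inside each container, AWS Batch sets variables such as `AWS_BATCH_JOB_NODE_INDEX` and `AWS_BATCH_JOB_MAIN_NODE_INDEX`, which a launcher script can use to decide whether it is the main node.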

Granular job definitions and simple job dependency modeling

AWS Batch allows you to specify resource requirements, such as vCPU and memory, AWS Identity and Access Management (IAM) roles, volume mount points, container properties, and environment variables, to define how jobs are to be run. AWS Batch is built to execute your jobs as containerized applications running on Amazon ECS. AWS Batch also helps you to define dependencies between different jobs. For example, your batch job can be composed of different stages of processing with differing resource needs. With dependencies, you can create jobs with different resource requirements where each successive job depends on the previous job.
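A staged pipeline like the one described can be modeled with `dependsOn`, where each job waits for the previous one and may request different resources. This minimal sketch takes the submit function as a parameter, for example boto3's `batch.submit_job`, so it can be exercised with a stand-in. The helper `submit_pipeline` and the stage dictionary layout are hypothetical.

```python
from typing import Callable, Dict, List


def submit_pipeline(submit: Callable[..., Dict], job_queue: str,
                    stages: List[Dict]) -> List[str]:
    """Submit stages in order; each stage depends on the one before it.

    Each stage dict has "name" and "job_definition", plus optional
    "vcpus" and "memory_mib" to override resources for that stage.
    """
    job_ids: List[str] = []
    for stage in stages:
        request = {
            "jobName": stage["name"],
            "jobQueue": job_queue,
            "jobDefinition": stage["job_definition"],
        }
        if job_ids:
            # Wait for the previous stage to succeed before this one starts.
            request["dependsOn"] = [{"jobId": job_ids[-1]}]
        if "vcpus" in stage:
            request["containerOverrides"] = {
                "resourceRequirements": [
                    {"type": "VCPU", "value": str(stage["vcpus"])},
                    {"type": "MEMORY", "value": str(stage["memory_mib"])},
                ]
            }
        job_ids.append(submit(**request)["jobId"])
    return job_ids

# Real usage (requires credentials):
#   import boto3
#   submit_pipeline(boto3.client("batch").submit_job, "my-job-queue", stages)
```

Passing the submit function in keeps the dependency wiring separate from the AWS client, so the ordering logic can be checked offline.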

Priority-based job scheduling

AWS Batch lets you set up multiple queues with different priority levels. AWS Batch jobs are stored in the queues until compute resources are available to execute them. The AWS Batch scheduler evaluates when, where, and how to run jobs that have been submitted to a queue based on the resource requirements of each job. The scheduler evaluates the priority of each queue and runs jobs in priority order on optimal compute resources (e.g., memory-optimized vs. compute-optimized instances), as long as those jobs have no outstanding dependencies.
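Queue priority is set when a queue is created. This minimal sketch assumes boto3's `create_job_queue` and builds two queues that share the same compute environments. The queue with the higher `priority` value is given preference, and within a queue, environments with a lower `order` are tried first. Queue and environment names are hypothetical.

```python
def build_job_queue(name, priority, compute_environments):
    """Return create_job_queue kwargs; CEs are tried in the listed order."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        # Higher values win when queues share compute environments.
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": index, "computeEnvironment": ce}
            for index, ce in enumerate(compute_environments, start=1)
        ],
    }


shared_environments = ["spot-ce", "on-demand-ce"]        # hypothetical names
high_priority = build_job_queue("high-priority", 100, shared_environments)
low_priority = build_job_queue("low-priority", 1, shared_environments)
```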

Support for GPU scheduling

GPU scheduling allows you to specify the number and type of accelerators your jobs require as job definition input variables in AWS Batch. AWS Batch will scale up instances appropriate for your jobs based on the required number of GPUs and isolate the accelerators according to each job’s needs, so only the appropriate containers can access them.
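Here is a minimal sketch, assuming boto3's `register_job_definition`, of a job definition that requests GPUs through `resourceRequirements`. The image, command, and sizes are hypothetical. This sketch only sets the GPU count and assumes the accelerator type follows from the GPU instance types configured in the compute environment. GPU jobs run on EC2 resources, not Fargate.

```python
def build_gpu_job_definition(name, image, gpus, vcpus, memory_mib):
    """Return register_job_definition kwargs for a GPU container job."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "platformCapabilities": ["EC2"],          # GPUs are not available on Fargate
        "containerProperties": {
            "image": image,
            "command": ["python", "train.py"],    # hypothetical training script
            "resourceRequirements": [
                # Number of GPUs reserved for (and visible to) this container.
                {"type": "GPU", "value": str(gpus)},
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


training_job = build_gpu_job_definition(
    "train-model", "123456789012.dkr.ecr.us-east-1.amazonaws.com/trainer:latest",
    gpus=2, vcpus=8, memory_mib=61440)
```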

Support for workflow engines

AWS Batch can be integrated with commercial and open-source workflow engines and languages such as Pegasus WMS, Luigi, Nextflow, Metaflow, Apache Airflow, and AWS Step Functions, helping you to use workflow languages to model batch computing pipelines.
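As one example of a workflow integration, here is a minimal sketch of an AWS Step Functions state machine with a single task that submits an AWS Batch job and waits for it to finish through the `arn:aws:states:::batch:submitJob.sync` integration. The job name and the queue and job definition ARNs are hypothetical.

```python
import json


def batch_task_state_machine(job_queue_arn, job_definition_arn):
    """Return an Amazon States Language definition (JSON) with one Batch task."""
    definition = {
        "Comment": "Run one AWS Batch job and wait for it to finish",
        "StartAt": "SubmitBatchJob",
        "States": {
            "SubmitBatchJob": {
                "Type": "Task",
                # The .sync pattern makes Step Functions wait for the job to complete.
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "step-functions-job",
                    "JobQueue": job_queue_arn,
                    "JobDefinition": job_definition_arn,
                },
                "End": True,
            }
        },
    }
    return json.dumps(definition, indent=2)


asl = batch_task_state_machine(
    "arn:aws:batch:us-east-1:123456789012:job-queue/my-job-queue",
    "arn:aws:batch:us-east-1:123456789012:job-definition/report-generator:1",
)
```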

Integration with EC2 Launch Templates

AWS Batch supports Amazon EC2 Launch Templates, which let you build customized templates for compute resources that AWS Batch then uses when it scales instances. You can specify an Amazon EC2 Launch Template to add storage volumes, specify network interfaces, or configure permissions, among other capabilities. Amazon EC2 Launch Templates help you reduce the number of steps required to configure AWS Batch environments by capturing launch parameters within one resource.
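Here is a minimal sketch, assuming boto3's `ec2.create_launch_template` and the `launchTemplate` field of `create_compute_environment`, that adds a larger root EBS volume. The template name, the `/dev/xvda` root device name (typical for Amazon Linux AMIs), and the volume size are assumptions.

```python
import copy

# Hypothetical launch template that enlarges the root EBS volume; this dict
# would be passed to ec2.create_launch_template(**launch_template_request).
launch_template_request = {
    "LaunchTemplateName": "batch-large-root-volume",
    "LaunchTemplateData": {
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 200, "VolumeType": "gp3"}}
        ]
    },
}


def with_launch_template(compute_environment, template_name, version="$Latest"):
    """Return a copy of create_compute_environment kwargs that uses a launch template."""
    updated = copy.deepcopy(compute_environment)   # leave the input untouched
    updated["computeResources"]["launchTemplate"] = {
        "launchTemplateName": template_name,
        "version": version,
    }
    return updated


base_ce = {
    "computeEnvironmentName": "large-disk-ce",
    "type": "MANAGED",
    "computeResources": {
        "type": "EC2", "minvCpus": 0, "maxvCpus": 128, "instanceTypes": ["m5"],
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "ecsInstanceRole",
    },
}
ce_with_template = with_launch_template(base_ce, launch_template_request["LaunchTemplateName"])
```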

Flexible allocation strategies

AWS Batch lets you choose how compute resources are allocated. Its allocation strategies (BEST_FIT, BEST_FIT_PROGRESSIVE, and, for Spot Instances, SPOT_CAPACITY_OPTIMIZED and SPOT_PRICE_CAPACITY_OPTIMIZED) let you weigh throughput against price when AWS Batch scales instances on your behalf.
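Allocation strategies depend on the compute resource type. This minimal sketch, a hypothetical client-side check rather than an AWS API, encodes the rule that the `SPOT_*` strategies apply only to Spot compute resources. It also encodes that allocation strategies are not used for Fargate. The check covers the four strategies BEST_FIT, BEST_FIT_PROGRESSIVE, SPOT_CAPACITY_OPTIMIZED, and SPOT_PRICE_CAPACITY_OPTIMIZED.

```python
EC2_STRATEGIES = {"BEST_FIT", "BEST_FIT_PROGRESSIVE"}
SPOT_STRATEGIES = EC2_STRATEGIES | {
    "SPOT_CAPACITY_OPTIMIZED",
    "SPOT_PRICE_CAPACITY_OPTIMIZED",
}


def check_allocation_strategy(resource_type, strategy):
    """Raise ValueError if `strategy` does not fit `resource_type`; else return it."""
    allowed = {"EC2": EC2_STRATEGIES, "SPOT": SPOT_STRATEGIES}.get(resource_type)
    if allowed is None:
        raise ValueError(f"allocation strategies do not apply to {resource_type}")
    if strategy not in allowed:
        raise ValueError(f"{strategy} is not valid for {resource_type} compute resources")
    return strategy
```

A check like this catches a mismatched strategy before the `create_compute_environment` request is sent.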

Integrated monitoring and logging

AWS Batch is designed to display operational metrics for batch jobs in the AWS Management Console. You can view metrics related to compute capacity, as well as running, pending, and completed jobs. AWS Batch is designed to make logs for your jobs (e.g., STDERR and STDOUT) available in the console and write them to Amazon CloudWatch Logs.
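Job output can be located from a job description. This minimal sketch assumes boto3's `batch.describe_jobs` response, which carries `container.logStreamName`. It also assumes the default `awslogs` setup that writes to the `/aws/batch/job` log group, and it builds arguments for CloudWatch Logs `get_log_events`. The sample response values are hypothetical.

```python
DEFAULT_LOG_GROUP = "/aws/batch/job"   # assumes no custom log configuration


def log_request_for(describe_jobs_response):
    """Return get_log_events kwargs for the first job, or None if no stream yet."""
    job = describe_jobs_response["jobs"][0]
    stream = job.get("container", {}).get("logStreamName")
    if stream is None:
        return None   # e.g., the job has not started running
    return {
        "logGroupName": DEFAULT_LOG_GROUP,
        "logStreamName": stream,
        "startFromHead": True,   # read from the oldest event
    }


# Hypothetical describe_jobs response shape.
sample = {"jobs": [{
    "jobId": "0a1b2c3d",
    "status": "SUCCEEDED",
    "container": {"logStreamName": "report-generator/default/0123456789abcdef"},
}]}
log_request = log_request_for(sample)
# import boto3; boto3.client("logs").get_log_events(**log_request)
```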

Access control

AWS Batch uses IAM to control and monitor the AWS resources that your jobs can access, such as Amazon DynamoDB tables. Through IAM, you can define policies for different users in your organization. For example, admins can be granted full access permissions to AWS Batch API operations, developers can have limited permissions related to configuring compute environments and registering job definitions, and end users can be restricted to the permissions needed to submit, cancel, and terminate jobs.
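Here is a minimal sketch of an end-user IAM policy, built as a Python dict. It allows `batch:SubmitJob` only to one job queue with one job definition (any revision). It also allows cancelling, terminating, describing, and listing jobs. The region, account ID, queue, and job definition names are hypothetical. The non-submit actions use `"*"` here for simplicity and could be scoped further.

```python
import json


def end_user_policy(region, account_id, queue_name, job_definition_name):
    """Return an IAM policy document for users who only run jobs."""
    queue_arn = f"arn:aws:batch:{region}:{account_id}:job-queue/{queue_name}"
    # ":*" matches every revision of the job definition.
    job_def_arn = f"arn:aws:batch:{region}:{account_id}:job-definition/{job_definition_name}:*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SubmitToOneQueue",
                "Effect": "Allow",
                "Action": "batch:SubmitJob",
                "Resource": [queue_arn, job_def_arn],
            },
            {
                "Sid": "CancelAndInspectJobs",
                "Effect": "Allow",
                "Action": [
                    "batch:CancelJob",
                    "batch:TerminateJob",
                    "batch:DescribeJobs",
                    "batch:ListJobs",
                ],
                "Resource": "*",
            },
        ],
    }


policy = end_user_policy("us-east-1", "123456789012", "analytics-queue", "report-generator")
policy_json = json.dumps(policy, indent=2)   # text for the IAM console or API
```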

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.