AWS HPC Blog

Explore costs of AWS Batch jobs run on Amazon EKS using pod labels and Kubecost

Last October, AWS Batch introduced support for running batch workloads on your Amazon Elastic Kubernetes Service (Amazon EKS) clusters, and today we are excited to release support for adding pod labels within your EKS job definitions. One use for pod labels is tracking which job pods ran on which nodes, so that you can attribute costs to those jobs.

Since AWS Batch workloads on Amazon EKS usually leverage a pre-existing cluster that may be shared with other workloads, determining the resources used by AWS Batch pods can be a challenge. This is where Kubecost shines: it gives Amazon EKS customers the ability to accurately track Kubernetes resource-level costs by namespace, cluster, pod, or organizational concepts (e.g., by team or application), and that includes pods running AWS Batch jobs. Our collaboration with Kubecost gives customers a way to install Kubecost via a Helm chart, or as an EKS add-on as part of their AWS Marketplace offering.

By default, AWS Batch launches pods with the following labels:

  • batch.amazonaws.com/compute-environment-uuid – The UUID of the Batch compute environment.
  • batch.amazonaws.com/job-id – The UUID of the Batch job corresponding to the running pod.
  • batch.amazonaws.com/node-uid – The UID of the node that the pod is placed on.

While these default labels are already useful with Kubecost, the addition of pod labels to jobs opens up many more possibilities, such as tagging jobs by workload, project, or a group within your organization. You can define custom pod labels that conform to the Kubernetes pod label specification using the metadata section of the eksProperties / podProperties request when registering job definitions, or the eksPropertiesOverride / podProperties when submitting jobs. Here is an example JSON fragment defining a few labels we will use in Kubecost.

"podProperties": {
  "metadata": {
    "labels": {
      "batchQueue": "batch-eks-JQ",
      "batchComputeEnv": "batch-eks-CE",
      "batchUser": "maxime",
      "jobName": "monte-carlo-run",
      "project": "simulations"
    }
  }
  # other podProperties objects …
}
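To show where this fragment sits in a full request, here is a minimal Python sketch that assembles a job definition payload and a submit-time override containing the same labels. The job definition name, container image, command, and resource limits are hypothetical placeholders, and the helper functions are ours for illustration. The sketch only builds and prints dictionaries; it does not call AWS.

```python
import json


def build_job_definition(name, image, command, labels):
    """Return a dict shaped like a RegisterJobDefinition request for Batch on EKS."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "eksProperties": {
            "podProperties": {
                "containers": [
                    {
                        "image": image,
                        "command": command,
                        # Placeholder limits; size these for your own workload.
                        "resources": {"limits": {"cpu": "1", "memory": "1024Mi"}},
                    }
                ],
                # Custom pod labels go under metadata.labels.
                "metadata": {"labels": dict(labels)},
            }
        },
    }


def build_label_override(labels):
    """Return a dict shaped like eksPropertiesOverride, for use at submit time."""
    return {"podProperties": {"metadata": {"labels": dict(labels)}}}


labels = {
    "batchQueue": "batch-eks-JQ",
    "batchComputeEnv": "batch-eks-CE",
    "batchUser": "maxime",
    "jobName": "monte-carlo-run",
    "project": "simulations",
}

# Hypothetical job definition name, image, and command.
job_definition = build_job_definition(
    name="monte-carlo-run-jd",
    image="public.ecr.aws/amazonlinux/amazonlinux:2",
    command=["sleep", "60"],
    labels=labels,
)
override = build_label_override({"batchUser": "maxime"})

print(json.dumps(job_definition, indent=2))
```

If you use an AWS SDK, a payload like this would typically be passed as keyword arguments to the register job definition call, and the override as the eksPropertiesOverride argument of the submit job call. Check the field names against the current AWS Batch API reference before relying on them.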

Let’s take a look at some examples of the types of cost reporting you can get from Kubecost.

Example 1: Show me the total cost of my cluster by namespace

You can use Kubecost to quickly see an overview of costs across all of your namespaces. Since AWS Batch requires its own namespace in your Amazon EKS clusters, you can view the total costs of your AWS Batch jobs easily. In this example, we called our AWS Batch EKS namespace batch-eks-namespace.

Figure 1 shows the Kubecost Monitor -> Allocations view, which by default aggregates cost by Namespace.

Figure 1: The Kubecost Allocations report. In order to navigate here, choose Monitor from the menu on the left side (1), then choose the Aggregate by button (2), Single aggregation (3), Namespace (4). The current cost for AWS Batch namespace is highlighted.

Example 2: Show me the costs for a given AWS Batch EKS compute environment

AWS Batch allows you to attach multiple compute environments to a single Amazon EKS cluster, for example to separate GPU resources from CPU-only jobs. We can take advantage of the default pod label for compute environments to separate the costs per Batch compute environment.

To view costs by individual AWS Batch compute environment on EKS:

  1. Select Aggregate by
  2. In Label Name, enter batch.amazonaws.com/compute-environment-uuid

Figure 2: Cost per AWS Batch compute environment based on EKS. In this example, the total cost of the CE is $0.06.

Compute environment ARNs cannot be used as pod labels, since ARNs can be longer than the 63-character limit for pod label names and values. Since it’s not straightforward to map a compute environment UUID to its ARN, you can instead label pods with a short name for the compute environment. Our earlier pod label example used the batchComputeEnv label name, and we can use this to aggregate cost for the compute environments in the cluster (Figure 3).

Figure 3: Cost per AWS Batch compute environment using the batchComputeEnv pod label.
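As a rough illustration of the label constraint, the Python sketch below checks a candidate value against the Kubernetes label value rules (at most 63 characters; alphanumerics, '-', '_' and '.'; starting and ending with an alphanumeric) and derives a short name from a compute environment ARN. The ARN, account ID, and helper names are made up for the example. Note that an ARN also contains ':' and '/', which are not allowed in label values regardless of length.

```python
import re

# Kubernetes label values: empty, or alphanumeric at both ends with
# '-', '_', '.' and alphanumerics in between, 63 characters at most.
LABEL_VALUE_PATTERN = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_LENGTH = 63


def is_valid_label_value(value):
    """Return True if value satisfies the Kubernetes label value rules."""
    if len(value) > MAX_LABEL_LENGTH:
        return False
    return LABEL_VALUE_PATTERN.fullmatch(value) is not None


def short_name_from_arn(arn):
    """Return the resource name that follows the last '/' in an ARN."""
    return arn.rsplit("/", 1)[-1]


# Hypothetical ARN, for illustration only.
arn = "arn:aws:batch:us-east-1:123456789012:compute-environment/batch-eks-CE"
short_name = short_name_from_arn(arn)

print(is_valid_label_value(arn))         # False: too long, and has ':' and '/'
print(short_name)                        # batch-eks-CE
print(is_valid_label_value(short_name))  # True
```

A check like this can run before you register a job definition, so that an invalid label is caught on your side rather than at submission time.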

Example 3: Use a pod label for showing cost per project

You can use pod labels to tag pods with a project, a department or group within the organization, or a type of workload. In our example, we labeled pods with the project and batchUser labels. Figure 4 shows the cost allocations using both of these labels in a multi-aggregation.

Figure 4: Cost allocation view using two pod labels, “batchUser” and “project”, in a multi-aggregation condition.
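The same aggregations can also be requested programmatically. The sketch below builds query URLs for Kubecost's Allocation API. It assumes the API is reachable at /model/allocation on a locally port-forwarded http://localhost:9090, and that label aggregations are written as label:<name> with multiple aggregations separated by commas. Treat the base URL, path, and parameter format as assumptions to verify against the documentation for your installed Kubecost version. The code only constructs URLs and sends no requests.

```python
from urllib.parse import urlencode

# Assumed base URL, e.g. after port-forwarding the Kubecost service locally.
KUBECOST_BASE_URL = "http://localhost:9090"


def allocation_url(base_url, window, aggregate_by):
    """Build an Allocation API query URL for the given window and aggregations."""
    query = urlencode(
        {"window": window, "aggregate": ",".join(aggregate_by)},
        safe=":,/",  # keep 'label:<name>' and commas readable in the URL
    )
    return f"{base_url}/model/allocation?{query}"


# Single aggregation by namespace (as in Example 1).
by_namespace = allocation_url(KUBECOST_BASE_URL, "7d", ["namespace"])

# Single aggregation by the custom compute environment label (as in Figure 3).
by_compute_env = allocation_url(KUBECOST_BASE_URL, "7d", ["label:batchComputeEnv"])

# Multi-aggregation by user and project (as in Figure 4).
by_user_and_project = allocation_url(
    KUBECOST_BASE_URL, "7d", ["label:batchUser", "label:project"]
)

for url in (by_namespace, by_compute_env, by_user_and_project):
    print(url)
```

You could pass any of these URLs to an HTTP client of your choice and load the JSON response into a reporting tool.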

Conclusion

With support for Amazon EKS, AWS Batch enables you to leverage the Kubernetes ecosystem to run and monitor your AWS Batch jobs. Kubecost is a great example of a tool that lets customers monitor and track the costs of using AWS Batch on EKS in more detail.

If you want to get started using Kubecost to monitor the cost of AWS Batch jobs, follow the Kubecost deployment documentation to install it on an existing Amazon EKS cluster. Let us know how the results work for you. You can find us at ask-hpc@amazon.com.

Angel Pizarro

Angel is a Principal Developer Advocate for HPC and scientific computing. His background is in bioinformatics application development and building system architectures for scalable computing in genomics and other high throughput life science domains.

Maxime Hugues

Dr. Maxime Hugues is a Principal HPC Solution Architect at AWS, which he joined in 2020. He holds an M.E. from the French National Engineer School “ISEN-Toulon”, an M.S. degree from the University of Science and a Ph.D. degree in Computer Science in 2011 from the University of Lille 1. His research focused mainly on programming paradigms and innovative hardware for Extreme computers. Prior to joining AWS, he worked as an HPC Research Scientist and as an HPC Tech Lead at TOTAL E&P Research & Technology.