Posted On: May 7, 2021
Today, we are launching Amazon EMR on Amazon EKS support for Pod Templates to make it simple to run Spark jobs on shared EKS clusters. A Pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers. Pod Templates are specifications that determine how each Pod runs. Customers often consolidate multiple applications on a shared EKS cluster to improve utilization and save costs. However, each application may have different requirements. For example, you may want to run performance-intensive workloads such as ML model training jobs on SSD-backed instances for better performance, or ad-hoc workloads on Spot Instances for lower cost. You can also schedule a separate logging container to forward logs to your existing monitoring application. With this release, you can use Pod Templates with EMR on EKS to configure how to run Spark jobs on shared EKS clusters.
To reduce costs, customers can schedule Spark driver pods to run on EC2 On-Demand Instances while scheduling Spark executor pods to run on EC2 Spot Instances. Kubernetes customers frequently use taints, tolerations, and labels to ensure that pods are scheduled onto the right worker nodes. A taint is a property of a worker node that enables it to restrict which pods can run on it. Conversely, a toleration enables a pod to be scheduled over a matching taint. The label is used with nodeSelectors to direct the pod to the worker. Now, Pod Templates can be used to apply a toleration to the Spark driver pod so it runs on an EC2 On-Demand Instance, and a separate toleration to the Spark executor pods so they run only on EC2 Spot Instances.
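As an illustration, the templates below place the driver on On-Demand capacity and the executors on Spot capacity using nodeSelectors. This is a sketch: it assumes EKS managed node groups, which label nodes with `eks.amazonaws.com/capacityType` set to `ON_DEMAND` or `SPOT`; if your Spot nodes carry a taint, you would add a matching toleration to the executor template as well.

```yaml
# driver-template.yaml — pin the Spark driver to On-Demand nodes
# (capacityType label assumes EKS managed node groups)
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    eks.amazonaws.com/capacityType: ON_DEMAND
---
# executor-template.yaml — schedule Spark executors onto Spot nodes
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    eks.amazonaws.com/capacityType: SPOT
```

Each template is a separate file; they are shown together here for comparison only.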
To forward logs to your centralized logging application, customers can deploy a sidecar container with their Spark job. A sidecar container is deployed in the same pod as the application container but provides additional functionality; in this case, it forwards job logs. EMR on EKS provides built-in log forwarding to Amazon CloudWatch and Amazon S3. However, if a customer wants to forward logs to their own log reporting application, they would deploy a log forwarder as a DaemonSet. DaemonSets run directly on the Kubernetes worker nodes. Now, Pod Templates can be used to deploy log forwarding as a sidecar container on a per-job or per-pod basis.
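A minimal sketch of a sidecar-based log forwarder in a pod template follows. The image, volume name, and mount path are assumptions for illustration; your forwarder would need its own configuration telling it where to read logs and where to ship them.

```yaml
# executor-template.yaml — adds a log-forwarding sidecar
# (image, volume name, and mount path are illustrative assumptions)
apiVersion: v1
kind: Pod
spec:
  volumes:
    - name: spark-logs
      emptyDir: {}          # shared scratch space for log files
  containers:
    - name: log-forwarder   # sidecar; runs alongside the Spark container
      image: fluent/fluent-bit:1.7
      volumeMounts:
        - name: spark-logs
          mountPath: /var/log/spark   # read logs written by the Spark container
```

Any container listed in the template runs in the same pod as the Spark driver or executor container that EMR on EKS injects, sharing its volumes and network namespace.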
To increase resource utilization, customers can support multiple teams running their workloads on the same EKS cluster. Frequently, each team will get a designated EC2 node group to run their workloads on. Previously, workloads could only be directed to the right nodegroup using labels and affinity. Customers can apply taints to a team’s nodegroup and now use Pod Templates to apply a corresponding toleration to their workload. This ensures that only the designated team can schedule jobs to their nodegroup.
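One way to create such a team nodegroup, sketched below with eksctl, is to attach both a team label and a matching taint at nodegroup creation time. All names, sizes, and the region are illustrative, and the `taints` field assumes a recent eksctl release that supports taints on managed nodegroups.

```yaml
# cluster.yaml (eksctl ClusterConfig) — names and values are illustrative
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: shared-analytics
  region: us-east-1
managedNodeGroups:
  - name: team-data-science
    instanceType: m5.xlarge
    desiredCapacity: 3
    labels:
      team: data-science        # used by the workload's affinity/nodeSelector
    taints:
      - key: team               # repels pods that lack a matching toleration
        value: data-science
        effect: NoSchedule
```

With the taint in place, pods from other teams are repelled from these nodes even if they are otherwise schedulable there.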
To implement team-based nodegroups, start by creating a nodegroup that includes a label and a taint representing the team. The label, used with affinity, directs the application to the team’s designated nodegroup, and a toleration enables it to schedule over the taint. Create a Pod Template that includes the corresponding toleration and affinity and store it in an S3 bucket that your job can access. Pod Templates can be created for the Spark driver and executor pods to provide different deployment options, and you specify the location of the templates during job submission.
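The corresponding Pod Template might look like the sketch below, which pairs a toleration for the team taint with a nodeSelector on the team label (nodeSelector is the simplest form of node affinity; the label and taint values are assumed to match the nodegroup's).

```yaml
# team-template.yaml — toleration plus node selection for a team nodegroup
# (label and taint key/values are illustrative and must match the nodegroup)
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    team: data-science          # direct the pod to the team's nodes
  tolerations:
    - key: team                 # allow scheduling over the team taint
      operator: Equal
      value: data-science
      effect: NoSchedule
```

After uploading the file to S3, the template locations are passed at job submission through the Spark properties `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` in the job's sparkSubmitParameters.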