Amazon EMR on Amazon EKS enables you to submit Apache Spark jobs on demand on Amazon Elastic Kubernetes Service (EKS) without provisioning clusters. With EMR on EKS, you can consolidate analytical workloads with your other Kubernetes-based applications on the same Amazon EKS cluster to improve resource utilization and simplify infrastructure management.
Until now, you had to choose between using EMR to manage Apache Spark on EC2 or self-managing Apache Spark on Amazon EKS. When you use EMR on EC2, the EC2 instances are dedicated to EMR. When you self-manage Apache Spark on EKS, you need to manually install, manage, and optimize Apache Spark to run on Kubernetes.
With Amazon EMR on Amazon EKS, you can share compute and memory resources across all of your applications and use a single set of Kubernetes tools to centrally monitor and manage your infrastructure. You can also use a single EKS cluster to run applications that require different Apache Spark versions and configurations, and take advantage of automated provisioning, scaling, faster runtimes, and development and debugging tools that EMR provides.
You get the same EMR benefits for Apache Spark on EKS that you get on EC2 today. These include fully managed versions of Apache Spark 2.4 and 3.0, automatic provisioning and scaling, a performance-optimized runtime, and tools such as EMR Studio for authoring jobs and an Apache Spark UI for debugging.
With EMR on EKS, your compute resources can be shared between your Apache Spark applications and your other Kubernetes applications. Resources are allocated and removed on demand to avoid both over-provisioning and under-utilization, which lowers costs because you pay only for the resources you use.
By running analytics applications on EKS, you can reuse existing EC2 instances in your shared Kubernetes cluster and avoid the startup time of creating a new cluster of EC2 instances dedicated to analytics. You can also get 3x faster performance running performance-optimized Spark with EMR on EKS compared to standard Apache Spark on EKS.
How it works
With a few clicks in the Amazon EMR console, you can choose the Apache Spark version and deploy an EMR workload to Amazon EKS. EMR automatically packages the workload into a container and provides pre-built connectors for integrating with other AWS services. EMR then deploys the container on the EKS cluster and manages scaling, logging, and monitoring of that workload.
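Programmatically, the same choice of Spark version is expressed as an EMR release label on a job run request. The sketch below builds keyword arguments for the `start_job_run` call of the boto3 `emr-containers` client without touching the network. It assumes a virtual cluster already registered against your EKS namespace and an IAM execution role; the release labels in `SPARK_RELEASE_LABELS`, the IDs, and the log group name are illustrative assumptions, so check the EMR on EKS release notes for the labels available to you.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping from a Spark version to an EMR on EKS release label.
# Verify these against the current EMR on EKS release notes before use.
SPARK_RELEASE_LABELS = {
    "2.4": "emr-5.32.0-latest",
    "3.0": "emr-6.2.0-latest",
}


def build_spark_job_request(
    name: str,
    virtual_cluster_id: str,
    execution_role_arn: str,
    spark_version: str,
    entry_point: str,
    entry_point_args: Optional[List[str]] = None,
    spark_submit_params: str = "",
    log_group: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the emr-containers StartJobRun call."""
    if spark_version not in SPARK_RELEASE_LABELS:
        raise ValueError(f"no release label configured for Spark {spark_version}")

    # The Spark entry point plus optional arguments and spark-submit flags.
    driver: Dict[str, Any] = {"entryPoint": entry_point}
    if entry_point_args:
        driver["entryPointArguments"] = list(entry_point_args)
    if spark_submit_params:
        driver["sparkSubmitParameters"] = spark_submit_params

    request: Dict[str, Any] = {
        "name": name,
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": SPARK_RELEASE_LABELS[spark_version],
        "jobDriver": {"sparkSubmitJobDriver": driver},
    }
    if log_group:
        # Ask EMR to ship the job's driver and executor logs to CloudWatch Logs.
        request["configurationOverrides"] = {
            "monitoringConfiguration": {
                "cloudWatchMonitoringConfiguration": {"logGroupName": log_group}
            }
        }
    return request


def submit(client: Any, request: Dict[str, Any]) -> str:
    """Submit via an emr-containers client, e.g. boto3.client("emr-containers")."""
    return client.start_job_run(**request)["id"]
```

With real credentials, `submit(boto3.client("emr-containers"), request)` returns the job run ID. Keeping request construction separate from submission makes the payload easy to review or unit test before anything runs on the cluster.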
Centralize resource management
With EMR on EKS, you can automate the provisioning, management, and scaling of Apache Spark, and use a single set of tools to centrally manage and monitor your infrastructure.
Co-location of workloads
Run multiple EMR workloads that require different frameworks, versions, and configurations on the same EKS cluster as your other application workloads.
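As a rough illustration of co-location, the sketch below builds one job run request per workload, all targeting the same virtual cluster (and therefore the same EKS namespace) while each pins its own release label and Spark settings. The `Workload` type, workload names, S3 paths, and release labels are hypothetical; the request keys follow the `emr-containers` StartJobRun shape, with per-job Spark properties placed in the `spark-defaults` classification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Workload:
    """One hypothetical Spark workload to run on the shared cluster."""
    name: str
    release_label: str  # selects the EMR/Spark version, e.g. "emr-6.2.0-latest"
    entry_point: str
    spark_conf: Dict[str, str] = field(default_factory=dict)


def colocated_requests(
    virtual_cluster_id: str, execution_role_arn: str, workloads: List[Workload]
) -> List[Dict[str, Any]]:
    """Build one StartJobRun request per workload on a single virtual cluster."""
    requests: List[Dict[str, Any]] = []
    for w in workloads:
        request: Dict[str, Any] = {
            "name": w.name,
            "virtualClusterId": virtual_cluster_id,  # same target for every job
            "executionRoleArn": execution_role_arn,
            "releaseLabel": w.release_label,          # version can differ per job
            "jobDriver": {"sparkSubmitJobDriver": {"entryPoint": w.entry_point}},
        }
        if w.spark_conf:
            # Per-job Spark settings, applied only to this job run.
            request["configurationOverrides"] = {
                "applicationConfiguration": [
                    {"classification": "spark-defaults", "properties": dict(w.spark_conf)}
                ]
            }
        requests.append(request)
    return requests
```

Because version and configuration travel with each request rather than with the cluster, a Spark 2.4 ETL job and a Spark 3.0 job can be submitted side by side without provisioning separate infrastructure.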
Rapid adoption of new EMR versions
EMR on EKS provides a managed experience for developing, troubleshooting, and optimizing your analytics. You can deploy configurations and start jobs in seconds to test new EMR versions on the same EKS cluster without allocating dedicated resources.
AWS Online Tech Talk
Run Spark on Kubernetes with Amazon EMR on Amazon EKS.
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions.
These templates include recommended Kubernetes add-ons and best practices for running production-grade EMR on EKS workloads. You can use them to minimize the time needed to set up your production stacks or proofs of concept.