Posted On: Dec 9, 2020
Amazon EMR on Amazon EKS provides a new deployment option for Amazon EMR that allows you to run Apache Spark on Amazon Elastic Kubernetes Service (Amazon EKS). If you already use Amazon EMR, you can now run Amazon EMR-based applications alongside other types of applications on the same Amazon EKS cluster to improve resource utilization and simplify infrastructure management across multiple AWS Availability Zones. If you already run big data frameworks on Amazon EKS, you can now use Amazon EMR to automate provisioning and management, and run Apache Spark up to 3x faster. With this deployment option, you can focus on running analytics workloads while Amazon EMR on Amazon EKS builds, configures, and manages containers.
To get started, register your Amazon EKS cluster with Amazon EMR. Then define your job, including the EMR release version, Spark parameters, and application dependencies. Amazon EMR on Amazon EKS schedules the pods, containers, and resources onto your Amazon EKS cluster. You can configure your job to run on Amazon EC2 instances, or on AWS Fargate if you want a serverless experience. You can create workflows with Amazon Managed Workflows for Apache Airflow or self-managed Apache Airflow, and analyze output with per-job logs stored in Amazon S3 or Amazon CloudWatch.
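As a rough illustration of what "defining a job" involves, the sketch below assembles a request in the shape of the EMR on EKS StartJobRun API. The helper name `build_job_run_request`, the virtual cluster ID, the IAM role ARN, the S3 paths, and the Spark settings are all placeholder assumptions, not values from this announcement. The release label is only an example and should be replaced with whichever release you intend to run.

```python
import json


def build_job_run_request(virtual_cluster_id, execution_role_arn, release_label,
                          entry_point, spark_submit_parameters, log_uri,
                          name="sample-spark-job"):
    """Assemble a request body in the shape of the StartJobRun API.

    This hypothetical helper only builds a dictionary; it does not call AWS.
    """
    return {
        "name": name,
        # ID returned when the EKS cluster (namespace) was registered with EMR.
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        # EMR release version to run the job with.
        "releaseLabel": release_label,
        "jobDriver": {
            "sparkSubmitJobDriver": {
                # Application script or JAR, typically stored in Amazon S3.
                "entryPoint": entry_point,
                # Spark parameters, passed as a single spark-submit style string.
                "sparkSubmitParameters": spark_submit_parameters,
            }
        },
        "configurationOverrides": {
            "monitoringConfiguration": {
                # Per-job logs delivered to Amazon S3.
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        },
    }


# All values below are placeholders for illustration only.
request = build_job_run_request(
    virtual_cluster_id="abcdefghijklmnop1234567890",
    execution_role_arn="arn:aws:iam::111122223333:role/example-job-role",
    release_label="emr-6.2.0-latest",
    entry_point="s3://example-bucket/scripts/job.py",
    spark_submit_parameters="--conf spark.executor.instances=2 "
                            "--conf spark.executor.memory=2G",
    log_uri="s3://example-bucket/logs/",
)
print(json.dumps(request, indent=2))
```

If boto3 is installed and credentials are configured, a dictionary like this could be submitted with `boto3.client("emr-containers").start_job_run(**request)`. That call is left out of the sketch so it runs without an AWS account.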
To submit jobs using notebooks, EMR Studio provides an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark.
Amazon EMR on Amazon EKS pricing is calculated based on the vCPU and memory resources used from the time an Amazon EKS pod is scheduled to the time the pod is terminated, rounded up to the nearest second with a one-minute minimum. Pricing is based on the vCPU and memory resources requested for the task or pod.
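The rounding rule can be made concrete with a short calculation. The sketch below assumes that memory is metered per GB and that rates are quoted per hour. The function names and the rates in the usage lines are hypothetical placeholders, not published prices, so check the Amazon EMR pricing page for the actual figures in your Region.

```python
import math


def billable_seconds(runtime_seconds):
    """Round the pod runtime up to the next whole second, with a 60-second minimum."""
    return max(60, math.ceil(runtime_seconds))


def emr_on_eks_charge(runtime_seconds, requested_vcpu, requested_memory_gb,
                      vcpu_rate_per_hour, gb_rate_per_hour):
    """Estimate the charge for one pod from its requested resources.

    The rates are caller-supplied assumptions; none are built in.
    """
    hours = billable_seconds(runtime_seconds) / 3600
    hourly_cost = (requested_vcpu * vcpu_rate_per_hour
                   + requested_memory_gb * gb_rate_per_hour)
    return hours * hourly_cost


# A pod that runs for 12.3 seconds is still billed for the one-minute minimum.
short_run = billable_seconds(12.3)

# A pod that runs for 90.2 seconds is billed for 91 seconds.
longer_run = billable_seconds(90.2)

# One hour with 4 vCPU and 16 GB requested, using made-up rates.
example_charge = emr_on_eks_charge(3600, 4, 16,
                                   vcpu_rate_per_hour=0.01,
                                   gb_rate_per_hour=0.001)
print(short_run, longer_run, round(example_charge, 6))
```

Because the charge follows the requested resources rather than measured utilization, sizing pod requests close to what the job actually needs directly affects the bill.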
Amazon EMR on Amazon EKS is available in the US West (Oregon), US East (N. Virginia), and Europe (Ireland) AWS Regions.