Posted On: Nov 22, 2022

We are excited to announce support for configuring Spark properties within EMR Studio Jupyter Notebook sessions for interactive Spark workloads. Amazon EMR on EKS enables customers to efficiently run open-source big data frameworks such as Apache Spark on Amazon EKS. Amazon EMR on EKS customers set up and use a managed endpoint (available in preview) to run interactive workloads using integrated development environments (IDEs) such as EMR Studio.

Data scientists and engineers use EMR Studio Jupyter notebooks with EMR on EKS to develop, visualize, and debug applications written in Python, PySpark, or Scala. With this release, customers can customize Spark settings such as driver and executor CPU and memory, the number of executors, and package dependencies from within a notebook session, so a single managed endpoint can serve workloads with different compute requirements or data volumes.
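As a rough illustration: Sparkmagic-based notebooks conventionally change session settings with a `%%configure -f` cell whose body is JSON, where `-f` forces the current session to be recreated with the new settings. The Python sketch below assembles such a cell from the kinds of settings mentioned above, using standard Apache Spark property names (`spark.driver.memory`, `spark.executor.instances`, `spark.jars.packages`, and so on). The helper `build_configure_cell`, its default values, and the assumption that managed endpoints accept a Sparkmagic-style `"conf"` map are illustrative only; confirm the exact cell syntax and supported keys in the Amazon EMR on EKS documentation.

```python
import json
from typing import List, Optional


def build_configure_cell(
    driver_memory: str = "4g",
    driver_cores: int = 2,
    executor_memory: str = "8g",
    executor_cores: int = 4,
    num_executors: int = 10,
    packages: Optional[List[str]] = None,
) -> str:
    """Return the text of a hypothetical `%%configure -f` notebook cell.

    Assumption: the session accepts a Sparkmagic-style JSON body with a
    "conf" map of Spark properties. Default values are illustrative only.
    """
    # Standard Spark property names; values are passed as strings.
    conf = {
        "spark.driver.memory": driver_memory,
        "spark.driver.cores": str(driver_cores),
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
        "spark.executor.instances": str(num_executors),
    }
    if packages:
        # Maven coordinates, comma-separated, resolved when the session starts.
        conf["spark.jars.packages"] = ",".join(packages)
    # `-f` (Sparkmagic convention) recreates the session with these settings.
    return "%%configure -f\n" + json.dumps({"conf": conf}, indent=2)


if __name__ == "__main__":
    # Example: a larger session for a bigger dataset, plus one extra package
    # (the Maven coordinate is only an example).
    print(build_configure_cell(
        num_executors=20,
        packages=["org.apache.spark:spark-avro_2.12:3.3.0"],
    ))
```

Pasting the printed text into a notebook cell is how such a sketch would be used in practice; generating it from parameters simply makes it easy to keep a small "profile" per workload size while pointing every profile at the same managed endpoint.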

To learn more about how to apply different Spark settings within a notebook session, please visit our documentation. In-session configuration for managed endpoints is supported on Amazon EMR on EKS release 6.9 and later, and is available in all Regions where Amazon EMR on EKS is currently available.