Run Interactive Workloads on Amazon EMR Serverless with Spark Connect
Amazon EMR Serverless now supports interactive sessions with Spark Connect, enabling you to develop and run Apache Spark applications from managed notebooks in Amazon SageMaker Unified Studio, as well as your favorite notebook environments and IDEs such as Jupyter and Visual Studio Code. You can also monitor and debug active and completed sessions in the EMR console, and get granular cost and usage visibility for individual sessions.
An interactive session provides a persistent Spark context that spans cells and scripts, so you can blend local Python code execution with remote Spark operations in a single environment. This is enabled by Spark Connect's client-server architecture, which decouples your application client from the Spark driver: you keep your preferred development environment and tooling while the Spark infrastructure runs independently on EMR Serverless. This architecture supports workflows such as ad hoc data exploration, iterative step-by-step debugging, and incremental PySpark job development before deploying to production. For observability, you get real-time session monitoring through the Spark UI, history tracking through the Spark History Server, and session management from the EMR console, API, CLI, or SDK.
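As a rough sketch of that client-server split, the following PySpark snippet connects to a Spark Connect endpoint and mixes a remote aggregation with local Python. `SparkSession.builder.remote` and the `sc://` connection string format come from Apache Spark. The host, the token, and the helper names `build_connect_url` and `explore` are hypothetical placeholders, and how EMR Serverless provides the session endpoint and credentials is not shown here; see the user guide for that.

```python
from typing import Optional
from urllib.parse import quote


def build_connect_url(host: str, port: int = 443, token: Optional[str] = None) -> str:
    """Build a Spark Connect connection string of the form sc://host:port/;key=value;...

    The host and token are assumptions for illustration; they are not
    real EMR Serverless values.
    """
    params = {"use_ssl": "true"}
    if token is not None:
        # Percent-encode the token so it is safe inside the connection string.
        params["token"] = quote(token, safe="")
    suffix = ";".join(f"{key}={value}" for key, value in params.items())
    return f"sc://{host}:{port}/;{suffix}"


def explore(url: str) -> float:
    """Run an aggregation remotely, then post-process the result locally."""
    # Imported lazily so build_connect_url works without PySpark installed.
    from pyspark.sql import SparkSession

    # The client holds only a thin session; the driver runs on the server side.
    spark = SparkSession.builder.remote(url).getOrCreate()

    # Remote: the plan is sent to the server and executed there.
    df = spark.range(1_000).selectExpr("id", "id % 10 AS bucket")
    rows = df.groupBy("bucket").count().collect()

    # Local: ordinary Python running in the notebook or IDE process.
    counts = [row["count"] for row in rows]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Placeholder endpoint; replace with the values for your own session.
    print(build_connect_url("spark-connect.example.invalid", token="EXAMPLE_TOKEN"))
```

Calling `explore(url)` with a reachable endpoint would run the `groupBy` on the remote Spark infrastructure and compute the average in the local Python process, which is the blend of local and remote execution described above.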
Spark Connect on Amazon EMR Serverless is available with EMR release 7.13 in all AWS Regions where Amazon EMR Serverless is available. The SageMaker Unified Studio experience is available in the AWS Regions where SageMaker Unified Studio is supported. To get started, visit the EMR Serverless Interactive Sessions User Guide or the Amazon SageMaker Unified Studio Getting Started guide.