Amazon EMR Serverless eliminates local storage provisioning for Apache Spark workloads

Posted on: Dec 2, 2025

Amazon EMR Serverless now offers serverless storage that eliminates local storage provisioning for Apache Spark workloads, reducing data processing costs by up to 20% and preventing job failures from disk capacity constraints. You no longer need to configure local disk type and size for each application. EMR Serverless automatically handles intermediate data operations such as shuffle, with no local storage charges. You pay only for the compute and memory resources your job consumes.

EMR Serverless offloads intermediate data operations to fully managed, auto-scaling serverless storage that encrypts data in transit and at rest with job-level isolation. Serverless storage decouples storage from compute, allowing Spark to release workers as soon as they become idle rather than keeping them active to preserve temporary data. This eliminates job failures from insufficient disk capacity and reduces costs by avoiding idle worker charges. It is particularly valuable for jobs using dynamic resource allocation, such as recommendation engines processing millions of customer interactions, where initial stages process large datasets with high parallelism and later stages narrow as data aggregates.
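To make the idle-worker effect concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the `Stage` class, the stage sizes, and the rule that a worker holding shuffle data stays alive until the next stage finishes are all hypothetical simplifications, not EMR Serverless behavior or billing logic, and the result is unrelated to the "up to 20%" figure above.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    workers_needed: int  # workers the stage actually uses (hypothetical)
    seconds: int         # wall-clock duration of the stage (hypothetical)


def worker_seconds_decoupled(stages):
    """Storage decoupled from compute: each stage holds only the workers it uses."""
    return sum(s.workers_needed * s.seconds for s in stages)


def worker_seconds_local_shuffle(stages):
    """Toy assumption: workers from the previous stage stay alive while the
    current stage reads their shuffle data, even if the stage needs fewer."""
    total = 0
    previous_workers = 0
    for s in stages:
        held = max(s.workers_needed, previous_workers)
        total += held * s.seconds
        previous_workers = s.workers_needed
    return total


# A job that starts wide and narrows as data aggregates (made-up numbers).
stages = [Stage(100, 60), Stage(20, 120), Stage(5, 60)]

print(worker_seconds_decoupled(stages))      # 8700
print(worker_seconds_local_shuffle(stages))  # 19200
```

In this made-up job, the narrowing stages are where the two models diverge: the wide first stage costs the same either way, while the later stages are billed for workers that exist only to keep temporary data available.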

This feature is generally available for EMR release 7.12 and later. See Supported AWS Regions for availability. To get started, see the serverless storage for EMR Serverless documentation.