Posted On: Nov 28, 2023

You can now run Apache Spark data processing and analysis on Amazon EMR up to 4.0x faster than with data in S3 Standard by using the Amazon S3 Express One Zone storage class. S3 Express One Zone is a high-performance, single-Availability Zone storage class purpose-built to deliver consistent, single-digit millisecond data access for your most frequently accessed data and latency-sensitive applications.

Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open source frameworks on AWS. If you have performance-critical workloads with service-level agreements (SLAs), such as job completion time requirements for data lake updates, or you need fast response times for BI dashboard reports, use S3 Express One Zone when you run EMR Spark applications on Amazon EC2 clusters.

S3 Express One Zone is supported with Amazon EMR release 6.15.0 in the AWS Regions where S3 Express One Zone is available. To get started, move your data to S3 Express One Zone storage and use the S3A connector in your Spark code to read and write data. S3A is the connector used by EMR to process S3 objects, and it is required for S3 Express One Zone buckets. To learn more, see Using EMR with data in S3 Express One Zone in the Amazon EMR documentation.
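As a rough illustration of the "move your data, then read and write through S3A" step, here is a minimal PySpark-style sketch. It assumes an existing SparkSession on an EMR 6.15.0 cluster that is already configured for S3 Express One Zone; any extra cluster settings the EMR documentation calls for are not shown. The bucket name, source path, and the helper names `s3a_path` and `copy_to_express` are hypothetical. The only facts relied on are that directory bucket names follow the `<base-name>--<zone-id>--x-s3` pattern and that paths must use the `s3a://` scheme.

```python
import re

# Hypothetical check: S3 Express One Zone directory bucket names follow
# <base-name>--<zone-id>--x-s3, e.g. "my-bucket--use1-az4--x-s3".
DIRECTORY_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]*--[a-z0-9]+-az\d+--x-s3$")


def s3a_path(bucket: str, key: str = "") -> str:
    """Build an s3a:// URI for a directory bucket (hypothetical helper).

    Directory buckets must be accessed through the S3A connector, so the
    URI uses the s3a:// scheme rather than s3://.
    """
    if not DIRECTORY_BUCKET_RE.match(bucket):
        raise ValueError(f"not a directory bucket name: {bucket!r}")
    return f"s3a://{bucket}/{key.lstrip('/')}"


def copy_to_express(spark, src_path: str, bucket: str, prefix: str) -> None:
    """Copy a Parquet dataset into a directory bucket (hypothetical helper).

    `spark` is assumed to be a SparkSession. The data is read from its
    current location and written through S3A to the directory bucket.
    """
    df = spark.read.parquet(src_path)
    df.write.mode("overwrite").parquet(s3a_path(bucket, prefix))
```

For example, `copy_to_express(spark, "s3://amzn-s3-demo-bucket/events/", "my-bucket--use1-az4--x-s3", "events/")` would stage a dataset once. Later jobs could then read it with `spark.read.parquet(s3a_path("my-bucket--use1-az4--x-s3", "events/"))`. Validating the bucket name up front is a design choice meant to catch a mistyped S3 Standard bucket early. It is not a requirement of EMR.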