AWS Big Data Blog

Peter Slawski

Author: Peter Slawski

Improve Apache Spark write performance on Apache Parquet formats with the EMRFS S3-optimized committer

November 2024: This post was reviewed and updated for accuracy. The EMRFS S3-optimized committer is a new output committer available for use with Apache Spark jobs as of Amazon EMR 5.19.0. This committer improves performance when writing Apache Parquet files to Amazon S3 using the EMR File System (EMRFS). In this post, we run a performance […]