Posted On: Sep 26, 2023

Amazon EMR Serverless is a serverless option that lets data analysts and engineers run open-source big data analytics frameworks such as Apache Spark and Apache Hive without configuring, managing, or scaling clusters or servers. We are happy to announce that starting today, you can set default configurations at the application level, so that all Spark and Hive jobs submitted under the same application share consistent settings.

This new feature allows you to define default settings for all jobs within an application to help standardize job behavior. These settings, including memory, executor and driver cores, the S3 location for storing logs, secrets retrieved from AWS Secrets Manager, and more, are automatically applied to all jobs created under the application, while you can still customize configurations for specific job runs. For example, you can specify the credentials for an external Hive metastore database, along with the associated secrets, once in the application configuration, and every job run under that application inherits these defaults. This centralized approach makes configurations more predictable and jobs more reproducible.
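As a rough illustration of the inheritance described above, the following Python sketch defines application-level defaults and shows how a job-level override might combine with them. The field names (`runtimeConfiguration`, `monitoringConfiguration`, `classification`, `properties`) follow the shape we understand the EMR Serverless API to use, but treat them as assumptions and check the documentation. The bucket name, the property values, and the `effective_properties` helper are hypothetical; the helper is a simplified model of the precedence, not the service's actual merge logic.

```python
import json

# Hypothetical application-level defaults. The bucket name and all values
# are placeholders chosen for illustration only.
APP_DEFAULTS = {
    "runtimeConfiguration": [
        {
            "classification": "spark-defaults",
            "properties": {
                "spark.driver.cores": "2",
                "spark.executor.cores": "4",
                "spark.executor.memory": "8g",
            },
        }
    ],
    "monitoringConfiguration": {
        "s3MonitoringConfiguration": {
            "logUri": "s3://example-bucket/emr-serverless/logs/"
        }
    },
}


def effective_properties(app_runtime, job_overrides):
    """Simplified model of inheritance: within each classification,
    job-level properties take precedence over application defaults."""
    # Copy each property dict so the application defaults are not mutated.
    merged = {c["classification"]: dict(c["properties"]) for c in app_runtime}
    for c in job_overrides:
        merged.setdefault(c["classification"], {}).update(c["properties"])
    return merged


# A single job run that needs more executor memory than the default.
job_overrides = [
    {
        "classification": "spark-defaults",
        "properties": {"spark.executor.memory": "16g"},
    }
]

result = effective_properties(APP_DEFAULTS["runtimeConfiguration"], job_overrides)
print(json.dumps(result, indent=2, sort_keys=True))
```

In this model, the job run picks up the driver and executor core settings from the application and changes only the executor memory. With an AWS SDK, a dictionary like `APP_DEFAULTS` would presumably be passed when creating or updating the application, alongside required fields such as the application name, release label, and type.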

This feature is available for EMR release versions 6.6.0 and later in all AWS Regions where Amazon EMR Serverless is available. To learn more, visit the documentation.