Posted On: Jan 6, 2023

Amazon EMR Serverless is a serverless option in Amazon EMR that makes it simple for data engineers and data scientists to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers. Today, we are excited to announce that EMR Serverless now allows you to customize images for Apache Spark and Hive. You can package application dependencies or custom code in the image, which simplifies running Spark and Hive workloads.

Running custom images simplifies many big data analytics use cases. For example, data engineers can customize the default release image to package common dependencies, custom code, specific Java or Python versions, or SSL certificates required by workloads. They can then store these customized images in Amazon Elastic Container Registry (Amazon ECR), making it easy to run Spark workloads with custom dependencies. Security engineers can scan these images to verify that they comply with organizational standards. Data scientists can customize runtime images to include proprietary libraries or specific Python packages. Further, EMR Serverless releases can be integrated directly with your organization's Docker build, test, and deployment processes, simplifying continuous integration and continuous delivery (CI/CD) of applications.
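As a rough illustration of the workflow described above, the following Python sketch builds the request for an EMR Serverless application that points at a customized image stored in Amazon ECR. It is a minimal sketch under stated assumptions, not the documented procedure: the helper names, account ID, region, repository, tag, and release label are all hypothetical placeholders, and the image is assumed to have already been built from the matching release image and pushed to ECR. The request is passed to the boto3 `emr-serverless` client's `create_application` call through its `imageConfiguration` parameter.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return an image URI in the standard Amazon ECR format."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_application_request(name: str, release_label: str,
                              image_uri: str, app_type: str = "SPARK") -> dict:
    """Build keyword arguments for create_application with a custom image.

    app_type is "SPARK" or "HIVE". The custom image is assumed to be based
    on the release image that matches release_label.
    """
    if app_type not in ("SPARK", "HIVE"):
        raise ValueError("app_type must be 'SPARK' or 'HIVE'")
    return {
        "name": name,
        "releaseLabel": release_label,
        "type": app_type,
        "imageConfiguration": {"imageUri": image_uri},
    }


def create_application(request: dict, region: str) -> str:
    """Create the application and return its ID (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("emr-serverless", region_name=region)
    response = client.create_application(**request)
    return response["applicationId"]


# Hypothetical values, for illustration only.
uri = ecr_image_uri("123456789012", "us-east-1", "emr-serverless-custom", "latest")
request = build_application_request("spark-custom-image", "emr-6.9.0", uri)
print(request)
```

Keeping the request construction separate from the AWS call lets a CI/CD pipeline validate the image URI and release label before anything is created in the account. Calling `create_application(request, "us-east-1")` would then create the application, provided credentials and permissions to pull the image are in place.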

To learn how to customize the EMR runtime for a specific release to include application dependencies, please visit our documentation.

This feature is available in all AWS Regions where EMR Serverless is available. For the regional availability of Amazon EMR Serverless, see the frequently asked questions.