Posted On: Jan 4, 2022

Amazon EMR on EKS supports Custom Images, a feature that lets customers customize the Docker container image used to run Apache Spark applications on EMR on EKS. Today, EMR on EKS open-sourced a Custom Image Validation Tool that lets customers run an automated suite of tests to validate their customized Docker container image.

With Custom Images, data engineers and data scientists can install and configure workload-specific packages that are not available in the default distribution of EMR’s Spark runtimes, and bundle them into a single immutable container. You can create a self-contained Docker image with the application and its dependencies for each use case. For example, you can create one custom image for data engineers that includes a specific Java version and the certificates required by the application, and a separate custom image for data scientists that includes different dependencies, such as proprietary libraries or specific Python packages. Data engineers and data scientists can then use their application-specific custom image in EMR on EKS jobs.
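As a rough illustration of that last step, the sketch below builds a job request that points Spark at a custom image through the `spark.kubernetes.container.image` setting and submits it with the boto3 `emr-containers` client. The helper names (`build_job_run_request`, `submit_job_run`) and every identifier, ARN, path, and image URI shown are hypothetical placeholders, not values from this announcement; treat it as a minimal sketch rather than the recommended workflow.

```python
def build_job_run_request(virtual_cluster_id, execution_role_arn, release_label,
                          entry_point, custom_image_uri, job_name="custom-image-job"):
    """Build StartJobRun arguments that run Spark on a custom image.

    All argument values are supplied by the caller; nothing here is
    specific to a real account or image.
    """
    # Spark on Kubernetes reads the container image from this setting.
    spark_params = f"--conf spark.kubernetes.container.image={custom_image_uri}"
    return {
        "name": job_name,
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                "sparkSubmitParameters": spark_params,
            }
        },
    }


def submit_job_run(request, region_name):
    """Submit the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so building a request needs no AWS SDK

    client = boto3.client("emr-containers", region_name=region_name)
    return client.start_job_run(**request)


# Hypothetical placeholder values, for illustration only.
example_request = build_job_run_request(
    virtual_cluster_id="example-virtual-cluster-id",
    execution_role_arn="arn:aws:iam::111122223333:role/example-job-role",
    release_label="emr-6.5.0-latest",
    entry_point="s3://example-bucket/jobs/etl.py",
    custom_image_uri="111122223333.dkr.ecr.us-west-2.amazonaws.com/example-repo:data-eng",
)
```

Keeping one request per image makes it straightforward for each team to submit jobs against its own image; the release label in the request is assumed to match the EMR release the image was built from.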

You can download the Custom Image Validation Tool from our GitHub repository. For setup instructions and usage examples, please visit our Getting Started guide. The tool supports all currently available Amazon EMR on EKS releases. To contribute to the source code, please refer to the Contribution Guide and Development Guide. To learn more about customizing images in EMR on EKS, please visit our documentation and blog post.