AWS Glue now supports the ability to run ETL jobs on Apache Spark 2.4.3 (with Python 3)

Posted on: Jul 25, 2019

AWS Glue has updated its Apache Spark infrastructure to support Apache Spark 2.4.3 (in addition to Apache Spark 2.2.1) for ETL jobs, so you can take advantage of the stability fixes and new features available in this version of Apache Spark.

You can pick the Apache Spark infrastructure that your Glue jobs run on by choosing a Glue version in the job properties. Existing Glue ETL jobs that were created without specifying a Glue version default to Glue version 0.9. Glue jobs with a Glue version of 1.0 run on Apache Spark 2.4.3. In addition to supporting the latest version of Spark, you can also choose between Python 2 and Python 3 for your ETL jobs.
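As an illustration of choosing a Glue version and a Python version when defining a job programmatically, here is a minimal sketch that builds the arguments for the boto3 `create_job` call. The job name, role ARN, and script location are hypothetical placeholders, the helper functions are not part of any AWS SDK, and the mapping of Glue 0.9 to Spark 2.2.1 is an assumption inferred from the announcement. Nothing is sent to AWS unless `create_job` is called.

```python
# Glue versions named in the announcement and the Spark version each runs on.
# The 0.9 -> 2.2.1 pairing is an assumption inferred from the announcement.
GLUE_TO_SPARK = {"0.9": "2.2.1", "1.0": "2.4.3"}


def build_job_request(name, role_arn, script_location,
                      glue_version="1.0", python_version="3"):
    """Return keyword arguments for boto3's Glue create_job call (hypothetical helper)."""
    if glue_version not in GLUE_TO_SPARK:
        raise ValueError(f"unknown Glue version: {glue_version!r}")
    if python_version not in ("2", "3"):
        raise ValueError(f"unknown Python version: {python_version!r}")
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": python_version,
        },
    }


def create_job(request):
    """Submit the request to AWS Glue; requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3 installed
    return boto3.client("glue").create_job(**request)


# All three values below are hypothetical placeholders.
request = build_job_request(
    name="example-etl-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/job.py",
)
print("Glue", request["GlueVersion"], "runs on Spark",
      GLUE_TO_SPARK[request["GlueVersion"]])
```

Passing `python_version="2"` to the builder selects Python 2 instead, and `create_job(request)` would submit the definition once credentials and a real role and script are in place.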

To learn more about how you can take advantage of this feature, please visit our documentation and release notes.  

This feature is now available in all AWS Regions where AWS Glue is available, except AWS GovCloud (US-East) and AWS GovCloud (US-West).