Posted On: Nov 28, 2022

We’re pleased to announce the launch of AWS Glue version 4.0, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue 4.0 upgrades the Spark engine to Apache Spark 3.3.0 and the Python runtime to Python 3.10. With the latest Spark and Python releases, customers can develop, run, and scale their data integration workloads and get insights faster.

AWS Glue is a serverless, scalable data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources. AWS Glue 4.0 adds support for built-in Pandas APIs as well as support for the Apache Hudi, Apache Iceberg, and Delta Lake formats, giving you more options for analyzing and storing your data. It upgrades the connectors for native AWS Glue database sources such as RDS, MySQL, and SQL Server, which simplifies connections to common database sources. AWS Glue 4.0 also adds native support for the new Cloud Shuffle Storage Plugin for Apache Spark, which helps customers scale their disk usage at runtime. It enables Adaptive Query Execution, which dynamically optimizes your queries as they run. Finally, AWS Glue 4.0 improves the developer experience by adding more context to error messages. As with AWS Glue 3.0, customers pay only for the resources they use.
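
As a rough illustration of opting a job into this release, the sketch below builds the keyword arguments that could be passed to the `create_job` call of the boto3 Glue client, setting `GlueVersion` to `"4.0"` and listing data lake formats through the `--datalake-formats` job argument. The helper name `glue4_job_request`, the job name, the role ARN, the script location, and the worker settings are hypothetical placeholders, not values from this announcement. The snippet only assembles a dictionary and does not call AWS.

```python
def glue4_job_request(name, role_arn, script_s3_uri,
                      formats=("hudi", "iceberg", "delta")):
    """Build keyword arguments for a Glue CreateJob request targeting Glue 4.0.

    Hypothetical helper for illustration; it does not contact AWS.
    """
    return {
        "Name": name,
        "Role": role_arn,
        # Selects the Glue 4.0 runtime (Spark 3.3.0, Python 3.10).
        "GlueVersion": "4.0",
        "Command": {
            "Name": "glueetl",            # Spark ETL job type
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Comma-separated list of data lake formats to load for the job.
            "--datalake-formats": ",".join(formats),
        },
        # Assumed sizing; adjust to the workload.
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


# Placeholder values; replace with real resources before use.
request = glue4_job_request(
    name="example-glue4-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_uri="s3://example-bucket/scripts/job.py",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("glue").create_job(**request)
```

Keeping the request as a plain dictionary makes it easy to review or version the job definition before anything is created in an account.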

AWS Glue 4.0 is generally available today in all AWS Regions where AWS Glue is available, except the China Regions and the AWS GovCloud (US) Regions.

To learn more, visit our documentation.