AWS Glue adds new transforms (Purge, Transition and Merge) for Apache Spark applications to work with datasets in Amazon S3

Posted on: Jan 16, 2020

AWS Glue now supports three new transforms (Purge, Transition, and Merge) that help you extend your extract, transform, and load (ETL) logic in Apache Spark applications. You can use the Purge transform to remove files, partitions, or tables and quickly refine your datasets in Amazon S3.
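
As a rough illustration of the Purge transform, the sketch below wraps the `purge_table` method of the Glue Python `GlueContext` class. The helper name `purge_old_table_data` and its defaults are hypothetical. The `retentionPeriod` (in hours) and `partitionPredicate` option keys follow the AWS Glue documentation as we understand it, so check them against the current API reference before relying on them.

```python
def purge_old_table_data(glue_context, database, table_name,
                         retention_hours=168, partition_predicate=None):
    """Ask Glue to delete S3 files behind a catalog table.

    glue_context is assumed to be an awsglue.context.GlueContext.
    Files newer than retention_hours are kept (assumed option semantics).
    """
    options = {"retentionPeriod": retention_hours}
    if partition_predicate:
        # Restrict the purge to partitions matching the predicate.
        options["partitionPredicate"] = partition_predicate
    glue_context.purge_table(
        database=database,
        table_name=table_name,
        options=options,
    )
    # Return the options so callers can log what was requested.
    return options
```

Inside a Glue job, `glue_context` would typically be created with `GlueContext(SparkContext.getOrCreate())`. Passing it in as a parameter keeps the helper easy to exercise with a stand-in object outside of Glue.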

You can use the Transition transform to migrate files, partitions, or tables to lower-cost S3 storage classes. You can also use AWS Glue S3 Storage Class exclusions to skip files or partitions in specific S3 storage classes when your Glue ETL jobs read data. You can use the Merge transform to combine multiple Glue dynamic frames, representing your data in Amazon S3, Amazon Redshift, Amazon DynamoDB, or JDBC sources, based on primary keys. To learn more, please visit the Purge, Transition and Merge documentation.
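
The three capabilities described above can be sketched with the Glue Python API as follows. The helper names are hypothetical, and the calls (`transition_table`, `create_dynamic_frame.from_catalog` with an `excludeStorageClasses` option, and `DynamicFrame.mergeDynamicFrame`) reflect our reading of the AWS Glue documentation. In particular, the `accountId` and `roleArn` option keys for transitions are assumptions to confirm against the API reference.

```python
def transition_table_to_class(glue_context, database, table_name, storage_class,
                              account_id, role_arn, retention_hours=168):
    """Move S3 files behind a catalog table to another storage class."""
    options = {
        "retentionPeriod": retention_hours,  # hours; newer files are left alone
        "accountId": account_id,             # assumed option key
        "roleArn": role_arn,                 # assumed option key
    }
    glue_context.transition_table(
        database=database,
        table_name=table_name,
        transition_to=storage_class,         # for example "STANDARD_IA"
        options=options,
    )
    return options


def read_excluding_classes(glue_context, database, table_name,
                           excluded=("GLACIER", "DEEP_ARCHIVE")):
    """Read a catalog table while skipping objects in the given storage classes."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        additional_options={"excludeStorageClasses": list(excluded)},
    )


def merge_updates(source_frame, staging_frame, primary_keys):
    """Merge a staging DynamicFrame into a source DynamicFrame by primary key."""
    return source_frame.mergeDynamicFrame(staging_frame, primary_keys)
```

A job might call `read_excluding_classes` to load a table without touching archived objects, then pass the result and a frame of recent changes to `merge_updates` with a key list such as `["order_id"]` (a hypothetical column name).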

These transforms are available in all AWS Regions where AWS Glue is available.