AWS Glue now provides the ability to bookmark Parquet and ORC files using Glue ETL jobs

Posted on: Jul 26, 2019

Starting today, you can maintain job bookmarks for Parquet and ORC formats in Glue ETL jobs that run on Glue Version 1.0. AWS Glue tracks the data processed during previous runs of an ETL job by persisting state information from each job run. This persisted state is called a job bookmark, and it prevents later runs from reprocessing old data.
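As a minimal sketch of how a run with bookmarks enabled might be started from Python: bookmarks are controlled through the `--job-bookmark-option` job parameter, which can be passed in the `Arguments` of a boto3 `start_job_run` call. The helper names `bookmark_arguments` and `start_bookmarked_run`, and the job name in the usage note, are hypothetical and chosen only for illustration.

```python
# Values accepted by the --job-bookmark-option job parameter.
BOOKMARK_OPTIONS = (
    "job-bookmark-enable",
    "job-bookmark-disable",
    "job-bookmark-pause",
)


def bookmark_arguments(option="job-bookmark-enable"):
    """Build the job arguments that select the bookmark behavior for a run.

    Hypothetical helper: it only assembles a dictionary and calls no AWS API.
    """
    if option not in BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark option: {option!r}")
    return {"--job-bookmark-option": option}


def start_bookmarked_run(job_name, region_name=None):
    """Start one run of an existing Glue job with bookmarks enabled.

    Hypothetical helper: assumes boto3 is installed, AWS credentials are
    configured, and the job `job_name` already exists.
    """
    import boto3  # imported lazily so bookmark_arguments() works without boto3

    glue = boto3.client("glue", region_name=region_name)
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=bookmark_arguments(),
    )
    return response["JobRunId"]
```

For example, `start_bookmarked_run("my-parquet-etl-job")` would start a run of a job with that (made-up) name. Within the job script itself, the source read generally needs a `transformation_ctx` value and the script needs to end with `job.commit()` for the bookmark state to be saved.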

Previously, you could bookmark only common S3 source formats such as JSON, CSV, Apache Avro, and XML.

This feature is available in all AWS Regions where AWS Glue is offered, except AWS GovCloud (US-East) and AWS GovCloud (US-West).

To learn more about this feature, please visit our documentation.