Why is my AWS Glue ETL job reprocessing data even when job bookmarks are enabled?
Last updated: 2021-02-16
I enabled job bookmarks for my AWS Glue job, but the job is still reprocessing data.
Here are some common reasons why an extract, transform, and load (ETL) job might reprocess data even though job bookmarks are enabled:
- You run multiple jobs concurrently with job bookmarks enabled, and the job's maximum concurrency isn't set to 1.
- The call to job.init() is missing from the script.
- The call to job.commit() is missing from the script.
- The transformation_ctx parameter is missing.
- The table's primary keys aren't in sequential order (JDBC connections only).
- The source data was modified after your last job run.
For more information about each of these issues, see Error: A job is reprocessing data when job bookmarks are enabled.
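Three of the causes listed above (job.init(), job.commit(), and transformation_ctx) can be checked by reading the job script. The following is a minimal sketch, not an official AWS tool. It holds a sample AWS Glue script as text so that the check runs without the awsglue library. The helper name missing_bookmark_calls, the variable name job, and the database and table names are assumptions made for illustration. The check is a simple text search, so it can't tell whether a matching call is commented out or unreachable.

```python
import re

# Sample Glue script kept as text so the check below runs anywhere.
# "example_db" and "example_table" are placeholder names.
SAMPLE_SCRIPT = '''
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # initializes the job and its bookmark state

source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db",
    table_name="example_table",
    transformation_ctx="source",  # identifies this source in the bookmark state
)

job.commit()  # records the bookmark state at the end of the run
'''


def missing_bookmark_calls(script_text):
    """Return the bookmark-related items that do not appear in the script text.

    Hypothetical helper: assumes the Job instance is named "job".
    """
    checks = {
        "job.init()": r"\bjob\.init\s*\(",
        "job.commit()": r"\bjob\.commit\s*\(",
        "transformation_ctx": r"\btransformation_ctx\s*=",
    }
    return [
        name
        for name, pattern in checks.items()
        if not re.search(pattern, script_text)
    ]


if __name__ == "__main__":
    print(missing_bookmark_calls(SAMPLE_SCRIPT))
```

An empty list means that all three items appear in the script text. The remaining causes (concurrency, primary key order, and modified source data) depend on job settings and data, so a script check like this can't detect them.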