Posted On: Dec 19, 2022

AWS Glue crawlers now have enhanced support for Linux Foundation Delta Lake tables, making it more efficient to extract insights with analytics services such as Amazon Athena, Amazon EMR, and AWS Glue. This feature enables analytics services to scan Delta Lake tables without requiring Glue crawlers to create manifest files. Newly cataloged data is quickly made available for analysis using your preferred analytics and machine learning (ML) tools.

Previously, Glue crawlers supported Delta Lake tables by creating manifest files in Amazon S3 for different analytics services to consume. Crawlers had to regenerate these manifest files periodically to include newer transactions in the original Delta Lake tables, which resulted in longer processing times.

With today’s launch, you can create and schedule a Glue crawler with the option to create native Delta Lake tables, then provide the Amazon S3 path where the Delta Lake tables are located. On each run, the crawler inspects the Delta Lake tables and catalogs their schema and partition information, including changes such as updates or deletes, in the Glue Data Catalog.
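
As a rough illustration of the setup described above, the following Python sketch builds a request for the Glue `CreateCrawler` API with a Delta Lake target that asks for a native table instead of manifest files. The crawler name, IAM role ARN, database name, S3 path, and schedule are all hypothetical placeholders, and the helper functions `build_crawler_request` and `create_crawler` are illustrative names, not part of any AWS library. It assumes a recent boto3 release in which `DeltaTargets` accepts the `CreateNativeDeltaTable` field.

```python
import json


def build_crawler_request(name, role_arn, database, delta_paths, schedule=None):
    """Build keyword arguments for the Glue CreateCrawler API call.

    All argument values are supplied by the caller; nothing here contacts AWS.
    """
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "DeltaTargets": [
                {
                    # S3 paths where the Delta Lake tables are located
                    "DeltaTables": list(delta_paths),
                    # Skip manifest generation ...
                    "WriteManifest": False,
                    # ... and catalog the table as a native Delta Lake table
                    "CreateNativeDeltaTable": True,
                }
            ]
        },
    }
    if schedule:
        # Glue schedules use a six-field cron expression, e.g. cron(0 2 * * ? *)
        request["Schedule"] = schedule
    return request


def create_crawler(request):
    """Send the request to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS dependencies

    glue = boto3.client("glue")
    glue.create_crawler(**request)


if __name__ == "__main__":
    # Hypothetical placeholder values; replace with your own resources.
    example = build_crawler_request(
        name="example-delta-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
        database="example_db",
        delta_paths=["s3://example-bucket/delta/orders/"],
        schedule="cron(0 2 * * ? *)",
    )
    print(json.dumps(example, indent=2))
    # create_crawler(example)  # uncomment to create the crawler in your account
```

Keeping the request builder separate from the API call makes it possible to review or log the crawler definition before anything is created in the account.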

AWS Glue crawler support for native Delta Lake tables is available in all commercial regions where AWS Glue is available; see the AWS Region Table. Enhanced Delta Lake support is available in Athena engine version 3.0 and Glue version 3.0 or later. To learn more, read the blog and visit the AWS Glue crawler documentation.