Posted On: Jul 7, 2023

AWS Glue Crawlers now support Apache Iceberg tables, simplifying both the adoption of AWS Glue Data Catalog as the catalog for Iceberg tables and migration from other Iceberg catalogs. Apache Iceberg is an open-source table format for data stored in data lakes that helps data engineers handle complex challenges, such as managing continuously evolving data sets while maintaining query performance. With today’s launch, you can automatically register Iceberg tables in Glue Catalog by running the Glue Crawler. You can then query Glue Catalog Iceberg tables across various analytics engines and apply Lake Formation fine-grained permissions when querying from Amazon Athena.

When migrating from other Iceberg catalogs, you can create and schedule a Glue Crawler and provide one or more Amazon S3 paths where the Iceberg tables are located. You can optionally set the maximum depth of S3 paths that the Glue Crawler can traverse. With each run, Glue Crawler extracts schema information and updates Glue Catalog with any schema changes. Glue Crawler supports schema merging across snapshots and updates Glue Catalog with the latest metadata file location, which AWS analytics engines can use directly.
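
As an illustration of the setup described above, here is a minimal sketch in Python using the boto3 Glue client. It assumes the `create_crawler` call with an `IcebergTargets` entry (`Paths` and `MaximumTraversalDepth`); the crawler name, IAM role ARN, database name, S3 path, and schedule are all hypothetical placeholders, and the helper functions are not part of any AWS library.

```python
def build_iceberg_crawler_request(name, role_arn, database, s3_paths,
                                  max_depth=None, schedule=None):
    """Build the keyword arguments for Glue's create_crawler call.

    All argument values are supplied by the caller; nothing here
    contacts AWS.
    """
    target = {"Paths": list(s3_paths)}
    if max_depth is not None:
        # Optional limit on how deep the crawler traverses the S3 paths.
        target["MaximumTraversalDepth"] = max_depth

    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"IcebergTargets": [target]},
    }
    if schedule is not None:
        # Glue schedules use a cron expression string.
        request["Schedule"] = schedule
    return request


def create_iceberg_crawler(request):
    """Send the request to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported here so building a request needs no AWS setup

    glue = boto3.client("glue")
    glue.create_crawler(**request)


# Hypothetical values, for illustration only.
example_request = build_iceberg_crawler_request(
    name="iceberg-migration-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_iceberg_db",
    s3_paths=["s3://example-bucket/warehouse/"],
    max_depth=5,
    schedule="cron(0 2 * * ? *)",  # daily at 02:00 UTC
)
```

Calling `create_iceberg_crawler(example_request)` would register the crawler; keeping request construction separate from the AWS call makes the configuration easy to inspect before anything is created.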

AWS Glue Crawler’s support for Iceberg tables is available in all commercial regions where AWS Glue is available; see the AWS Region Table. To learn more, visit the AWS Glue Crawler documentation.