AWS Glue Data Catalog now supports Apache Iceberg automatic table optimization through Amazon VPC

Posted on: Nov 21, 2024

AWS Glue Data Catalog now supports automatic optimization of Apache Iceberg tables that can only be accessed from a specific Amazon Virtual Private Cloud (VPC). By providing a VPC configuration, you can enable automatic optimization to optimize storage and improve query performance while keeping your tables secure.

AWS Glue Data Catalog supports compaction, snapshot retention, and unreferenced file management, which help you reduce metadata overhead, control storage costs, and improve query performance. Customers whose governance and security configurations require an Amazon S3 bucket to be accessed only from within a specific VPC can now use these optimizations with the Glue Data Catalog. This gives you broader capabilities for automatically managing your Apache Iceberg data, regardless of where it is stored on Amazon S3.

Automatic optimization for Iceberg tables through Amazon VPC is available in 13 AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Europe (Ireland, London, Frankfurt, Stockholm), Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney), and South America (São Paulo). You can enable it through the AWS Management Console, AWS CLI, or AWS SDKs.

To get started, provide a Glue network connection as an additional configuration alongside optimization settings such as the snapshot retention period and the number of days to keep unreferenced files. The AWS Glue Data Catalog uses the VPC information in the Glue connection to access Amazon S3 buckets and optimize your Apache Iceberg tables.
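As a rough sketch of the SDK path, the Python below builds one table optimizer configuration per optimization type (compaction, retention, and orphan file deletion, which covers unreferenced files), each carrying a `vpcConfiguration` that names the Glue network connection. The field names follow our reading of the boto3 Glue `create_table_optimizer` request shape and should be checked against the current API reference. The helper names `build_optimizer_config` and `enable_optimizers`, along with the database, table, role ARN, connection name, and retention values, are hypothetical placeholders, not AWS defaults.

```python
"""Sketch: enable Iceberg table optimizers that reach Amazon S3 through a VPC.

Assumption: field names mirror the boto3 Glue create_table_optimizer request.
All names and retention values used here are hypothetical placeholders.
"""
from typing import Any, Dict

# Optimizer types accepted by the Glue CreateTableOptimizer API.
OPTIMIZER_TYPES = ("compaction", "retention", "orphan_file_deletion")


def build_optimizer_config(
    optimizer_type: str,
    role_arn: str,
    connection_name: str,
    snapshot_retention_days: int = 5,     # illustrative value only
    orphan_file_retention_days: int = 3,  # illustrative value only
) -> Dict[str, Any]:
    """Return a TableOptimizerConfiguration dict that routes S3 access via a Glue connection."""
    if optimizer_type not in OPTIMIZER_TYPES:
        raise ValueError(f"unknown optimizer type: {optimizer_type}")

    config: Dict[str, Any] = {
        "roleArn": role_arn,
        "enabled": True,
        # The Glue network connection supplies the VPC, subnet, and security group.
        "vpcConfiguration": {"glueConnectionName": connection_name},
    }
    if optimizer_type == "retention":
        # Expire old snapshots and delete the data files they alone referenced.
        config["retentionConfiguration"] = {
            "icebergConfiguration": {
                "snapshotRetentionPeriodInDays": snapshot_retention_days,
                "cleanExpiredFiles": True,
            }
        }
    elif optimizer_type == "orphan_file_deletion":
        # Remove files no longer referenced by table metadata after this many days.
        config["orphanFileDeletionConfiguration"] = {
            "icebergConfiguration": {
                "orphanFileRetentionPeriodInDays": orphan_file_retention_days,
            }
        }
    return config


def enable_optimizers(database: str, table: str, role_arn: str, connection_name: str) -> None:
    """Create all three optimizers for one table (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    account_id = boto3.client("sts").get_caller_identity()["Account"]
    glue = boto3.client("glue")
    for optimizer_type in OPTIMIZER_TYPES:
        glue.create_table_optimizer(
            CatalogId=account_id,
            DatabaseName=database,
            TableName=table,
            Type=optimizer_type,
            TableOptimizerConfiguration=build_optimizer_config(
                optimizer_type, role_arn, connection_name
            ),
        )
```

With credentials configured, a call such as `enable_optimizers("analytics_db", "events", role_arn, "my-vpc-connection")` would create all three optimizers for one table. Keeping the request builder separate from the SDK call lets you inspect or unit-test the configuration without contacting AWS.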
To learn more, read the blog and visit the AWS Glue Data Catalog documentation.