AWS Glue crawlers now support existing Data Catalog tables as sources

Posted on: May 10, 2019

You can now specify a list of tables from your AWS Glue Data Catalog as sources in the crawler configuration. Previously, crawlers could only take data paths as sources, scan your data, and create new tables in the AWS Glue Data Catalog.

With this release, crawlers can take existing tables as sources, detect changes to their schemas, update the table definitions, and register new partitions as new data becomes available. This is useful if you want to import existing table definitions from an external Apache Hive Metastore into the AWS Glue Data Catalog and use crawlers to keep these tables up to date as your data changes. You can also use this feature if you create new table definitions using AWS Glue APIs or Apache Hive DDL statements and want crawlers to update the tables going forward.
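As an illustration of the configuration described above, the following is a minimal sketch in Python that builds a request for the AWS Glue CreateCrawler API with catalog tables as sources. The helper name `build_catalog_crawler_request`, the crawler name, the role ARN, the database name, and the table names are all hypothetical placeholders. The `DeleteBehavior` value of `LOG` reflects our understanding of what catalog-table sources require; check the AWS Glue documentation before relying on it.

```python
from typing import Any, Dict, List


def build_catalog_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    tables: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for Glue CreateCrawler using catalog tables as sources.

    Hypothetical helper for illustration; it only assembles a dictionary
    and does not call AWS.
    """
    if not tables:
        raise ValueError("at least one table name is required")
    return {
        "Name": name,
        "Role": role_arn,
        # Existing Data Catalog tables are listed under CatalogTargets
        # instead of data paths such as S3Targets.
        "Targets": {
            "CatalogTargets": [
                {"DatabaseName": database, "Tables": list(tables)}
            ]
        },
        "SchemaChangePolicy": {
            # Write detected schema changes back to the table definitions.
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # Assumption: catalog-table sources expect deletions to be
            # logged rather than applied to the catalog.
            "DeleteBehavior": "LOG",
        },
    }


if __name__ == "__main__":
    # Requires the boto3 package, AWS credentials, and a configured region.
    import boto3

    request = build_catalog_crawler_request(
        name="orders-catalog-crawler",  # placeholder crawler name
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder
        database="sales_db",  # placeholder database
        tables=["orders", "order_items"],  # placeholder tables
    )
    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

Keeping the request construction separate from the API call makes it possible to inspect or validate the configuration before creating the crawler.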

This feature is available in all AWS Regions where AWS Glue is available. To learn more about this feature, see the AWS Glue documentation.