Amazon Redshift Spectrum adds support for querying open source Apache Hudi and Delta Lake

Posted on: Sep 24, 2020

You can now use Amazon Redshift to run read queries against open source Apache Hudi and Delta Lake tables in your Amazon S3 data lake. Amazon Redshift Spectrum, a feature of Amazon Redshift, lets you query your S3 data lake directly from your Redshift cluster without first loading the data into the cluster, minimizing time to insight.

Redshift Spectrum powers the lake house architecture, which allows you to query your data across Redshift, your data lake, and operational databases without ETL or data loading. Redshift Spectrum supports open data formats such as Parquet, ORC, JSON, and CSV. It also supports querying nested data with complex data types such as struct, array, and map.

Redshift Spectrum can read the latest snapshot of Apache Hudi version 0.5.2 Copy-on-Write (CoW) tables, and it can read the latest Delta Lake version 0.5.0 tables via their manifest files.
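
As an illustration of how the two table types differ at definition time, here is a minimal Python sketch that renders a CREATE EXTERNAL TABLE statement for each. It is not taken from this announcement: the helper name external_table_ddl, the schema, table, column, and bucket names are all hypothetical, and the SerDe and input/output format class names are assumptions recalled from the Amazon Redshift Database Developer Guide that should be checked against the guide before use. The sketch also assumes the Delta Lake manifest sits under a _symlink_format_manifest folder inside the table path.

```python
# Sketch only: builds DDL text; it does not connect to Redshift.
# Class names below are assumptions to verify against the Developer Guide.
HIVE_PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
HIVE_OUTPUT_FORMAT = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

INPUT_FORMATS = {
    # Hudi Copy-on-Write tables are read through the Hudi input format.
    "hudi": "org.apache.hudi.hadoop.HoodieParquetInputFormat",
    # Delta Lake tables are read through a manifest of symlinked files.
    "delta": "org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat",
}


def external_table_ddl(table_format, schema, table, columns, s3_location):
    """Return CREATE EXTERNAL TABLE text for a Hudi or Delta Lake table.

    columns is a list of (name, sql_type) pairs; s3_location is the
    table's base path in Amazon S3. Hypothetical helper for illustration.
    """
    if table_format not in INPUT_FORMATS:
        raise ValueError(f"unsupported table format: {table_format!r}")

    location = s3_location.rstrip("/")
    if table_format == "delta":
        # Assumption: point at the manifest folder, not the table root.
        location += "/_symlink_format_manifest"

    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {column_sql}\n)\n"
        f"ROW FORMAT SERDE '{HIVE_PARQUET_SERDE}'\n"
        f"STORED AS\n"
        f"INPUTFORMAT '{INPUT_FORMATS[table_format]}'\n"
        f"OUTPUTFORMAT '{HIVE_OUTPUT_FORMAT}'\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    cols = [("id", "int"), ("amount", "double precision")]
    print(external_table_ddl("hudi", "spectrum", "sales_hudi", cols,
                             "s3://example-bucket/sales_hudi/"))
    print()
    print(external_table_ddl("delta", "spectrum", "sales_delta", cols,
                             "s3://example-bucket/sales_delta/"))
```

The only differences between the two generated statements are the input format class and the location. Because the Delta Lake table is read through its manifest, the manifest presumably has to be generated (and regenerated after writes) on the Delta Lake side before Redshift Spectrum sees the latest data.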

To learn more, see the topics on creating external tables for Apache Hudi or Delta Lake in the Amazon Redshift Database Developer Guide.