Announcing Amazon Redshift data lake export: share data in Apache Parquet format

Posted on: Dec 3, 2019

You can now unload the result of an Amazon Redshift query to your Amazon S3 data lake as Apache Parquet, an efficient open columnar storage format for analytics. Compared to text formats, Parquet is up to 2x faster to unload and consumes up to 6x less storage in Amazon S3. This lets you save the results of data transformation and enrichment performed in Amazon Redshift to your Amazon S3 data lake in an open format. You can then analyze the data with Redshift Spectrum and other AWS services such as Amazon Athena, Amazon EMR, and Amazon SageMaker.
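As a minimal sketch of what such an export might look like, the Python helper below assembles an UNLOAD statement with the FORMAT AS PARQUET option. The helper name, table name, bucket, and IAM role ARN are all hypothetical placeholders, and the snippet only builds the SQL text; running it against a cluster is left to whichever SQL client you use.

```python
def build_unload_parquet(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return an UNLOAD statement that writes a query result to S3 as Parquet.

    Hypothetical helper for illustration; it only formats SQL text.
    """
    # UNLOAD takes the SELECT as a quoted string, so any single quotes
    # inside the query are doubled to keep the statement valid.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Placeholder table, bucket, and role: substitute your own values.
statement = build_unload_parquet(
    "SELECT * FROM marketing WHERE channel = 'email'",
    "s3://example-bucket/marketing/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Building the statement as a string keeps the example independent of any particular database driver.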

You can specify one or more partition columns so that unloaded data is automatically partitioned into folders in your Amazon S3 bucket. For example, you can choose to unload your marketing data and partition it by year, month, and day columns. Queries against the unloaded data can then take advantage of partition pruning and skip scanning irrelevant partitions, improving query performance and minimizing cost.
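The partitioned export described above can be sketched the same way, using the PARTITION BY option of UNLOAD. The helper names, table, bucket, and role below are hypothetical, and the folder layout shown (one `column=value` folder per partition column) is stated here as an assumption about how the partitions are laid out rather than something this announcement spells out.

```python
def build_partitioned_unload(query, s3_prefix, iam_role_arn, partition_columns):
    """Return an UNLOAD statement that writes Parquet partitioned by columns.

    Hypothetical helper for illustration; it only formats SQL text.
    """
    if not partition_columns:
        raise ValueError("at least one partition column is required")
    escaped_query = query.replace("'", "''")  # double quotes inside the literal
    column_list = ", ".join(partition_columns)
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET\n"
        f"PARTITION BY ({column_list});"
    )


def partition_prefix(s3_prefix, partition_values):
    """Return the assumed S3 folder for one combination of partition values."""
    base = s3_prefix.rstrip("/")
    folders = "/".join(f"{col}={val}" for col, val in partition_values.items())
    return f"{base}/{folders}/"


# Placeholder table, bucket, and role: substitute your own values.
partitioned_statement = build_partitioned_unload(
    "SELECT * FROM marketing",
    "s3://example-bucket/marketing/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ["year", "month", "day"],
)
print(partitioned_statement)

# Folder a query filtered to a single day would need to read.
one_day = partition_prefix(
    "s3://example-bucket/marketing/", {"year": 2019, "month": 12, "day": 3}
)
print(one_day)
```

A query that filters on year, month, and day only needs to read the matching folder, which is the effect partition pruning relies on.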

For more information, refer to the Amazon Redshift documentation.

Amazon Redshift data lake export is supported with Redshift release version 1.0.10480 or later. Refer to the AWS Region Table for Amazon Redshift availability.