Amazon S3 Inventory adds Apache Parquet output format

Posted On: Dec 4, 2018

Customers can now get Amazon S3 Inventory reports in Apache Parquet file format. Amazon S3 Inventory provides flat file lists of objects and selected metadata for your bucket or shared prefixes. You can use S3 Inventory to list, audit, and report on the status of your objects, or to simplify and speed up business workflows and big data jobs.

Parquet is a columnar storage file format, similar to ORC (optimized row-columnar), and is available to any project in the Hadoop ecosystem regardless of the choice of data processing framework, data model, or programming language. The columnar format lets you read, decompress, and process only the columns that are required for the current query. For querying S3 Inventory with AWS services such as Amazon Athena or Amazon Redshift Spectrum, or with tools such as Apache Hive, Spark, HBase, or Presto, we recommend configuring your S3 Inventory report in either Parquet or ORC for faster query performance and lower query costs.

Parquet format for S3 Inventory is available in all AWS commercial and AWS GovCloud Regions. You can get started by visiting the AWS Management Console or using the S3 API, CLI, or SDK to set your S3 Inventory configuration.
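
As an illustration of the SDK route, the following is a minimal sketch in Python of an inventory configuration that requests Parquet output. It assumes the AWS SDK for Python (boto3) is installed and credentials are configured. The function names, the configuration ID, the bucket names, the prefix, and the choice of optional fields are all hypothetical placeholders for illustration, not values from this announcement.

```python
def build_parquet_inventory_config(config_id, destination_bucket_arn, prefix="inventory"):
    """Build an S3 Inventory configuration that requests Parquet output.

    All argument values are supplied by the caller; the defaults and the
    optional fields chosen here are illustrative assumptions.
    """
    return {
        "Id": config_id,
        "IsEnabled": True,
        # Report only the current version of each object.
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Extra metadata columns to include in the report (example selection).
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": destination_bucket_arn,
                # Other accepted formats are "CSV" and "ORC".
                "Format": "Parquet",
                "Prefix": prefix,
            }
        },
    }


def apply_inventory_config(source_bucket, config):
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=config["Id"],
        InventoryConfiguration=config,
    )


# Example usage with hypothetical bucket names:
# config = build_parquet_inventory_config(
#     "parquet-inventory", "arn:aws:s3:::example-inventory-reports")
# apply_inventory_config("example-source-bucket", config)
```

The destination bucket also needs a bucket policy that allows Amazon S3 to write the reports to it; that step is not shown here.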
