Posted On: Oct 13, 2021

Amazon Virtual Private Cloud (VPC) is introducing three new features that make it faster, easier, and more cost-efficient to store and run analytics on your Amazon VPC Flow Logs. First, VPC Flow Logs can now be delivered to Amazon S3 in the Apache Parquet file format. Second, they can be stored in S3 with Hive-compatible prefixes. And third, your VPC Flow Logs can be delivered as hourly partitioned files. All of these features are available when you choose S3 as the destination for your VPC Flow Logs.

Queries on VPC Flow Logs stored in Apache Parquet format are more efficient thanks to Parquet's compact, columnar layout. You can also save on query costs with tools such as Amazon Athena and Amazon EMR, because your queries run faster and scan less data when reading Parquet files. You can save up to 25% in S3 storage costs due to the better compression of Parquet-formatted files, and you eliminate the need to build and manage your own Apache Parquet conversion application. Hive-compatible prefixes make it easier to discover and load new data into your Hive tools, and log files partitioned by the hour make it more efficient to query logs over specific time intervals.
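For example, with Hive-compatible prefixes and hourly partitioning enabled, log objects land in S3 under a layout such as AWSLogs/aws-account-id=.../aws-service=vpcflowlogs/aws-region=.../year=.../month=.../day=.../hour=..., which Athena can use for partition pruning. The sketch below starts an Athena query against a single hourly partition using boto3; the database name, table name, partition columns, and results bucket are illustrative assumptions, not values defined by this announcement.

```python
import boto3

# A minimal sketch: query one hourly partition of Parquet-formatted flow logs
# with Amazon Athena. Assumes an Athena table "vpc_flow_logs" partitioned on
# year/month/day/hour over the Hive-compatible S3 prefixes; all names below
# are placeholders to adapt to your own account.
athena = boto3.client("athena", region_name="us-east-1")

query = """
SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE year = '2021' AND month = '10' AND day = '13' AND hour = '14'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 10
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "vpc_logs_db"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
print("Started Athena query:", response["QueryExecutionId"])
```

Because the table is partitioned on year, month, day, and hour, Athena prunes the scan down to that single hour's Parquet files instead of reading the entire log history, which is where the query-cost savings come from.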

To get started, create a new VPC Flow Log subscription with S3 as the destination and specify the delivery options of Parquet format, Hive-compatible prefixes, and/or hourly partitioned files. This functionality is available through the AWS Management Console, the AWS Command Line Interface (AWS CLI), and the AWS SDKs. To learn more, please refer to the documentation and read the blog post. See the Amazon CloudWatch pricing page for the pricing of log delivery in Apache Parquet format for VPC Flow Logs.
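For illustration, here is a minimal sketch of creating such a subscription with the AWS SDK for Python (boto3), using the DestinationOptions parameter of the EC2 CreateFlowLogs API; the VPC ID, bucket ARN, and prefix are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A minimal sketch: create a VPC Flow Log delivered to S3 in Apache Parquet
# format, with Hive-compatible prefixes and hourly partitioned files.
# The VPC ID and bucket ARN are placeholders; substitute your own resources.
response = ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],   # placeholder VPC ID
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::my-flow-log-bucket/flow-logs/",  # placeholder bucket
    DestinationOptions={
        "FileFormat": "parquet",             # deliver as Apache Parquet
        "HiveCompatiblePartitions": True,    # Hive-compatible S3 prefixes
        "PerHourPartition": True,            # hourly partitioned files
    },
)
print("Created flow log(s):", response["FlowLogIds"])
```

With these options enabled, new log objects arrive under Hive-style prefixes such as .../year=2021/month=10/day=13/hour=14/, ready for the kind of partitioned query shown above.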