Posted On: Aug 31, 2021
Today we announced Dynamic Partitioning in Amazon Kinesis Data Firehose. With Dynamic Partitioning, you can continuously partition streaming data in Kinesis Data Firehose using keys within your data, such as "customer_id" or "transaction_id", and deliver the data grouped by these keys into corresponding Amazon Simple Storage Service (Amazon S3) prefixes. This makes it easier for you to run high-performance, cost-efficient analytics on streaming data in Amazon S3 using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
Partitioning your data minimizes the amount of data scanned, which optimizes performance, reduces the cost of your analytics queries on Amazon S3, and enables more granular access to data. Traditionally, customers use Kinesis Data Firehose delivery streams to capture and load their data streams into Amazon S3. To partition a streaming data set for Amazon S3-based analytics, customers would need to run partitioning applications between Amazon S3 buckets before making the data available for analysis, which could become complicated or costly.
Now with Dynamic Partitioning, Kinesis Data Firehose continuously groups data in transit by dynamically or statically defined data keys and delivers it to individual Amazon S3 prefixes by key. This reduces time-to-insight by minutes or hours, lowers costs, and simplifies architectures. Together with the Apache Parquet and Apache ORC format conversion features, this makes Kinesis Data Firehose an excellent place to capture, prepare, and load analytics-ready streaming data to Amazon S3.
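To make the mechanics concrete, here is a minimal sketch of what a delivery-stream destination configuration with Dynamic Partitioning might look like using the AWS SDK for Python (boto3). The stream name, role and bucket ARNs, and the "customer_id" key are illustrative placeholders, not values from this announcement; the JQ metadata-extraction query pulls the partition key out of each JSON record so it can be referenced in the S3 prefix.

```python
# Sketch of an ExtendedS3DestinationConfiguration with Dynamic Partitioning
# enabled. All ARNs, names, and the "customer_id" key are placeholders --
# adapt them to your own account and data shape.
extended_s3_config = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
    "BucketARN": "arn:aws:s3:::my-analytics-bucket",  # placeholder
    # The extracted partition key is referenced in the S3 prefix expression:
    "Prefix": "data/customer_id=!{partitionKeyFromQuery:customer_id}/",
    "ErrorOutputPrefix": "errors/",
    "DynamicPartitioningConfiguration": {"Enabled": True},
    "ProcessingConfiguration": {
        "Enabled": True,
        "Processors": [
            {
                "Type": "MetadataExtraction",
                "Parameters": [
                    # JQ expression that extracts the partition key
                    # from each incoming JSON record:
                    {
                        "ParameterName": "MetadataExtractionQuery",
                        "ParameterValue": "{customer_id: .customer_id}",
                    },
                    {
                        "ParameterName": "JsonParsingEngine",
                        "ParameterValue": "JQ-1.6",
                    },
                ],
            }
        ],
    },
    # Dynamic partitioning requires a buffer size of at least 64 MiB:
    "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
}

# With boto3 installed and AWS credentials configured, this configuration
# would be passed to the CreateDeliveryStream API roughly like so:
# boto3.client("firehose").create_delivery_stream(
#     DeliveryStreamName="my-stream",           # placeholder
#     DeliveryStreamType="DirectPut",
#     ExtendedS3DestinationConfiguration=extended_s3_config,
# )
```

With a configuration like this, records arriving with different "customer_id" values are buffered and written to separate prefixes (for example, `data/customer_id=42/`), so downstream queries in Athena or Redshift Spectrum can prune partitions instead of scanning the whole bucket.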