Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. It can capture, transform, and deliver streaming data to Amazon S3, Amazon Redshift, Amazon OpenSearch Service (successor to Amazon Elasticsearch Service), generic HTTP endpoints, and service providers like Datadog, New Relic, MongoDB, and Splunk. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt your data streams before loading, minimizing the amount of storage used and increasing security.
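As a minimal sketch of sending data to a delivery stream from your own code, the Python below groups records for the Firehose `PutRecordBatch` API (exposed in boto3 as `put_record_batch`) and collects the records the service reports as failed so they can be retried. It assumes a boto3-style client is passed in, so the logic runs without AWS credentials. The limits in the constants (500 records and 4 MiB per call, 1,000 KiB per record) are the documented quotas at the time of writing and should be checked against the current quotas page. The request-size check is approximated here by summing payload sizes.

```python
from typing import Any, Dict, Iterable, List, Protocol

# Documented PutRecordBatch limits (verify against current service quotas).
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024
MAX_BYTES_PER_RECORD = 1000 * 1024


class FirehoseClient(Protocol):
    """Anything with boto3's put_record_batch signature, e.g. boto3.client("firehose")."""

    def put_record_batch(self, **kwargs: Any) -> Dict[str, Any]: ...


def chunk_records(records: Iterable[bytes]) -> List[List[bytes]]:
    """Group records into batches that stay within the per-request limits."""
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_bytes = 0
    for data in records:
        if len(data) > MAX_BYTES_PER_RECORD:
            raise ValueError(f"record of {len(data)} bytes exceeds the per-record limit")
        # Start a new batch if adding this record would break either limit.
        if current and (
            len(current) == MAX_RECORDS_PER_BATCH
            or current_bytes + len(data) > MAX_BYTES_PER_BATCH
        ):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(data)
        current_bytes += len(data)
    if current:
        batches.append(current)
    return batches


def send(client: FirehoseClient, stream_name: str, records: Iterable[bytes]) -> List[bytes]:
    """Send records in batches and return the ones Firehose reported as failed."""
    failed: List[bytes] = []
    for batch in chunk_records(records):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": data} for data in batch],
        )
        if response.get("FailedPutCount", 0):
            # RequestResponses is positionally aligned with the Records we sent;
            # failed entries carry an ErrorCode instead of a RecordId.
            for data, result in zip(batch, response["RequestResponses"]):
                if "ErrorCode" in result:
                    failed.append(data)
    return failed
```

With real credentials you would call `send(boto3.client("firehose"), "example-stream", records)`, where `example-stream` is a placeholder name. Resending the returned records with backoff is left to the caller.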
You can create a Firehose delivery stream from the AWS Management Console, configure it with a few clicks, and start ingesting streaming data from hundreds of thousands of data sources into your specified destinations. You can also configure the delivery stream to automatically convert incoming data to open, standards-based formats such as Apache Parquet and Apache ORC before the data is delivered.
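The same setup can be scripted. A sketch of the request for a stream that converts JSON input to Parquet or ORC, written as keyword arguments for boto3's `create_delivery_stream`. Format conversion reads the record schema from an AWS Glue Data Catalog table, so the stream, bucket, role, database, and table names are placeholders you would supply. The 128 MB buffer reflects that conversion requires a larger buffer than plain delivery (a documented minimum of 64 MB at the time of writing).

```python
from typing import Any, Dict

# Serializer names used by the Firehose API for each columnar output format.
SERIALIZERS = {"parquet": "ParquetSerDe", "orc": "OrcSerDe"}


def conversion_stream_request(
    stream_name: str,
    bucket_arn: str,
    role_arn: str,
    glue_database: str,
    glue_table: str,
    output_format: str = "parquet",
) -> Dict[str, Any]:
    """Build create_delivery_stream kwargs for JSON -> Parquet/ORC delivery to S3."""
    serializer = SERIALIZERS[output_format.lower()]
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": "converted/",
            "ErrorOutputPrefix": "errors/",
            # Format conversion needs a buffer of at least 64 MB.
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                # Firehose reads the record schema from this Glue table.
                "SchemaConfiguration": {
                    "RoleARN": role_arn,
                    "DatabaseName": glue_database,
                    "TableName": glue_table,
                },
                # Incoming records are parsed as JSON...
                "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
                # ...and written out in the chosen columnar format.
                "OutputFormatConfiguration": {"Serializer": {serializer: {}}},
            },
        },
    }
```

The result would be passed as `boto3.client("firehose").create_delivery_stream(**request)`.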
With Amazon Kinesis Data Firehose, there is no minimum fee or setup cost. You pay for the volume of data that you transmit through the service and, if applicable, for data format conversion, Amazon VPC delivery, and data transfer.
Easy to use
Amazon Kinesis Data Firehose provides a simple way to capture, transform, and load streaming data with just a few clicks in the AWS Management Console. You can quickly create a Firehose delivery stream, select the destinations, and start sending real-time data from hundreds of thousands of data sources simultaneously. The service takes care of stream management, including all the scaling, sharding, and monitoring needed to continuously load the data to destinations at the intervals you specify.
Integrated with AWS services and service providers
Amazon Kinesis Data Firehose is integrated with Amazon S3, Amazon Redshift, and Amazon OpenSearch Service. It can also deliver data to generic HTTP endpoints and directly to service providers like Datadog, New Relic, MongoDB, and Splunk. From the AWS Management Console, you can point Kinesis Data Firehose to the destinations of your choice and use your existing applications and tools to analyze streaming data.
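When Firehose delivers to a generic HTTP endpoint, it POSTs a JSON body containing a `requestId`, a `timestamp`, and a `records` array whose `data` fields are base64-encoded, and it expects a JSON response that echoes the `requestId` with a `timestamp`. The framework-agnostic function below is a minimal sketch of the receiving side under those assumptions. The access-key check assumes you configured a key on the destination, which arrives in the `X-Amz-Firehose-Access-Key` header; the function and parameter names are illustrative, and the field names should be checked against the HTTP endpoint delivery specification.

```python
import base64
import hmac
import json
import time
from typing import Dict, List, Optional, Tuple


def handle_firehose_request(
    body: bytes,
    headers: Dict[str, str],
    expected_access_key: str,
    now_ms: Optional[int] = None,
) -> Tuple[int, bytes, List[bytes]]:
    """Return (HTTP status, response body, decoded record payloads)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # response timestamp in epoch milliseconds
    request = json.loads(body)
    request_id = request.get("requestId", "")
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied_key = lowered.get("x-amz-firehose-access-key", "")
    # Constant-time comparison avoids leaking the key through timing.
    if not hmac.compare_digest(supplied_key.encode(), expected_access_key.encode()):
        error = {"requestId": request_id, "timestamp": now_ms, "errorMessage": "invalid access key"}
        return 401, json.dumps(error).encode(), []
    # Each record's payload arrives base64-encoded in its "data" field.
    records = [base64.b64decode(r["data"]) for r in request.get("records", [])]
    response = {"requestId": request_id, "timestamp": now_ms}
    return 200, json.dumps(response).encode(), records
```

Firehose treats a non-200 response as a failed delivery and retries, so a real receiver should return 200 only after the records are safely stored.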
Serverless built-in data transformation
Kinesis Data Firehose enables you to prepare your streaming data before it is loaded to data stores. You can convert raw streaming data from your data sources into formats like Apache Parquet and Apache ORC without having to build your own data processing pipelines. You can also dynamically partition your streaming data before delivery to S3 using static or dynamically defined keys like “customer_id” or “transaction_id”, and deliver data grouped by these keys into unique S3 prefixes. This makes it easier to run high-performance, cost-efficient analytics on the data in S3 with Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
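Partition keys can be produced by a transformation Lambda function. A minimal sketch, assuming JSON records that carry a `customer_id` field: the handler follows the Firehose data transformation contract (each output record echoes its `recordId` and sets `result` to `Ok`, `Dropped`, or `ProcessingFailed`) and returns the key under `metadata.partitionKeys`. The stream's S3 prefix would then reference the key as `!{partitionKeyFromLambda:customer_id}`, with dynamic partitioning enabled on the stream.

```python
import base64
import json
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    output: List[Dict[str, Any]] = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            customer_id = str(payload["customer_id"])
        except (ValueError, KeyError, TypeError):
            # Undecodable or keyless records go to the stream's error output.
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]})
            continue
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": record["data"],  # payload passes through unchanged
                "metadata": {"partitionKeys": {"customer_id": customer_id}},
            }
        )
    return {"records": output}
```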
Near real time
Amazon Kinesis Data Firehose captures and loads data in near real time. It can load new data into your destinations within 60 seconds after the data is sent to the service, depending on the buffering interval you configure. As a result, you can access new data sooner and react to business and operational events faster.
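Delivery latency is governed by buffering: Firehose accumulates incoming data and delivers a batch when either the configured buffer size or the buffer interval is reached, whichever comes first. The toy model below illustrates that rule with an injected clock so it can be tested deterministically. It is a sketch, not how the service is implemented, and the `Buffer` class and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Buffer:
    """Toy model of size-or-interval buffering; not the service's implementation."""

    size_limit_bytes: int
    interval_seconds: float
    records: List[bytes] = field(default_factory=list)
    size_bytes: int = 0
    opened_at: Optional[float] = None  # arrival time of the oldest buffered record

    def add(self, data: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer one record; return a batch if either threshold is reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(data)
        self.size_bytes += len(data)
        if self.size_bytes >= self.size_limit_bytes:
            return self._flush()
        return self.tick(now)

    def tick(self, now: float) -> Optional[List[bytes]]:
        """Called periodically: flush once the oldest record has waited a full interval."""
        if self.opened_at is not None and now - self.opened_at >= self.interval_seconds:
            return self._flush()
        return None

    def _flush(self) -> List[bytes]:
        batch = self.records
        self.records, self.size_bytes, self.opened_at = [], 0, None
        return batch
```

With `interval_seconds=60`, a record that arrives during a quiet period waits at most one interval before it is flushed, while a busy stream flushes earlier because the size threshold is hit first.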
No ongoing administration
Amazon Kinesis Data Firehose is a fully managed service that automatically provisions, manages, and scales the compute, memory, and network resources required to process and load your streaming data. Once set up, Kinesis Data Firehose continuously loads data into your destinations as it arrives.
Pay only for what you use
With Amazon Kinesis Data Firehose, you pay only for the volume of data you transmit through the service and, if applicable, for data format conversion. You also pay for Amazon VPC delivery and data transfer when applicable. There are no minimum fees or upfront commitments.
How it works
Amazon Kinesis Data Firehose is a fully managed service, with no ongoing administration required. Kinesis Data Firehose manages all underlying infrastructure, storage, networking, and configuration needed to stream your data from your source to your destination. Below are examples of key use cases that our customers tackle using Amazon Kinesis Data Firehose.
Data Streaming into Data Lakes and Data Warehouses
Kinesis Data Firehose enables high-volume data ingestion into your Amazon S3-based data lake and data warehouse. You can configure Kinesis Data Firehose to convert your data into formats like Apache Parquet and Apache ORC, as required by your destination data stores, without having to build your own data processing pipelines. You can also dynamically partition your streaming data using well-defined keys like “customer_id” or “transaction_id”. Kinesis Data Firehose groups the data by these keys and delivers each group to its own S3 prefix, making it easier to run high-performance, cost-efficient analytics on the data in S3 with Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
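Keys can also be extracted inline, without a Lambda function, by a JQ query over JSON records. A sketch of the relevant part of an `ExtendedS3DestinationConfiguration`, built in Python so that one list of keys drives both the query and the S3 prefix. It assumes the keys are top-level fields of each record; the ARNs are placeholders, and the buffer values are example choices rather than recommendations.

```python
from typing import Any, Dict, List


def partitioned_destination(bucket_arn: str, role_arn: str, keys: List[str]) -> Dict[str, Any]:
    """Build an S3 destination that partitions JSON records by the given top-level keys."""
    # JQ object construction, e.g. {customer_id:.customer_id}
    query = "{" + ",".join(f"{key}:.{key}" for key in keys) + "}"
    # One Hive-style path segment per key, e.g. customer_id=!{partitionKeyFromQuery:customer_id}/
    prefix = "".join(f"{key}=!{{partitionKeyFromQuery:{key}}}/" for key in keys)
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "Prefix": prefix,
        # Records that cannot be partitioned land under a separate error prefix.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 60},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {"ParameterName": "MetadataExtractionQuery", "ParameterValue": query},
                        {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
                    ],
                }
            ],
        },
    }
```

Hive-style `key=value` segments let Athena and Redshift Spectrum prune partitions when a query filters on those keys.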
Streaming Machine Learning Applications
You can build streaming machine learning (ML) applications with Kinesis Data Firehose. Using the data transformation feature, a Lambda function can call ML models for analysis or ML inference endpoints for predictions, enriching your data streams as they are delivered to your destination.
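A minimal sketch of an enriching transformation function: it decodes each JSON record, adds a `prediction` field, and returns the record under the Firehose transformation contract. The `score` function is a hypothetical stand-in; in practice it might run a model packaged with the function or call an inference endpoint (for example, through the SageMaker runtime `invoke_endpoint` API), which adds latency to every batch. Injecting `predict` keeps the handler testable without AWS access.

```python
import base64
import json
from typing import Any, Callable, Dict


def score(payload: Dict[str, Any]) -> float:
    """Hypothetical stand-in for a model or inference endpoint call."""
    return 1.0 if payload.get("amount", 0) > 1000 else 0.0


def make_handler(predict: Callable[[Dict[str, Any]], float] = score):
    def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        output = []
        for record in event["records"]:
            try:
                payload = json.loads(base64.b64decode(record["data"]))
                payload["prediction"] = predict(payload)
                # Newline-delimit records so they stay separable in the destination.
                enriched = (json.dumps(payload) + "\n").encode("utf-8")
                output.append({"recordId": record["recordId"], "result": "Ok",
                               "data": base64.b64encode(enriched).decode("ascii")})
            except (ValueError, TypeError, AttributeError):
                # Non-JSON or non-object records are routed to the error output.
                output.append({"recordId": record["recordId"], "result": "ProcessingFailed",
                               "data": record["data"]})
        return {"records": output}

    return lambda_handler


lambda_handler = make_handler()
```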
Log and IoT Analytics
With Amazon Kinesis Data Firehose, you can capture data continuously from connected devices such as consumer appliances, embedded sensors, and TV set-top boxes. Kinesis Data Firehose loads the data into your specified destinations, enabling near real-time access to metrics, insights, and dashboards.
You can also detect application errors as they happen and identify their root causes by collecting, monitoring, and analyzing log data. You can install and configure the Amazon Kinesis Agent on your servers to automatically watch application and server log files and send the data to Kinesis Data Firehose, which continuously streams the log data to your destinations so you can visualize and analyze it.
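The Kinesis Agent reads its settings from a JSON file (by default `/etc/aws-kinesis/agent.json`), where each flow maps a log file pattern to a delivery stream. The sketch below renders a minimal configuration from Python; the region, log path, and stream name in the usage line are hypothetical, and the agent supports many more options than shown here.

```python
import json
from typing import Dict


def agent_config(region: str, flows: Dict[str, str]) -> str:
    """Render a minimal Kinesis Agent config mapping file patterns to delivery streams."""
    config = {
        "cloudwatch.emitMetrics": True,  # publish agent metrics to CloudWatch
        "firehose.endpoint": f"firehose.{region}.amazonaws.com",
        "flows": [
            {"filePattern": pattern, "deliveryStream": stream}
            for pattern, stream in flows.items()
        ],
    }
    return json.dumps(config, indent=2)
```

For example, `agent_config("us-east-1", {"/var/log/app/*.log": "example-log-stream"})` produces a file that ships every matching log file to the placeholder stream `example-log-stream`.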
Kinesis Data Firehose supports Security Information and Event Management (SIEM) tools such as Splunk as destinations. For example, you can capture network traffic flow logs and send them to Kinesis Data Firehose, which can transform, enrich, and load the data into Splunk. With this solution, you can monitor network security in real time and alert when a potential threat arises.
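To give Splunk structured events, a transformation function can parse each flow log line before delivery. This sketch assumes every Firehose record holds one line in the default version 2 VPC flow log format (14 space-separated fields). Custom formats, or logs routed through CloudWatch Logs subscriptions (which arrive gzip-compressed in a different envelope), would need different parsing. Marking records whose log status is not `OK` as `Dropped` is a choice made here for illustration.

```python
import base64
import json
from typing import Any, Dict, Optional

# Field order of the default (version 2) VPC flow log format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport", "dstport",
          "protocol", "packets", "bytes", "start", "end", "action", "log_status"]
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_flow_log(line: str) -> Optional[Dict[str, Any]]:
    """Parse one default-format flow log line; return None if the field count is wrong."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    event: Dict[str, Any] = {}
    for name, value in zip(FIELDS, parts):
        if value == "-":
            event[name] = None  # NODATA/SKIPDATA lines use "-" placeholders
        elif name in INT_FIELDS:
            event[name] = int(value)
        else:
            event[name] = value
    return event


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    output = []
    for record in event["records"]:
        try:
            parsed = parse_flow_log(base64.b64decode(record["data"]).decode("utf-8"))
        except ValueError:  # bad base64, non-UTF-8 bytes, or a non-numeric count
            parsed = None
        if parsed is None:
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]})
        elif parsed["log_status"] != "OK":
            # Lines without traffic details are not forwarded to the SIEM.
            output.append({"recordId": record["recordId"], "result": "Dropped", "data": record["data"]})
        else:
            data = base64.b64encode((json.dumps(parsed) + "\n").encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
    return {"records": output}
```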