Stream data to an HTTP endpoint with Amazon Data Firehose
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more.
The value of data is time sensitive. Streaming data services can help you move data quickly from data sources to new destinations for downstream processing. For example, Amazon Data Firehose can reliably load streaming data into data stores like Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon OpenSearch Service, and Splunk.
We’re now expanding the Firehose delivery destinations to include generic HTTP endpoints. This lets you use a fully managed service to deliver data to HTTP endpoints without building custom applications or operating and managing the data delivery infrastructure. Additionally, the HTTP endpoint enhancement opens up a number of key integration opportunities between Firehose and other AWS services, such as Amazon DynamoDB, Amazon SNS, or Amazon RDS, using Amazon API Gateway’s AWS Service integrations.
All the existing Firehose features are fully supported, including AWS Lambda service integration, retry option, data protection on delivery failure, and cross-account and cross-Region data delivery. Traffic between Firehose and the HTTP endpoint is encrypted in transit using HTTPS. Firehose incorporates error handling, automatic scaling, transformation, conversion, aggregation, and compression functionality to help you accelerate the deployment of data streams across your organization. There is no additional cost to use this feature.
In this post, we walk through setting up a Firehose HTTP endpoint. Our example use case ingests data into a Firehose delivery stream and sends it to an Amazon API Gateway REST API endpoint that loads the data to a DynamoDB table using the DynamoDB AWS Service integration.
Configuring a delivery stream to an HTTP endpoint
To set this up, you provide the URL for your HTTP endpoint application or service, add an optional access key as a header to the HTTP calls made to your HTTP endpoint, and include optional key-value parameters. You can configure the endpoint on the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.
- On the console, under Analytics, choose Kinesis.
- Choose Create delivery stream.
- For Delivery stream name, enter a name.
- For Choose a source, select Direct PUT or other sources. With this option, producers write to the delivery stream using the Firehose PutRecord API.
- Leave the rest of the settings at their default and choose Next.
- Leave all settings at their default in Step 2: Process records and choose Next.
- For Destination, select HTTP Endpoint.
- For HTTP endpoint name, enter a name.
- For HTTP endpoint URL, enter your endpoint URL.
- For Access key, enter an access key (optional).
- For Content encoding, select Disabled.
You can optionally choose to encode and compress your request body before posting it to your HTTP endpoint.
- For Parameters, you can pass optional key-value parameters as needed.
In this post, we pass two key-value pairs that serve as inputs to an Amazon API Gateway integration with our Amazon DynamoDB API endpoint.
TableName as the Key and Table1 as the Value.
Region as the Key and us-east-2 as the Value.
- For Retry duration, leave at its default of 300 seconds.
- For S3 backup mode, select Failed data only.
- For S3 bucket, enter an S3 bucket as a backup for the delivery stream to store data that failed delivery to the HTTP API endpoint.
- Choose Next.
- Leave everything at its default on the Configure settings page and choose Next.
- Review your settings and choose Create delivery stream.
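The console steps above can also be approximated with the AWS SDK. The following is a minimal sketch using the boto3 `create_delivery_stream` call, not a definitive setup. The stream name, endpoint URL, IAM role ARN, and bucket ARN are hypothetical placeholders, and the sketch assumes an IAM role that Firehose can assume already exists (the console normally creates one for you).

```python
import json


def build_http_endpoint_stream_request(
    stream_name,
    endpoint_url,
    endpoint_name,
    role_arn,
    backup_bucket_arn,
    common_attributes,
    access_key=None,
):
    """Build keyword arguments for the Firehose create_delivery_stream call."""
    endpoint = {"Url": endpoint_url, "Name": endpoint_name}
    if access_key:
        # The access key is optional; Firehose sends it as a request header.
        endpoint["AccessKey"] = access_key
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # "Direct PUT or other sources"
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": endpoint,
            "RequestConfiguration": {
                "ContentEncoding": "NONE",  # "Disabled" on the console
                "CommonAttributes": [
                    {"AttributeName": key, "AttributeValue": value}
                    for key, value in common_attributes.items()
                ],
            },
            "RetryOptions": {"DurationInSeconds": 300},
            "S3BackupMode": "FailedDataOnly",
            "RoleARN": role_arn,
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": backup_bucket_arn},
        },
    }


def create_stream(request):
    """Send the request to Firehose. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    return boto3.client("firehose").create_delivery_stream(**request)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    request = build_http_endpoint_stream_request(
        stream_name="http-endpoint-demo",
        endpoint_url="https://example.execute-api.us-east-2.amazonaws.com/prod/ingest",
        endpoint_name="api-gateway-dynamodb",
        role_arn="arn:aws:iam::123456789012:role/firehose-delivery-role",
        backup_bucket_arn="arn:aws:s3:::my-firehose-backup-bucket",
        common_attributes={"TableName": "Table1", "Region": "us-east-2"},
    )
    print(json.dumps(request, indent=2))
```

Keeping the request builder separate from the API call makes it possible to inspect or review the configuration before anything is created in your account.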
When the delivery stream is active, your source can start streaming data to it.
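As an illustration of what a producer might look like, here is a minimal sketch that writes one JSON event to a delivery stream with the Firehose PutRecord API through boto3. The stream name and the event fields are hypothetical; substitute your own.

```python
import json


def build_put_record_request(stream_name, event):
    """Build keyword arguments for the Firehose put_record call."""
    # Firehose records are opaque bytes; here the event is serialized as JSON.
    data = json.dumps(event).encode("utf-8")
    return {"DeliveryStreamName": stream_name, "Record": {"Data": data}}


def send(request):
    """Send the record to Firehose. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    return boto3.client("firehose").put_record(**request)


if __name__ == "__main__":
    # Hypothetical stream name and event, for illustration only.
    request = build_put_record_request(
        "http-endpoint-demo", {"sensor_id": "s-1", "reading": 21.5}
    )
    print(request)
```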
Testing the delivery stream
For this post, we use the Test with demo data feature available in Firehose to stream sample data to the newly created delivery stream.
- On the Firehose console, choose the delivery stream you just created.
- Choose Test with demo data.
The delivery stream delivers the demo data to the API Gateway REST API that is configured as the HTTP endpoint. The integration endpoint reads the key-value header attributes to determine the DynamoDB table name and Region, and loads the payload into the specified table.
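To make the request shape concrete, the sketch below parses a Firehose HTTP endpoint delivery request in plain Python. In this post, API Gateway does this work through its AWS Service integration, so the code is only an illustration of what any custom endpoint would receive, not the method used here. It assumes the documented request format: the key-value parameters arrive as JSON in the `X-Amz-Firehose-Common-Attributes` header, and each record's `data` field in the body is base64 encoded. The function names are hypothetical.

```python
import base64
import json


def parse_firehose_request(headers, body):
    """Extract the common attributes and decoded records from a delivery request."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    common = json.loads(lowered.get("x-amz-firehose-common-attributes", "{}"))
    attributes = common.get("commonAttributes", {})

    payload = json.loads(body)
    records = [
        base64.b64decode(record["data"]).decode("utf-8")
        for record in payload.get("records", [])
    ]
    return {
        "request_id": payload.get("requestId"),
        "table_name": attributes.get("TableName"),
        "region": attributes.get("Region"),
        "records": records,
    }


def build_success_response(request_id, timestamp_ms):
    """Build the JSON body an endpoint returns with HTTP 200 to acknowledge delivery."""
    return json.dumps({"requestId": request_id, "timestamp": timestamp_ms})


if __name__ == "__main__":
    # A hand-built sample request; the values are illustrative only.
    sample_headers = {
        "X-Amz-Firehose-Common-Attributes": json.dumps(
            {"commonAttributes": {"TableName": "Table1", "Region": "us-east-2"}}
        )
    }
    sample_body = json.dumps(
        {
            "requestId": "example-request-id",
            "timestamp": 1700000000000,
            "records": [{"data": base64.b64encode(b'{"k": "v"}').decode("ascii")}],
        }
    )
    parsed = parse_firehose_request(sample_headers, sample_body)
    print(parsed)
    print(build_success_response(parsed["request_id"], 1700000000001))
```

The response echoes the `requestId` from the request, which is how Firehose matches the acknowledgment to the delivery attempt.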
Monitoring the delivery stream
You can view Amazon CloudWatch metrics on the Monitoring tab. Pertinent metrics to observe are Delivery to HTTP Endpoint data freshness and HTTP Endpoint delivery success.
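The same metrics can be read programmatically. The sketch below builds a CloudWatch `get_metric_statistics` query with boto3. It assumes the console labels correspond to the metric names `DeliveryToHttpEndpoint.DataFreshness` and `DeliveryToHttpEndpoint.Success` in the `AWS/Firehose` namespace; verify the names against the Firehose monitoring documentation before relying on them. The stream name is a hypothetical placeholder.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(stream_name, metric_name, minutes=60, now=None):
    """Build keyword arguments for the CloudWatch get_metric_statistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # five-minute data points
        "Statistics": ["Average"],
    }


def fetch(query):
    """Run the query. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    return boto3.client("cloudwatch").get_metric_statistics(**query)


if __name__ == "__main__":
    # Metric names are assumptions; the stream name is a placeholder.
    for metric in (
        "DeliveryToHttpEndpoint.DataFreshness",
        "DeliveryToHttpEndpoint.Success",
    ):
        print(build_metric_query("http-endpoint-demo", metric))
```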
This post demonstrated how to create a delivery stream to an HTTP endpoint, which eliminates the need to develop custom applications or manage the corresponding infrastructure. Firehose provides a fully managed service that helps you reduce complexity, so you can expand and accelerate the use of data streams throughout your organization.
About the Authors
Imtiaz (Taz) Sayed is the World Wide Tech Leader for Data Analytics at AWS. He is an ardent data engineer and relishes connecting with the data-analytics community. He likes roller coasters and good heist movies, and is undecided between “The Godfather” and “The Shawshank Redemption” as the greatest movie of all time.
Masudur Rahaman Sayem is a Specialist Solution Architect for Analytics at AWS. He is passionate about distributed systems. He also likes to read, especially the classic comic books.