AWS Developer Tools Blog
Using Amazon Kinesis Firehose
Amazon Kinesis Firehose, a new service announced at this year’s re:Invent conference, is the easiest way to load streaming data into AWS. Firehose manages all of the resources and automatically scales to match the throughput of your data. It can capture and automatically load streaming data into Amazon S3 and Amazon Redshift.
An example use for Firehose is to keep track of traffic patterns in a web application. To do that, we want to stream a record for each request to the web application, containing the current page and the page being requested. Let’s take a look.
Creating the Delivery Stream
First, we need to create our Firehose delivery stream. Although we can do this through the Firehose console, let’s take a look at how we can automate the creation of the delivery stream with PowerShell.
In our PowerShell script, we need to set up the account ID and variables for the names of the resources we will create. The account ID is used in our IAM role to restrict access to just the account with the delivery stream.
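A minimal sketch of that setup follows; the account ID and all of the resource names here are placeholders to replace with your own values:

```powershell
# Identity and names for the resources the script will create.
# These values are illustrative; substitute your own.
$accountId = '123456789012'                      # used to scope the IAM trust policy
$roleName = 'FirehoseBlogDemoRole'
$s3BucketName = 'firehose-blogdemo-destination'
$deliveryStreamName = 'BlogDemoDeliveryStream'
```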
Because Firehose will push our streaming data to S3, our script will need to make sure the bucket exists.
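Continuing the sketch with the variables above, Test-S3Bucket tells us whether the bucket exists, and New-S3Bucket creates it if it does not:

```powershell
# Create the destination bucket if it doesn't already exist.
if (-not (Test-S3Bucket -BucketName $s3BucketName))
{
    New-S3Bucket -BucketName $s3BucketName
}
```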
We also need to set up an IAM role that gives Firehose permission to push data to S3. The role will need access to the Firehose API and the S3 destination bucket. For the Firehose access, our script will use the AmazonKinesisFirehoseFullAccess managed policy. For the S3 access, our script will use an inline policy that restricts access to the destination bucket.
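One way to sketch that role setup is shown below. The trust policy’s sts:ExternalId condition is how Firehose roles are commonly restricted to a single account; the inline policy’s name and its exact list of S3 actions are illustrative choices, not the only valid ones:

```powershell
# Trust policy that lets Firehose assume the role, restricted to our account.
$assumeRolePolicy = @"
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "firehose.amazonaws.com" },
    "Action": "sts:AssumeRole",
    "Condition": { "StringEquals": { "sts:ExternalId": "$accountId" } }
  }]
}
"@

$role = New-IAMRole -RoleName $roleName -AssumeRolePolicyDocument $assumeRolePolicy

# Attach the managed policy for Firehose access.
Register-IAMRolePolicy -RoleName $roleName `
    -PolicyArn 'arn:aws:iam::aws:policy/AmazonKinesisFirehoseFullAccess'

# Inline policy restricting S3 access to the destination bucket.
$s3Policy = @"
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:AbortMultipartUpload","s3:GetBucketLocation","s3:GetObject",
               "s3:ListBucket","s3:ListBucketMultipartUploads","s3:PutObject"],
    "Resource": ["arn:aws:s3:::$s3BucketName","arn:aws:s3:::$s3BucketName/*"]
  }]
}
"@

Write-IAMRolePolicy -RoleName $roleName -PolicyName 'S3DestinationAccess' `
    -PolicyDocument $s3Policy
```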
Now that the S3 bucket and IAM role are set up, we will create the delivery stream. We just need to set up an S3DestinationConfiguration object and call the New-KINFDeliveryStream cmdlet.
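A sketch of that call, assuming the variables and role created above; the short sleep is a common workaround for the delay before a newly created IAM role is visible to other services:

```powershell
# Give IAM a moment to propagate the new role before Firehose validates it.
Start-Sleep -Seconds 10

# Describe the S3 destination: the bucket to write to and the role to assume.
$s3Destination = New-Object Amazon.KinesisFirehose.Model.S3DestinationConfiguration
$s3Destination.BucketARN = "arn:aws:s3:::$s3BucketName"
$s3Destination.RoleARN = $role.Arn

New-KINFDeliveryStream -DeliveryStreamName $deliveryStreamName `
    -S3DestinationConfiguration $s3Destination
```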
After the New-KINFDeliveryStream cmdlet is called, it will take a few minutes to create the delivery stream. We can use the Get-KINFDeliveryStream cmdlet to check the status. As soon as it is active, we can run the following cmdlet to test our stream.
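For example (the polling loop and the test text are illustrative):

```powershell
# Poll until the delivery stream is active.
while ((Get-KINFDeliveryStream -DeliveryStreamName $deliveryStreamName).DeliveryStreamStatus -ne 'ACTIVE')
{
    Start-Sleep -Seconds 10
}

# Send a single test record to the stream.
Write-KINFRecord -DeliveryStreamName $deliveryStreamName -Record_Text "test record`n"
```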
This will send one record to our stream, which will be pushed to the S3 bucket. By default, delivery streams buffer data until either 5 MB have accumulated or 5 minutes have elapsed, whichever comes first, so check the bucket after about 5 minutes.
Writing to the Delivery Stream
In an ASP.NET application, we can write an IHttpModule so we know about every request. With an IHttpModule, we can add an event handler to the BeginRequest event and inspect where the request is coming from and where it is going. In our module, the Init method adds the event handler, and the RecordRequest method grabs the current URL and the requested URL and sends them to the delivery stream.
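A minimal sketch of such a module, assuming the AWS SDK for .NET (the AWSSDK.KinesisFirehose package) and the delivery stream name from our script; the class name and region are illustrative:

```csharp
using System;
using System.IO;
using System.Text;
using System.Web;
using Amazon;
using Amazon.KinesisFirehose;
using Amazon.KinesisFirehose.Model;

public class FirehoseSiteTracker : IHttpModule
{
    private readonly IAmazonKinesisFirehose _client;

    // The delivery stream created by our PowerShell script.
    private readonly string _deliveryStreamName = "BlogDemoDeliveryStream";

    public FirehoseSiteTracker()
    {
        // Region is an assumption; use the region where the stream was created.
        _client = new AmazonKinesisFirehoseClient(RegionEndpoint.USWest2);
    }

    public void Dispose()
    {
        _client.Dispose();
    }

    // Subscribe to BeginRequest so we see every incoming request.
    public void Init(HttpApplication application)
    {
        application.BeginRequest += RecordRequest;
    }

    // Send Firehose a tab-separated record: the referring page and the requested page.
    private void RecordRequest(object sender, EventArgs e)
    {
        var context = ((HttpApplication)sender).Context;

        var referrer = context.Request.UrlReferrer != null
            ? context.Request.UrlReferrer.PathAndQuery
            : string.Empty;

        var data = new MemoryStream(Encoding.UTF8.GetBytes(
            string.Format("{0}\t{1}\n", referrer, context.Request.Path)));

        _client.PutRecordAsync(new PutRecordRequest
        {
            DeliveryStreamName = _deliveryStreamName,
            Record = new Record { Data = data }
        });
    }
}
```

The module still needs to be registered in the application’s web.config, under the modules collection of system.webServer, for IIS to load it.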
Now we can navigate through our ASP.NET application and watch data flow into our S3 bucket.
What’s Next
Now that our data is flowing into S3, we have many options for what to do with it. Firehose has built-in support for pushing our S3 data straight to Amazon Redshift, giving us lots of power for running queries and doing analytics. We could also set up S3 event notifications so that Lambda functions or SQS pollers process the data as soon as it is pushed to Amazon S3.