Category: Amazon Kinesis

Implement a Real-time, Sliding-Window Application Using Amazon Kinesis and Apache Storm

Rahul Bhartia is an AWS Solutions Architect

Streams of data are becoming ubiquitous today – clickstreams, log streams, event streams, and more. The need for real-time processing of high-volume data streams is pushing the limits of traditional data processing infrastructures. Building a clickstream monitoring system, for example, where data is in the form of a continuous clickstream rather than discrete data sets, requires the use of continuous processing rather than ad-hoc, one-time queries.
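To make the sliding-window idea concrete, here is a minimal, self-contained sketch of a time-based sliding window over click events. The class and method names are illustrative only; they are not taken from Amazon Kinesis, Apache Storm, or the reference application described below.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch: count clicks that fall within a sliding time window.
// A real clickstream application would continuously feed this from a stream.
public class SlidingWindowCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Record a click at the given timestamp and evict events
    // that have fallen out of the window.
    public void record(long timestampMillis) {
        timestamps.addLast(timestampMillis);
        evict(timestampMillis);
    }

    // Number of clicks within the window ending at 'nowMillis'.
    public int count(long nowMillis) {
        evict(nowMillis);
        return timestamps.size();
    }

    private void evict(long nowMillis) {
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
    }
}
```

Unlike an ad-hoc query over a fixed data set, the window's contents change continuously as new events arrive and old ones expire, which is why this style of processing needs an always-on system such as Storm rather than one-time queries.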

Developers can use Apache Storm and Amazon Kinesis to quickly and cost-effectively build an application that continuously processes very high volumes of streaming data. To help developers integrate Apache Storm with Amazon Kinesis, earlier this year we launched the Amazon Kinesis Storm Spout. Last week we released an update to the Spout to support Ack/Fail semantics. With this update, the Spout now re-emits failed messages up to the configured retry limit, making it easier to build reliable data processing applications. The updated Amazon Kinesis Storm Spout is available on GitHub.
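The retry bookkeeping behind Ack/Fail semantics can be sketched as follows. This is a simplified, hypothetical illustration of the pattern, not the Spout's actual implementation or API; see the GitHub repository for the real code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of ack/fail retry bookkeeping: a failed message is
// re-emitted until a configured retry limit is reached, after which it is
// dropped. Names here are illustrative, not the Spout's actual API.
public class RetryTracker {
    private final int retryLimit;
    private final Map<String, Integer> attempts = new HashMap<>();

    public RetryTracker(int retryLimit) {
        this.retryLimit = retryLimit;
    }

    // On ack, the message was fully processed; drop its retry state.
    public void ack(String messageId) {
        attempts.remove(messageId);
    }

    // On fail, decide whether to re-emit: returns true while the message
    // is under the retry limit, false once the limit is exhausted.
    public boolean shouldRetry(String messageId) {
        int tried = attempts.merge(messageId, 1, Integer::sum);
        if (tried > retryLimit) {
            attempts.remove(messageId);  // give up on this message
            return false;
        }
        return true;
    }
}
```

In a Storm topology, the spout's `ack` and `fail` callbacks drive this kind of state, which is what lets downstream bolts signal that a tuple must be replayed.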

Along with the updated Amazon Kinesis Storm Spout, we published a white paper that outlines a reference architecture for building a real-time, sliding-window visualization over clickstream data using Amazon Kinesis and Apache Storm. The white paper documents a reference system that demonstrates everything from ingestion and processing to storage and real-time visualization of the data. You can launch the entire application shown in the diagram below in one click using the template.

Check out the white paper to learn how the entire stack works all the way from ingestion to visualization, and look at our GitHub repository to view further instructions on how to build and deploy it yourself.


Hosting Amazon Kinesis Applications on AWS Elastic Beanstalk

Ian Meyers is a Solutions Architecture Senior Manager with AWS

Amazon Kinesis provides a scalable and highly available platform for ingesting data from thousands of clients. Once data is available on a Kinesis stream, you can build applications to process the data using the Kinesis Client Library (KCL). KCL provides a framework for managing many of the complexities that accompany designing stream-processing applications. For example, the KCL will automatically distribute workers to process each shard in a Kinesis stream. It will manage this in a single JVM or across a fleet of instances. Using the KCL, you can build elastic, fault-tolerant, scalable stream-processing applications. Once you’ve built such an application, you’ll want a simple way to deploy it.
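The KCL's per-shard callback model can be sketched with a stand-in interface. The real record-processor interface lives in the amazon-kinesis-client library and differs in its exact types and method signatures; the stand-in below only mirrors its general shape so the sketch compiles without the AWS SDK.

```java
import java.util.List;

// Stand-in interface mirroring the general shape of a KCL record processor
// (assumed for illustration; the real interface is in amazon-kinesis-client).
// The KCL instantiates one processor per shard it assigns to a worker.
interface RecordProcessor {
    void initialize(String shardId);           // called once per assigned shard
    void processRecords(List<byte[]> records); // called with each batch of records
    void shutdown();                           // called when the shard lease moves away
}

// A trivial processor that counts the records it receives.
public class CountingProcessor implements RecordProcessor {
    private String shardId;
    private long processed;

    @Override
    public void initialize(String shardId) {
        this.shardId = shardId;
    }

    @Override
    public void processRecords(List<byte[]> records) {
        // A real processor would deserialize each record and act on it,
        // then periodically checkpoint its progress through the KCL.
        processed += records.size();
    }

    @Override
    public void shutdown() {
        // Release resources; the KCL will hand this shard to another worker.
    }

    public long getProcessed() {
        return processed;
    }
}
```

The KCL worker handles shard assignment, lease management, and failover across instances, so application code only needs to supply the per-shard processing logic.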

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services. Simply upload your application archive, and AWS Elastic Beanstalk automatically deploys it across multiple Availability Zones and configures Auto Scaling. It also provides load balancing and monitors application health. These features make AWS Elastic Beanstalk a great platform for running Amazon Kinesis applications. This article shows you how to host Kinesis applications in AWS Elastic Beanstalk.

You can now download an AWS Elastic Beanstalk application for Java/Tomcat, which lets you deploy your processing logic as an AWS Elastic Beanstalk-managed application. Simply build your Amazon Kinesis application as you normally would and expose the ability to start the Worker implementation using a publicly accessible run() method. Elastic Beanstalk handles the rest, including building the Auto Scaling configuration and distributing your application across multiple Availability Zones. If an instance crashes, AWS Elastic Beanstalk replaces it. As you add shards to your stream, AWS Elastic Beanstalk scales your Kinesis application based on CPU usage. As with all AWS Elastic Beanstalk applications, you can configure and customize the application and the underlying resources as required.
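The publicly accessible run() entry point described above might look roughly like the sketch below. The KCL's Worker.run() blocks for the lifetime of the application, so the entry point hands it to a background thread and returns promptly. The class name and the stand-in `Runnable` worker are assumptions for illustration, not the downloadable application's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hedged sketch of the entry-point pattern: expose a public run() that
// starts the stream-processing worker on a background thread. The worker
// is a stand-in Runnable here, not the actual KCL Worker class.
public class KinesisApplicationBootstrap {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicBoolean started = new AtomicBoolean(false);

    // Publicly accessible entry point invoked on deployment.
    // Returns true if the worker was started, false if it was already running.
    public boolean run(Runnable worker) {
        if (started.compareAndSet(false, true)) {
            // The real KCL Worker's run() blocks, so execute it off-thread
            // to keep the container's startup path responsive.
            executor.submit(worker);
            return true;
        }
        return false;  // ignore duplicate start requests
    }
}
```

Guarding the start with a flag keeps redeploys and repeated health-check invocations from spawning duplicate workers on the same instance.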