Analyze Streaming Data from Amazon Kinesis with Amazon Elastic MapReduce (EMR)

Posted on: Feb 20, 2014

We are pleased to announce the release of the Amazon Elastic MapReduce (Amazon EMR) Connector to Amazon Kinesis. Kinesis can collect data from hundreds of thousands of sources, such as website clickstreams, marketing and financial information, manufacturing instrumentation, social media, and more. This connector enables batch processing of data in Kinesis streams with familiar Hadoop ecosystem tools such as Hive, Pig, Cascading, and standard MapReduce. You can now analyze data in Kinesis streams without having to write, deploy, and maintain independent stream processing applications.

You can use this connector, for example, to write a SQL query using Hive against a Kinesis stream, or to build reports that join and process Kinesis stream data with multiple data sources such as Amazon DynamoDB, Amazon S3, and HDFS. You can also build reliable and scalable ETL processes that filter and archive Kinesis data into permanent data stores, including Amazon S3, Amazon DynamoDB, and Amazon Redshift.

To facilitate end-to-end log processing scenarios using Kinesis and EMR, we have created a Log4J Appender that streams log events directly into a Kinesis stream, making the log entries available for processing in EMR. You can get started today by launching a new EMR cluster and using the code samples provided in the tutorials and FAQs. If you’re new to Kinesis, you can learn more by visiting the Kinesis detail page.
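
To give a rough sense of what a log appender of this kind does, the sketch below forwards application log events to a Kinesis stream. It is a Python analogue written for illustration only, not the Log4J Appender itself. The class names `KinesisLogHandler` and `FakeKinesisClient`, the stream name `AccessLogStream`, and the JSON record layout are all assumptions. The only real API shape relied on is the boto3-style `put_record(StreamName=..., Data=..., PartitionKey=...)` call, and a stand-in client is used so the example runs without AWS access.

```python
import json
import logging


class KinesisLogHandler(logging.Handler):
    """Hypothetical handler that forwards each log event to a Kinesis stream."""

    def __init__(self, client, stream_name):
        super().__init__()
        # Assumption: client exposes a boto3-style put_record method.
        self.client = client
        self.stream_name = stream_name

    def emit(self, record):
        try:
            # Assumed record layout: one small JSON document per log event.
            payload = json.dumps({
                "logger": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
            })
            self.client.put_record(
                StreamName=self.stream_name,
                Data=payload.encode("utf-8"),
                # Events from the same logger share a partition key.
                PartitionKey=record.name,
            )
        except Exception:
            # Standard logging convention: never let logging crash the app.
            self.handleError(record)


class FakeKinesisClient:
    """Stand-in client that stores calls in memory instead of calling AWS."""

    def __init__(self):
        self.records = []

    def put_record(self, **kwargs):
        self.records.append(kwargs)


def demo():
    """Log one event through the handler and return what was 'sent'."""
    client = FakeKinesisClient()
    handler = KinesisLogHandler(client, "AccessLogStream")  # hypothetical stream
    logger = logging.getLogger("webapp.access")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info("GET /index.html 200")
    finally:
        logger.removeHandler(handler)
    return client.records


if __name__ == "__main__":
    for sent in demo():
        print(sent["StreamName"], sent["PartitionKey"], sent["Data"])
```

To send events to a real stream, the stand-in could presumably be replaced with `boto3.client("kinesis")`, given valid credentials and an existing stream. Once the events are in the stream, they would be available to the EMR connector for batch processing.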