Posted On: Nov 3, 2016
Apache Flink is a streaming dataflow engine that makes it easy to run real-time stream processing on high-throughput data sources. It supports event-time semantics for out-of-order events, exactly-once semantics, backpressure control, and APIs optimized for writing both streaming and batch applications. Additionally, Flink has connectors for Amazon Kinesis Streams, Apache Kafka, Elasticsearch, the Twitter Streaming API, and Cassandra, and it can access data in Amazon S3 (with EMRFS) and HDFS.
You can create an Amazon EMR cluster with release 5.1.0 by choosing release label “emr-5.1.0” from the AWS Management Console, AWS CLI, or SDK. To install Flink, Zeppelin, and HBase on your cluster, select them as applications when you create it. Please visit the Amazon EMR documentation for more information about release 5.1.0, Flink 1.1.3, Zeppelin 0.6.2, and HBase 1.2.3.
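As a rough illustration of the SDK route, the sketch below builds a request for the `run_job_flow` call in boto3 (the AWS SDK for Python) with the release label and applications named above. The cluster name, region, instance type, and instance count are placeholder assumptions, and the two role names assume the default EMR roles already exist in your account; adjust all of them to your environment.

```python
def build_emr_request(name="flink-demo", instance_type="m3.xlarge", instance_count=3):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The name, instance type, and instance count are illustrative
    assumptions, not values taken from the announcement.
    """
    return {
        "Name": name,
        # Release label from the announcement.
        "ReleaseLabel": "emr-5.1.0",
        # Applications to install on the cluster.
        "Applications": [{"Name": app} for app in ("Flink", "Zeppelin", "HBase")],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after any submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes the default EMR roles have been created in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(region="us-east-1", **overrides):
    """Submit the request. Requires boto3 and configured AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("emr", region_name=region)
    response = client.run_job_flow(**build_emr_request(**overrides))
    return response["JobFlowId"]


if __name__ == "__main__":
    # Inspect the request without contacting AWS.
    print(build_emr_request())
```

Calling `launch_cluster()` starts a real cluster and incurs charges, so it is kept separate from the request builder, which can be inspected or adapted offline.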