Introducing the Amazon Kinesis Data Streams Apache Spark Structured Streaming Connector for Amazon EMR

Posted on: May 24, 2024

We are excited to announce the launch of the Amazon Kinesis Data Streams Connector for Spark Structured Streaming on Amazon EMR. The new connector makes it easy for you to build real-time streaming applications and pipelines that consume Amazon Kinesis Data Streams using Apache Spark Structured Streaming. Starting with Amazon EMR 7.1, the connector comes pre-packaged on Amazon EMR on EKS, EMR on EC2, and EMR Serverless. You no longer need to build or download any packages, and can focus on building your business logic using the familiar and optimized Spark Data Source APIs when consuming data from your Kinesis data streams.

Amazon Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store streaming data at massive scale. Amazon EMR is the cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using Apache Spark and other open-source frameworks. The new Amazon Kinesis Data Streams Connector for Apache Spark is faster, more scalable, and more fault-tolerant than alternative open-source options. The connector also supports Enhanced Fan-out consumption, which gives each consumer dedicated read throughput. To learn more and see a code example, go to Build Spark Structured Streaming applications with the open source connector for Amazon Kinesis Data Streams.
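As a rough illustration of what reading a Kinesis data stream through the Spark Data Source API can look like, here is a minimal PySpark-style sketch. The format name "aws-kinesis" and the "kinesis.*" option keys are assumptions based on the open-source connector's documentation, so verify them against the Amazon EMR guide for your release. The helper names, stream name, and consumer name are hypothetical, and the endpoint URL pattern assumes a standard commercial AWS Region.

```python
from typing import Dict, Optional


def kinesis_source_options(
    stream_name: str,
    region: str,
    enhanced_fan_out: bool = False,
    consumer_name: Optional[str] = None,
    starting_position: str = "LATEST",
) -> Dict[str, str]:
    """Build reader options for the connector (option keys are assumed)."""
    if enhanced_fan_out and not consumer_name:
        raise ValueError("Enhanced Fan-out needs a registered consumer name")

    options = {
        "kinesis.streamName": stream_name,
        "kinesis.region": region,
        # Assumed endpoint pattern; other AWS partitions use different domains.
        "kinesis.endpointUrl": f"https://kinesis.{region}.amazonaws.com",
        "kinesis.startingposition": starting_position,
        # GetRecords polls shared shard throughput; SubscribeToShard is the
        # Enhanced Fan-out path with dedicated throughput per consumer.
        "kinesis.consumerType": "SubscribeToShard" if enhanced_fan_out else "GetRecords",
    }
    if enhanced_fan_out:
        options["kinesis.consumerName"] = consumer_name
    return options


def read_kinesis_stream(spark, options: Dict[str, str]):
    """Return a streaming DataFrame; `spark` is an existing SparkSession."""
    return spark.readStream.format("aws-kinesis").options(**options).load()
```

On an EMR 7.1 or later cluster, a call such as read_kinesis_stream(spark, kinesis_source_options("my-stream", "us-east-1")) would return a streaming DataFrame that you can transform and write with the usual writeStream API. Keeping the option building separate from the read makes it easy to switch between polling and Enhanced Fan-out without touching the business logic.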
 

The connector is available starting with Amazon EMR 7.1 on EMR on EKS, EMR on EC2, and EMR Serverless in all AWS Regions where Amazon EMR is available. To get started, go to the Amazon EMR Console. You can also learn more in the Using the Spark structured streaming Amazon Kinesis Data Streams connector section in the Amazon EMR Developer Guide.