Introducing Amazon EMR Serverless Streaming jobs for continuous processing on streaming data

Posted on: Jun 4, 2024

Amazon EMR Serverless is a serverless option in Amazon EMR that makes it simple for data engineers and data scientists to run open-source big data analytics frameworks without having to configure, manage, or scale clusters or servers. We are excited to announce a new streaming job mode on Amazon EMR Serverless, enabling you to continuously analyze and process streaming data.
Streaming has become vital for businesses to gain continuous insights from data sources such as sensors, IoT devices, and web logs. However, processing streaming data can be challenging because of requirements such as high availability, resilience to failures, and integration with streaming services. Amazon EMR Serverless Streaming jobs have built-in features that address these challenges. They offer high availability through Multi-AZ (Availability Zone) resiliency, automatically failing over to healthy AZs. They also offer increased resiliency through automatic job retries on failure, and through log management features such as log rotation and compaction, which prevent the accumulation of log files that might otherwise lead to job failures. In addition, Amazon EMR Serverless Streaming jobs support processing data from streaming services such as self-managed Apache Kafka clusters and Amazon Managed Streaming for Apache Kafka, and they are now integrated with Amazon Kinesis Data Streams through a new built-in Amazon Kinesis Data Streams Connector, making it easier to build end-to-end streaming pipelines.

Amazon EMR Serverless Streaming jobs are generally available on EMR release 7.1.0 and later in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Europe (Stockholm, Paris, Frankfurt, Ireland, London), South America (São Paulo), and Asia Pacific (Tokyo, Seoul, Singapore, Mumbai, Sydney). To get started, visit the Amazon EMR Serverless Streaming jobs page in the Amazon EMR Serverless User Guide.