Stream Processing – Amazon Managed Service for Apache Flink Features

Amazon Managed Service for Apache Flink makes it easier to build and run real-time stream processing applications using Apache Flink. Amazon Managed Service for Apache Flink provisions and configures your Flink clusters and orchestrates Flink job management. It sets up monitoring and alarms, offers auto scaling, and is architected for high availability (including Availability Zone failover). The service offers access to Apache Flink’s expressive APIs, and through Amazon Managed Service for Apache Flink Studio, you can interactively query data streams or launch stateful applications in only a few steps. With this managed service, you can get started with Apache Flink and quickly deploy and operate your data stream processing applications.

With Amazon Managed Service for Apache Flink, you have access to the full range of Apache Flink’s industry-leading capabilities, including low-latency and high-throughput data processing, exactly-once processing, and durable application state. With Amazon Managed Service for Apache Flink, you can deploy secure, compliant, and highly available applications. Amazon Managed Service for Apache Flink effortlessly replicates data and workloads across multiple Availability Zones, ensuring uninterrupted performance and reliability, and without having to pay for additional capacity.

Application development is easier with Amazon Managed Service for Apache Flink because the service supports Flink’s flexible APIs in Java, Scala, Python, and SQL. Amazon Managed Service for Apache Flink integrates with hundreds of data sources and destinations, such as Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, JDBC connectors, and custom connectors.

Stream processing applications using Apache Flink

Open source

Amazon Managed Service for Apache Flink includes open source libraries such as Apache Flink, Apache Beam, Apache Zeppelin, AWS SDK, and AWS service integrations. Apache Flink is a framework and engine for building highly available and accurate streaming applications. Apache Beam is a unified model for defining streaming and batch data processing applications that are run across multiple runtime engines. AWS SDKs help take the complexity out of coding for many AWS services by providing APIs in your preferred language, and they include AWS libraries, code samples, and documentation.

Flexible APIs

Amazon Managed Service for Apache Flink supports Flink’s flexible APIs in Java, Scala, Python, and SQL that are specialized for different use cases including stateful event processing, streaming ETL (extract, transform, and load), and real-time analytics. With prebuilt operators and analytics capabilities, you can build an Apache Flink streaming application in hours instead of months, and the libraries are extensible, so you can perform real-time processing for various use cases.

AWS service integrations

You can set up and integrate a data source or destination with minimal code. Use the Amazon Managed Service for Apache Flink libraries to integrate with the following AWS services:

Advanced integration capabilities

In addition to the AWS integrations, the Amazon Managed Service for Apache Flink libraries include more than 40 Apache Flink connectors and the ability to build custom integrations. With a few more lines of code, you can modify how each integration behaves with advanced functionality. You can also build custom integrations using a set of Apache Flink primitive types so that you can read and write from files, directories, sockets, or other sources accessed over the internet.

Exactly-once processing

Using Amazon Managed Service for Apache Flink, you can build applications where processed records affect the results exactly once, referred to as exactly-once processing. Even in the case of an application disruption, such as internal service maintenance or user-initiated application update, the service ensures all data is processed and there is no duplicate data.

Stateful processing

The service stores previous and in-progress computations, or state, in running application storage. Compare real-time and past results over any time period and achieve fast recovery during application disruptions. State is always encrypted and incrementally saved in running application storage.

Durable application backups

Create and delete durable application backups through a simple API call. Immediately restore your applications from the latest backup after a disruption, or restore your application to an earlier version.

ML integration

Amazon Managed Service for Apache Flink supports machine learning (ML) algorithms. You can create real-time applications for classification, clustering, evaluation, feature engineering recommendations, regressions, and statistics.

AWS Glue Schema Registry compatibility

Amazon Managed Service for Apache Flink is compatible with the AWS Glue Schema Registry. The Schema Registry helps you improve data quality and safeguard against unexpected changes using compatibility checks that govern schema evolution for your schemas on Amazon Managed Service for Apache Flink workloads connected to Apache Kafka, Amazon MSK, or Amazon Kinesis Data Streams, as either a source or sink connector.

Amazon Kinesis Data Analytics Studio

Stream Inspection and Visualization

Kinesis Data Analytics Studio supports sub-second queries with built-in visualizations. You can perform ad-hoc queries to quickly inspect your data stream and view results in seconds.

Simple Build-and-Run Environment

Studio notebooks provide a single-interface development experience for developing, debugging code, and running stream processing applications.

Process using SQL, Python, or Scala

Kinesis Data Analytics Studio supports SQL, Python, and Scala in the same development environment. Syntax highlighting, validation, and context-sensitive suggestions guide you within the notebook to interact with your data with built-in support for specific Apache Flink capabilities.

Rapid, Serverless Stream Processing Application Development

There are no servers to provision, manage, or scale. Just write code and pay for the resources your applications consume. Easily deploy your code in the notebook to a continuously running stream processing application with autoscaling and durable state.

Open Source

Kinesis Data Analytics Studio runs on and produces Apache Flink applications used in production and Apache Zeppelin notebooks provide a familiar, easy-to-use experience for authoring streaming applications in a language of choice.

Integrates with AWS Glue Data Catalog

AWS Glue Data Catalog is a persistent metadata store that serves as a central repository containing table definitions. You can use the AWS Glue Data Catalog to quickly discover and search across multiple AWS datasets. Kinesis Data Analytics Studio is compatible with the AWS Glue Data Catalog, where you can define the schema for your source and destination tables.

Get started with Amazon Kinesis Data Analytics

Calculate your costs

Visit the Amazon Kinesis Data Analytics pricing page.

Review the getting-started guide

Learn how to use Amazon Kinesis Data Analytics in the step-by-step guide for SQL and Apache Flink.

Start building streaming applications

Build your streaming application from the Amazon Kinesis Data Analytics console.

Amazon Managed Service for Apache Flink Features