Amazon Managed Service for Apache Flink Documentation

Amazon Managed Service for Apache Flink makes it easier to build and run real-time stream processing applications using Apache Flink. Amazon Managed Service for Apache Flink provisions and configures your Flink clusters and orchestrates Flink job management. It sets up monitoring and alarms, offers auto scaling, and is architected for high availability (including Availability Zone failover). The service offers access to Apache Flink’s expressive APIs, and through Amazon Managed Service for Apache Flink Studio, you can interactively query data streams or launch stateful applications in only a few steps. With this managed service, you can get started with Apache Flink and quickly deploy and operate your data stream processing applications.

With Amazon Managed Service for Apache Flink, you have access to Apache Flink’s capabilities, including low-latency and high-throughput data processing, exactly-once processing, and durable application state. Amazon Managed Service for Apache Flink is designed to help you deploy secure, compliant, and highly available applications. Amazon Managed Service for Apache Flink replicates data and workloads across multiple Availability Zones. 

Application development is easier with Amazon Managed Service for Apache Flink because the service supports Flink’s flexible APIs in Java, Scala, Python, and SQL. Amazon Managed Service for Apache Flink integrates with data sources and destinations, such as Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, JDBC connectors, and custom connectors. 

Open source

Amazon Managed Service for Apache Flink includes open source libraries such as Apache Flink, Apache Beam, Apache Zeppelin, AWS SDK, and AWS service integrations. Apache Flink is a framework and engine designed to build available and accurate streaming applications. Apache Beam is a unified model for defining streaming and batch data processing applications that are run across multiple runtime engines. AWS SDKs help take the complexity out of coding for many AWS services by providing APIs in your preferred language, and they include AWS libraries, code samples, and documentation.

Flexible APIs

Amazon Managed Service for Apache Flink supports Flink’s flexible APIs in Java, Scala, Python, and SQL that are specialized for different use cases including stateful event processing, streaming ETL (extract, transform, and load), and real-time analytics. With prebuilt operators and analytics capabilities, you can build an Apache Flink streaming application, and the libraries are extensible, so you can perform real-time processing for various use cases.
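As an illustration of the real-time analytics use case, the following is a minimal conceptual sketch of a tumbling-window count in plain Python. It does not use the Flink API; the function name `tumbling_window_counts` and the sample events are hypothetical and only show the kind of computation a prebuilt windowing operator performs.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each event to the start of the window that contains it.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp_in_seconds, event_type) pairs.
events = [(0, "click"), (3, "click"), (9, "view"), (10, "click"), (14, "view")]
result = tumbling_window_counts(events, window_seconds=10)
```

In a Flink application, the equivalent logic would typically be expressed with a window operator or a SQL windowing clause rather than written by hand.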

AWS service integrations

You can set up and integrate a data source or destination with minimal code. Use the Amazon Managed Service for Apache Flink libraries to integrate with AWS services such as Amazon MSK, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon S3, and Amazon DynamoDB.

Advanced integration capabilities

In addition to the AWS integrations, the Amazon Managed Service for Apache Flink libraries include more than 40 Apache Flink connectors and the ability to build custom integrations. With a few more lines of code, you can modify how each integration behaves with advanced functionality. You can also build custom integrations using a set of Apache Flink primitive types so that you can read from and write to files, directories, sockets, or other sources accessed over the internet.

Exactly-once processing

Using Amazon Managed Service for Apache Flink, you can build applications where processed records affect the results exactly once, referred to as exactly-once processing. Even in the case of an application disruption, such as internal service maintenance or user-initiated application update, the service ensures all data is processed and there is no duplicate data.
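The idea behind exactly-once results can be sketched in a few lines of plain Python. This is a conceptual simulation, not the service's implementation or the Flink API: it assumes the source offset and the operator state are saved together in one checkpoint, and that a restart replays records from the last checkpoint. The names `Checkpoint` and `run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Source offset and operator state, captured together."""
    offset: int = 0
    total: int = 0


def run(records, checkpoint, checkpoint_every=2, fail_at=None):
    """Sum records starting from a checkpoint; optionally fail at an offset.

    Returns the last completed checkpoint and whether the run finished.
    """
    offset, total = checkpoint.offset, checkpoint.total
    last = checkpoint
    while offset < len(records):
        if fail_at is not None and offset == fail_at:
            # Simulated disruption: work done since the last checkpoint is lost.
            return last, False
        total += records[offset]
        offset += 1
        if offset % checkpoint_every == 0 or offset == len(records):
            last = Checkpoint(offset, total)
    return last, True


records = [5, 7, 11, 13, 17]
cp, done = run(records, Checkpoint(), fail_at=3)  # fails before the record at index 3
cp, done = run(records, cp)                       # restarts from the last checkpoint
```

The record at index 2 is read twice, once before the failure and once after the restart, but because state is rolled back to the checkpoint it contributes to the final sum only once.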

Stateful processing

The service stores previous and in-progress computations, or state, in running application storage. Compare real-time and past results over any time period and achieve fast recovery during application disruptions. State is always encrypted and incrementally saved in running application storage.
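To make the notion of incrementally saved state concrete, here is a minimal conceptual sketch in plain Python. It is an assumption-laden illustration, not how the service or Flink stores state: the class `KeyedState` and the function `restore` are hypothetical, and encryption is omitted.

```python
class KeyedState:
    """Per-key running totals with snapshots that contain only changed keys."""

    def __init__(self):
        self.values = {}
        self.dirty = set()

    def add(self, key, amount):
        """Update the running total for a key and mark the key as changed."""
        self.values[key] = self.values.get(key, 0) + amount
        self.dirty.add(key)
        return self.values[key]

    def incremental_snapshot(self):
        """Return only the keys changed since the previous snapshot."""
        delta = {key: self.values[key] for key in self.dirty}
        self.dirty.clear()
        return delta


def restore(deltas):
    """Rebuild state by applying incremental snapshots in order."""
    state = KeyedState()
    for delta in deltas:
        state.values.update(delta)
    return state


state = KeyedState()
deltas = []
state.add("sensor-a", 3)
state.add("sensor-b", 4)
deltas.append(state.incremental_snapshot())  # both keys changed
state.add("sensor-a", 5)
deltas.append(state.incremental_snapshot())  # only sensor-a changed
recovered = restore(deltas)
```

Saving only the changed keys keeps each snapshot small, which is one reason incremental saving can help recovery stay fast as state grows.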

Durable application backups

Create and delete durable application backups through a simple API call. Restore your applications from the latest backup after a disruption, or restore your application to an earlier version. 
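As a hedged sketch of what those API calls might look like from Python, the helpers below wrap the snapshot operations of the AWS SDK for Python (boto3) `kinesisanalyticsv2` client. The helper names `create_backup` and `delete_backup` are hypothetical, the application and snapshot names are placeholders, and the parameter names should be checked against the current API reference before use.

```python
def create_backup(client, application_name, snapshot_name):
    """Request a snapshot (durable backup) of an application."""
    return client.create_application_snapshot(
        ApplicationName=application_name,
        SnapshotName=snapshot_name,
    )


def delete_backup(client, application_name, snapshot_name):
    """Delete a snapshot; the delete call also needs its creation timestamp."""
    details = client.describe_application_snapshot(
        ApplicationName=application_name,
        SnapshotName=snapshot_name,
    )["SnapshotDetails"]
    return client.delete_application_snapshot(
        ApplicationName=application_name,
        SnapshotName=snapshot_name,
        SnapshotCreationTimestamp=details["SnapshotCreationTimestamp"],
    )


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("kinesisanalyticsv2")
#   create_backup(client, "my-flink-app", "before-upgrade")
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub in place of a live AWS connection.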

ML integration

Amazon Managed Service for Apache Flink supports machine learning (ML) algorithms. You can create real-time applications for classification, clustering, evaluation, feature engineering, recommendations, regression, and statistics.

AWS Glue Schema Registry compatibility

Amazon Managed Service for Apache Flink is compatible with the AWS Glue Schema Registry. The Schema Registry helps you improve data quality and safeguard against unexpected changes using compatibility checks that govern schema evolution for your schemas on Amazon Managed Service for Apache Flink workloads connected to Apache Kafka, Amazon MSK, or Amazon Kinesis Data Streams, as either a source or sink connector.
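To illustrate what a compatibility check guards against, the following is a simplified conceptual sketch of a backward-compatibility rule in plain Python. It is not the Schema Registry's implementation and does not call its API: the function `is_backward_compatible` and the dictionary-based schema format are hypothetical, and real schema formats apply richer rules (for example, permitted type promotions).

```python
def is_backward_compatible(old_fields, new_fields):
    """Return True if a reader using new_fields can read data written with old_fields.

    Each argument maps a field name to a dict with a "type" and an optional "default".
    """
    for name, spec in new_fields.items():
        if name not in old_fields:
            if "default" not in spec:
                # A new field without a default cannot be filled in for old records.
                return False
        elif old_fields[name]["type"] != spec["type"]:
            # Simplification: any type change is treated as incompatible.
            return False
    return True


v1 = {"id": {"type": "string"}, "price": {"type": "double"}}
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_bad = {**v1, "currency": {"type": "string"}}

ok = is_backward_compatible(v1, v2_ok)    # new field has a default
bad = is_backward_compatible(v1, v2_bad)  # new field has no default
```

Rejecting the second schema before it is registered prevents a consumer from failing on records produced with the earlier version.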

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html.  This information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.