Stream processing applications using Apache Flink
Amazon Kinesis Data Analytics includes open-source libraries such as Apache Flink, Apache Beam, and Apache Zeppelin, along with the AWS SDK and AWS service integrations. Apache Flink is an open-source framework and engine for building highly available and accurate streaming applications. Apache Beam is an open-source, unified model for defining streaming and batch data processing applications that run on multiple execution engines. The AWS software development kits (SDKs) take the complexity out of coding for many AWS services by providing application programming interfaces (APIs) in your preferred language, and they include the AWS libraries, code samples, and documentation.
Kinesis Data Analytics offers flexible APIs in Java, Scala, Python, and SQL specialized for different use cases including stateful event processing, streaming ETL, and real-time analytics. Pre-built operators and analytics capabilities enable you to build an Apache Flink streaming application in hours instead of months. The Kinesis Data Analytics libraries are extensible, so you can perform real-time processing for a wide variety of use cases.
AWS Service Integrations
You can set up and integrate a data source or destination with minimal code. You can use the Amazon Kinesis Data Analytics libraries to integrate with Amazon Simple Storage Service (Amazon S3), Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon OpenSearch Service, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon CloudWatch, and AWS Glue Schema Registry.
Advanced integration capabilities
In addition to the AWS integrations, the Kinesis Data Analytics libraries include more than 10 Apache Flink connectors and the ability to build custom integrations. With a few more lines of code, you can change how each integration behaves. You can also build custom integrations using a set of Apache Flink primitives that let you read from and write to files, directories, sockets, or other sources accessed over the internet.
Compatible with AWS Glue Schema Registry
Kinesis Data Analytics for Apache Flink is compatible with the AWS Glue Schema Registry. This serverless AWS Glue feature lets you validate and control the evolution of streaming data using registered Apache Avro schemas, at no additional charge. The Schema Registry helps you manage your schemas on Kinesis Data Analytics for Apache Flink workloads connected to Apache Kafka, Amazon Managed Streaming for Apache Kafka (Amazon MSK), or Amazon Kinesis Data Streams, as either a source or a sink. When data streaming applications are integrated with the Schema Registry, you can improve data quality and safeguard against unexpected changes using compatibility checks that govern schema evolution.
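To make the idea of a compatibility check concrete, here is a minimal sketch of a backward-compatibility rule for Avro-style record schemas. It is an illustration only, not the Schema Registry's implementation: the function name `is_backward_compatible` and the `Trade` schemas are hypothetical, and the rule is deliberately simplified (it ignores type promotion, unions, and aliases).

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Return True if a reader using new_schema can decode records
    written with old_schema (simplified rule, see assumptions above)."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        written = old_fields.get(field["name"])
        if written is None:
            # Field was added: old records lack it, so the reader needs a default.
            if "default" not in field:
                return False
        elif written["type"] != field["type"]:
            # Field exists in both versions but its type changed.
            return False
    # Fields removed in new_schema are simply ignored by the reader.
    return True


# Hypothetical schemas for illustration.
v1 = {"type": "record", "name": "Trade", "fields": [
    {"name": "ticker", "type": "string"},
    {"name": "price", "type": "double"},
]}
v2_with_default = {"type": "record", "name": "Trade", "fields": [
    {"name": "ticker", "type": "string"},
    {"name": "price", "type": "double"},
    {"name": "venue", "type": "string", "default": "unknown"},
]}
v2_without_default = {"type": "record", "name": "Trade", "fields": [
    {"name": "ticker", "type": "string"},
    {"name": "price", "type": "double"},
    {"name": "venue", "type": "string"},
]}

ok = is_backward_compatible(v1, v2_with_default)
rejected = is_backward_compatible(v1, v2_without_default)
```

In this sketch, adding a field with a default passes the check, while adding a required field without one is rejected, which is the kind of unexpected change a compatibility check is meant to catch.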
Exactly Once Processing
Use Apache Flink in Kinesis Data Analytics to build applications in which each processed record affects the results exactly once, referred to as exactly-once processing. Even in the case of an application disruption, such as internal service maintenance or a user-initiated application update, the service ensures that all data is processed and that there is no duplicate data.
The service stores previous and in-progress computations, or state, in running application storage. Compare real-time and past results over any time period and achieve fast recovery during application disruptions. State is always encrypted and incrementally saved in running application storage.
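The following toy model sketches why saving state together with the input position yields exactly-once results. It is a single-process illustration under simplifying assumptions, not how the service or Apache Flink implements checkpointing; the class `CountingJob` and its methods are hypothetical names.

```python
import copy


class CountingJob:
    """Counts records per key; state and input offset are saved together."""

    def __init__(self, records):
        self.records = records
        self.offset = 0           # position of the next record to read
        self.counts = {}          # application state
        self._checkpoint = (0, {})

    def checkpoint(self):
        # Save the offset and the state as one consistent unit.
        self._checkpoint = (self.offset, copy.deepcopy(self.counts))

    def restore(self):
        # Roll back both the state and the read position.
        offset, counts = self._checkpoint
        self.offset = offset
        self.counts = copy.deepcopy(counts)

    def process(self, n):
        end = min(self.offset + n, len(self.records))
        for key in self.records[self.offset:end]:
            self.counts[key] = self.counts.get(key, 0) + 1
        self.offset = end


events = ["a", "b", "a", "c", "a", "b"]
job = CountingJob(events)
job.process(2)            # reads "a", "b"
job.checkpoint()          # saves offset 2 with the matching counts
job.process(3)            # reads "a", "c", "a", then a disruption occurs
job.restore()             # state and offset return to the checkpoint
job.process(len(events))  # replays from offset 2 to the end
final_counts = job.counts
```

Because the replay starts from the offset stored with the state, the records read after the checkpoint are counted once even though they were read twice, so `final_counts` matches a single uninterrupted pass over `events`.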
Durable Application Backups
You can create and delete durable application backups through a simple API call. Immediately restore your applications from the latest backup after a disruption, or restore your application to an earlier version.
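As a rough sketch of what such calls can look like, the helpers below wrap the snapshot operations of the AWS SDK for Python (boto3) `kinesisanalyticsv2` client. Treating backups as application snapshots is an assumption here, and the helper names `create_backup` and `delete_backup`, along with any application and snapshot names you pass in, are hypothetical.

```python
def create_backup(client, app_name: str, snapshot_name: str) -> None:
    """Request a snapshot of a running application."""
    client.create_application_snapshot(
        ApplicationName=app_name,
        SnapshotName=snapshot_name,
    )


def delete_backup(client, app_name: str, snapshot_name: str) -> None:
    """Delete a snapshot. The delete call also needs the snapshot's
    creation timestamp, so look it up first."""
    details = client.describe_application_snapshot(
        ApplicationName=app_name,
        SnapshotName=snapshot_name,
    )["SnapshotDetails"]
    client.delete_application_snapshot(
        ApplicationName=app_name,
        SnapshotName=snapshot_name,
        SnapshotCreationTimestamp=details["SnapshotCreationTimestamp"],
    )


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("kinesisanalyticsv2")
#   create_backup(client, "my-flink-app", "before-upgrade")
#   delete_backup(client, "my-flink-app", "before-upgrade")
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub before pointing them at a real account.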
Amazon Kinesis Data Analytics Studio
Stream Inspection and Visualization
Kinesis Data Analytics Studio supports sub-second queries with built-in visualizations. You can perform ad-hoc queries to quickly inspect your data stream and view results in seconds.
Simple Build-and-Run Environment
Studio notebooks provide a single-interface experience for developing and debugging code and for running stream processing applications.
Process using SQL, Python, or Scala
Kinesis Data Analytics Studio supports SQL, Python, and Scala in the same development environment. Syntax highlighting, validation, and context-sensitive suggestions guide you within the notebook, and built-in support for specific Apache Flink capabilities helps you interact with your data.
Rapid, Serverless Stream Processing Application Development
There are no servers to provision, manage, or scale. Just write code and pay for the resources your applications consume. Easily deploy your code in the notebook to a continuously running stream processing application with autoscaling and durable state.
Kinesis Data Analytics Studio runs on Apache Flink and produces Apache Flink applications that are used in production. Apache Zeppelin notebooks provide a familiar, easy-to-use experience for authoring streaming applications in your language of choice.
Integrates with AWS Glue Data Catalog
AWS Glue Data Catalog is a persistent metadata store that serves as a central repository containing table definitions. You can use the AWS Glue Data Catalog to quickly discover and search across multiple AWS datasets. Kinesis Data Analytics Studio is compatible with the AWS Glue Data Catalog, where you can define the schema for your source and destination tables.
Kinesis Data Analytics SQL applications
For new projects, we recommend that you use Kinesis Data Analytics Studio rather than Kinesis Data Analytics for SQL Applications. Kinesis Data Analytics Studio combines ease of use with advanced analytical capabilities, enabling you to build sophisticated stream processing applications in minutes.
Support for Standard SQL
Kinesis Data Analytics supports standard ANSI SQL, so all you need is familiarity with SQL.
Integrated Input and Output
Kinesis Data Analytics integrates with Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose so you can readily ingest streaming data. Just point Kinesis Data Analytics at the input stream and it will automatically read the data, parse it, and make it available for processing. You can emit processed results to other AWS services including Amazon S3, Amazon Redshift, and Amazon OpenSearch Service through Kinesis Data Firehose. You can also send output data to Amazon Kinesis Data Streams to build advanced stream processing pipelines.
Console-based SQL Editor
Use a console-based editor to build SQL queries using streaming data operations like sliding time-window averages. You can also view streaming results and errors using live data to debug or further refine your script interactively.
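To show what a sliding time-window average computes, here is a small sketch in plain Python rather than streaming SQL. It assumes events arrive in timestamp order, and the class name `SlidingWindowAverage` and the sample values are hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = deque()   # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        self._events.append((timestamp, value))
        self._total += value
        # Evict events that are now outside the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)


window = SlidingWindowAverage(window_seconds=10)
averages = [
    window.add(0, 10.0),    # window holds {10}          -> 10.0
    window.add(5, 20.0),    # window holds {10, 20}      -> 15.0
    window.add(12, 30.0),   # event at t=0 is evicted    -> 25.0
    window.add(30, 40.0),   # only the newest event left -> 40.0
]
```

Each new event updates the average incrementally, so the cost per event stays small even when the window holds many records.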
Easy-to-Use Schema Editor
Kinesis Data Analytics provides an easy-to-use schema editor to discover and edit input data structure. The wizard automatically recognizes standard data formats such as JSON and CSV. It infers the structure of the input data to create a baseline schema, which you can further refine using the schema editor.
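The inference step can be pictured with the sketch below, which derives a baseline column-to-type mapping from a few JSON records. It is an illustration under simple assumptions, not the service's discovery logic: the function `infer_schema`, the type names, and the sample records are hypothetical.

```python
import json


def _type_name(value) -> str:
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"


def infer_schema(json_lines) -> dict:
    """Infer a baseline column -> type mapping from sample JSON records."""
    schema = {}
    for line in json_lines:
        for name, value in json.loads(line).items():
            kind = _type_name(value)
            previous = schema.get(name)
            if previous is None or previous == kind:
                schema[name] = kind
            elif {previous, kind} == {"INTEGER", "DOUBLE"}:
                schema[name] = "DOUBLE"    # widen mixed numeric columns
            else:
                schema[name] = "VARCHAR"   # fall back to text on conflict
    return schema


samples = [
    '{"ticker": "AMZN", "price": 101, "volume": 300}',
    '{"ticker": "MSFT", "price": 99.5, "volume": 120}',
]
baseline = infer_schema(samples)
```

Here `price` appears as both an integer and a decimal, so the sketch widens it to `DOUBLE`; a baseline like this is the kind of result you would then refine by hand in a schema editor.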
Pre-built SQL Templates
The interactive SQL editor comes bundled with a collection of SQL templates providing baseline SQL code for the most common types of operations such as aggregation, per-event transformation, and filtering. You simply select the template appropriate for your analytics task and then edit the provided code using the SQL editor to customize it for your specific use case.
Advanced Stream Processing Functions
Kinesis Data Analytics offers functions optimized for stream processing so you can easily perform advanced analytics such as anomaly detection and top-K analysis on your streaming data.
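As a point of reference for what top-K analysis returns, the sketch below counts items exactly over a finite batch using the Python standard library. The built-in streaming functions may work differently (for example, over windows or with approximate counting); the function name `top_k` and the sample tickers are hypothetical.

```python
from collections import Counter


def top_k(items, k: int):
    """Return the k most frequent items with their counts, most frequent first."""
    return Counter(items).most_common(k)


# Hypothetical batch of ticker symbols observed in one window.
tickers = ["AMZN", "MSFT", "AMZN", "GOOG", "AMZN", "MSFT"]
leaders = top_k(tickers, 2)
```

In this batch, `leaders` holds the two most frequent tickers together with how often each one appeared.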
Get started with Amazon Kinesis Data Analytics
Visit the Amazon Kinesis Data Analytics pricing page.
Learn how to use Amazon Kinesis Data Analytics in the step-by-step guide for SQL and Apache Flink.
Build your streaming application from the Amazon Kinesis Data Analytics console.