General
Interactive analysis enables streaming data exploration in real time. With ad hoc queries or programs, you can inspect streams from Amazon MSK or Amazon Kinesis Data Streams and visualize what the data looks like within those streams. For example, you can view how a real-time metric that computes the average over a time window behaves and send the aggregated data to a destination of your choice. Interactive analysis also helps with iterative development of stream processing applications. The queries you build will continuously update as new data arrives. With Kinesis Data Analytics Studio, you can deploy these queries to run continuously with autoscaling and durable state backups enabled.
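As an illustration, a minimal Flink SQL sketch of such a windowed average might look like the following; the sensor_readings table, its columns, and the event_time attribute are hypothetical, not part of the service:

-- Average reading per sensor over one-minute tumbling windows
-- (sensor_readings and its columns are hypothetical; event_time must be
-- a declared time attribute on the source table)
SELECT
  sensor_id,
  TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end,
  AVG(reading) AS avg_reading
FROM sensor_readings
GROUP BY
  sensor_id,
  TUMBLE(event_time, INTERVAL '1' MINUTE);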
Using Apache Beam to create your Kinesis Data Analytics application is very similar to getting started with Apache Flink. Please follow the instructions in the question above and be sure to install any components necessary for applications to run on Apache Beam, per the instructions in the Developer Guide. Note that Kinesis Data Analytics supports the Java SDK only when running on Apache Beam.
You can get started from the Amazon Kinesis Data Analytics console and create a new Studio notebook. Once you start the notebook, you can open it in Apache Zeppelin to immediately write code in SQL, Python, or Scala. You can interactively develop applications using the notebook interface for Amazon Kinesis Data Streams, Amazon MSK, and Amazon S3 using built-in integrations, and various other sources with custom connectors. You can use all the operators that Apache Flink supports in Flink SQL and the Table API to perform ad hoc data stream querying and develop your stream processing application. Once you are ready, with a few clicks, you can easily promote your code to a continuously running stream processing application with autoscaling and durable state.
Yes, using Apache Flink DataStream Connectors, Amazon Kinesis Data Analytics for Apache Flink applications can use AWS Glue Schema Registry, a serverless feature of AWS Glue. You can integrate Apache Kafka/Amazon MSK and Amazon Kinesis Data Streams, as a sink or a source, with your Amazon Kinesis Data Analytics for Apache Flink workloads. Visit the Schema Registry user documentation to get started and to learn more.
Key concepts
Managing applications
- Monitoring Kinesis Data Analytics in the Amazon Kinesis Data Analytics for Apache Flink Developer Guide.
- Monitoring Kinesis Data Analytics in the Amazon Kinesis Data Analytics Studio Developer Guide.
- Monitoring Kinesis Data Analytics in the Amazon Kinesis Data Analytics for SQL Developer Guide.
- Granting Permissions in the Amazon Kinesis Data Analytics for Apache Flink Developer Guide.
- Granting Permissions in the Amazon Kinesis Data Analytics Studio Developer Guide.
- Granting Permissions in the Amazon Kinesis Data Analytics for SQL Developer Guide.
Pricing and billing
You are charged an hourly rate based on the number of Amazon Kinesis Processing Units (or KPUs) used to run your streaming application. A single KPU is a unit of stream processing capacity consisting of 1 vCPU of compute and 4 GB of memory. Amazon Kinesis Data Analytics automatically scales the number of KPUs required by your stream processing application as memory and compute demands vary in response to processing complexity and the throughput of the streaming data processed.
For Apache Flink and Apache Beam applications, you are charged a single additional KPU per application for application orchestration. Apache Flink and Apache Beam applications are also charged for running application storage and durable application backups. Running application storage is used for stateful processing capabilities in Amazon Kinesis Data Analytics and is charged per GB-month. Durable application backups are optional, charged per GB-month, and provide a point-in-time recovery point for applications.
For Amazon Kinesis Data Analytics Studio, in development or interactive mode, you are charged an additional KPU for application orchestration and one for interactive development. You are also charged for running application storage. You are not charged for durable application backups.
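As a worked example, suppose an illustrative rate of $0.11 per KPU-hour (actual rates vary by region; see the pricing page). An Apache Flink application that scales to 4 KPUs is billed for 5 KPUs once the orchestration KPU is included, or about $0.55 per hour, plus the per GB-month charges for running application storage and any optional durable application backups.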
Building Apache Flink applications
Authoring application code for applications using Apache Flink
// Read raw game events from a Kinesis stream (illustrative connector names)
DataStream<GameEvent> rawEvents = env.addSource(
    new KinesisStreamSource("input_events"));
// Map each raw event to (gameId, levelId, userId)
DataStream<UserPerLevel> gameStream =
    rawEvents.map(event -> new UserPerLevel(event.gameMetadata.gameId,
        event.gameMetadata.levelId, event.userId));
// Key by game and aggregate over one-minute tumbling windows
gameStream.keyBy(event -> event.gameId)
    .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
    .apply(...); // window function elided
// Write results to an output Kinesis stream
gameStream.addSink(new KinesisStreamSink("myGameStateStream"));
- Streaming data sources: Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams
- Destinations, or sinks: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon DynamoDB, Amazon Elasticsearch Service, and Amazon S3 (through file sink integrations)
Apache Flink also includes other connectors, including Apache Kafka, Apache Cassandra, Elasticsearch, and more.
Yes. You can use Kinesis Data Analytics Apache Flink applications to replicate data between Amazon Kinesis Data Streams, Amazon MSK, and other systems. An example provided in our documentation demonstrates how to read from one Amazon MSK topic and write to another.
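As a hedged sketch of this replication pattern, expressed in Flink SQL rather than the Java DataStream API (all table names, topics, and broker addresses below are placeholders):

-- Source: a hypothetical MSK topic exposed as a table
CREATE TABLE source_topic (
  id      BIGINT,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'input-topic',
  'properties.bootstrap.servers' = 'broker-1:9092',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

-- Sink: a second hypothetical topic with the same schema
CREATE TABLE sink_topic (
  id      BIGINT,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'output-topic',
  'properties.bootstrap.servers' = 'broker-1:9092',
  'format' = 'json'
);

-- Continuously copy records from the source topic to the sink topic
INSERT INTO sink_topic SELECT * FROM source_topic;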
Kinesis Data Analytics supports Apache Flink versions 1.6, 1.8, 1.11 (recommended), and 1.13. Apache Flink 1.11 in Kinesis Data Analytics supports Java Development Kit version 11, Python 3.7, and Scala 2.12. You can find more information in the Create an Application section of the AWS Developer Guide.
Apache Flink 1.13 in Kinesis Data Analytics supports Java Development Kit version 11, Python 3.8 and Scala 2.12. You can find more information in the Create an Application section of the AWS Developer Guide.
Yes, Kinesis Data Analytics supports streaming applications built using Apache Beam Java SDK version 2.23. You can build Apache Beam streaming applications in Java and run them using Apache Flink 1.8 on Amazon Kinesis Data Analytics, Apache Spark running on-premises, and other execution engines supported by Apache Beam.
Apache Beam is an open-source, unified model for defining streaming and batch data processing applications executed across multiple execution engines.
Building Amazon Kinesis Data Analytics Studio applications
Q: How do I develop a Studio application?
You can start from the Amazon Kinesis Data Analytics Studio, Amazon Kinesis Data Streams, or Amazon MSK consoles with a few clicks to launch a serverless notebook to immediately query data streams and perform interactive data analytics.
Interactive data analytics: You can write code in the notebook in SQL, Python, or Scala to interact with your streaming data, with query response times in seconds. You can use built-in visualizations to explore the data and view real-time insights on your streaming data from within your notebook, and easily develop stream processing applications powered by Apache Flink.
Once your code is ready to run as a production application, you can transition with a single click to a stream processing application that processes GBs of data per second, without servers.
Stream processing application: Once you are ready to promote your code to production, you can do so with a single click. Click ‘Deploy as stream processing application’ in the notebook interface, or issue a single command in the CLI, and Studio takes care of all the infrastructure management necessary for you to run your stream processing application at scale, with autoscaling and durable state enabled, just as in an Amazon Kinesis Data Analytics for Apache Flink application.
Q: What does my application code look like?
You can write code in the notebook in your preferred language of SQL, Python, or Scala using Apache Flink’s Table API. The Table API is a high-level abstraction and relational API that supports a superset of SQL’s capabilities. It offers familiar operations such as select, filter, join, group by, aggregate, etc., along with stream-specific concepts like windowing. You use %<interpreter> to specify the language to be used in a section of the notebook, and can easily switch between languages. Interpreters are Apache Zeppelin plug-ins enabling developers to specify a language or data processing engine for each section of the notebook. You can also build user-defined functions and reference them to improve code functionality.
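For example, a notebook section can select the Flink streaming SQL interpreter and run a query directly; switching the directive to %flink.pyflink (Python) or %flink (Scala) changes the language. A minimal sketch, where the orders table is hypothetical:

%flink.ssql
-- Runs as a continuously updating streaming SQL query
SELECT product_id, COUNT(*) AS order_count
FROM orders
GROUP BY product_id;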
Q: What SQL operations are supported?
You can perform SQL operations such as Scan and Filter (SELECT, WHERE), Aggregations (GROUP BY, GROUP BY WINDOW, HAVING), Set (UNION, UNION ALL, INTERSECT, IN, EXISTS), Order (ORDER BY, LIMIT), Joins (INNER, OUTER, Timed Window – BETWEEN, AND, joining with temporal tables – tables that track changes over time), Top N, deduplication, and pattern recognition. Some of these queries, such as GROUP BY, OUTER JOIN, and Top N, are “results updating” for streaming data, which means that the results are continuously updating as the streaming data is processed. Other DDL statements such as CREATE, ALTER, and DROP are also supported. For a complete list of queries and samples, see https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html.
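For instance, a Top N query is expressed with ROW_NUMBER() over a sorted window, and its results update continuously as new rows arrive. A sketch, where the trades table and its columns are hypothetical:

-- Top 3 ticker symbols by price, continuously updated
SELECT ticker_symbol, price
FROM (
  SELECT ticker_symbol, price,
         ROW_NUMBER() OVER (ORDER BY price DESC) AS row_num
  FROM trades
)
WHERE row_num <= 3;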
Q: How are Python and Scala supported?
Apache Flink’s Table API supports Python and Scala through language integration using Python strings and Scala expressions. The operations supported are very similar to the SQL operations supported, including select, order, group, join, filter, and windowing. A full list of operations and samples is included here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/tableApi.html.
Q: What versions of Apache Flink and Apache Zeppelin are supported?
Kinesis Data Analytics Studio supports Apache Flink 1.11 and Apache Zeppelin 0.9.
Q: What integrations are supported in a Kinesis Data Analytics Studio application by default?
- Data sources: Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams, Amazon S3
- Destinations, or sinks: Amazon MSK, Amazon Kinesis Data Streams, and Amazon S3
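These built-in integrations are declared as tables from within the notebook. As a hedged sketch using the Flink SQL Kinesis connector, where the stream name, Region, and columns are placeholders:

%flink.ssql
-- Register a hypothetical Kinesis data stream as a source table
CREATE TABLE stock_ticks (
  ticker_symbol VARCHAR(4),
  price         DOUBLE,
  event_time    TIMESTAMP(3)
) WITH (
  'connector' = 'kinesis',
  'stream' = 'my-input-stream',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',
  'format' = 'json'
);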
Q: Are custom integrations supported?
You can configure additional integrations with a few additional steps and lines of Apache Flink code (Python, Scala, or Java) to define connections with all Apache Flink supported integrations including destinations such as Amazon OpenSearch Service, Amazon ElastiCache for Redis, Amazon Aurora, Amazon Redshift, Amazon DynamoDB, Amazon Keyspaces, and more. You can attach executables for these custom connectors when you create or configure your Studio application.
Q: Should I develop with Kinesis Data Analytics Studio or Kinesis Data Analytics SQL?
We recommend getting started with Kinesis Data Analytics Studio as it offers a more comprehensive stream processing experience with exactly-once processing. Kinesis Data Analytics Studio offers stream processing application development in your language of choice (SQL, Python, and Scala), scales to GB/s of processing, supports long running computations over hours or even days, performs code updates within seconds, handles multiple input streams, and works with a variety of input streams including Amazon Kinesis Data Streams and Amazon MSK.
Building Kinesis Data Analytics SQL applications
For new projects, we recommend that you use the new Kinesis Data Analytics Studio over Kinesis Data Analytics for SQL Applications. Kinesis Data Analytics Studio combines ease of use with advanced analytical capabilities, enabling you to build sophisticated stream processing applications in minutes.
Configuring input for SQL applications
Authoring application code for SQL applications
- Always use a SELECT statement in the context of an INSERT statement; when you select rows, you insert results into another in-application stream.
- Use an INSERT statement in the context of a pump.
- Use a pump to make an INSERT statement continuous and write to an in-application stream, as in the following example.
-- Create the destination in-application stream
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

-- The pump continuously inserts selected rows into the destination stream
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, change, price
    FROM "SOURCE_SQL_STREAM_001";
Configuring destinations in SQL applications
Comparison to other stream processing solutions
Service Level Agreement
Q: What does the Amazon Kinesis Data Analytics SLA guarantee?
Our Amazon Kinesis Data Analytics SLA guarantees a Monthly Uptime Percentage of at least 99.9% for Amazon Kinesis Data Analytics.
Q: How do I know if I qualify for an SLA Service Credit?
You are eligible for an SLA credit for Amazon Kinesis Data Analytics under the Amazon Kinesis Data Analytics SLA if more than one Availability Zone in which you are running a task within the same Region has a Monthly Uptime Percentage of less than 99.9% during any monthly billing cycle. For full details on all of the terms and conditions of the SLA, as well as details on how to submit a claim, please see the Amazon Kinesis SLA details page.
Get started with Amazon Kinesis Data Analytics

Learn how to use Amazon Kinesis Data Analytics in the step-by-step guide for SQL or Apache Flink.

Build your first streaming application from the Amazon Kinesis Data Analytics console.