Amazon Kinesis Data Analytics Documentation

Amazon Kinesis Data Analytics is designed to analyze streaming data in real time. Using templates and built-in operators, you can build queries and real-time streaming applications. Amazon Kinesis Data Analytics is designed to set up the resources needed to run your applications and to scale to handle any volume of incoming data.

Serverless

Amazon Kinesis Data Analytics is serverless. It is designed so that you do not need to set up and manage complex infrastructure for high availability and stateful processing, and to take care of everything required to run your application, including provisioning the infrastructure to process streaming data.

Processing latency

Amazon Kinesis Data Analytics is designed to deliver sub-second processing latencies so you can generate real-time alerts, dashboards, and actionable insights.

Open source

Amazon Kinesis Data Analytics is designed to include open source libraries such as Apache Flink, Apache Beam, and Apache Zeppelin, along with the AWS SDK and AWS service integrations.

Flexible APIs

Flexible APIs in Java, Scala, Python, and SQL are specialized for different use cases, including stateful event processing, streaming ETL, and real-time analytics. Pre-built operators and analytics capabilities help you to build an Apache Flink streaming application. The Amazon Kinesis Data Analytics libraries help you to perform real-time processing for a wide variety of use cases.

Integration with AWS services

You can use the Amazon Kinesis Data Analytics libraries to integrate with certain other AWS services.

Advanced integration capabilities

In addition to the AWS integrations, the Amazon Kinesis Data Analytics libraries include connectors from Apache Flink and the ability to build custom integrations. With some additional code, you can modify how each integration behaves with advanced functionality. Also, you can build custom integrations using a set of Apache Flink primitives that enable you to read and write from files, directories, sockets, or other sources that you can access over the Internet.

Exactly once processing

You can use Apache Flink in Amazon Kinesis Data Analytics to build applications whose processed records affect the results exactly once, referred to as exactly once processing. This means that even in the case of an application disruption, such as internal service maintenance or a user-initiated application update, the service is designed so that all data is processed and there is no duplicate data.
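
The idea that each record affects the results exactly once can be illustrated with a small, self-contained Python sketch. It is not the service's or Apache Flink's implementation. It only assumes the general technique of saving the operator state together with the input position it reflects, so that a restart replays records from that position against the matching state. The class name CheckpointedSum, its parameters, and the sample records are hypothetical.

```python
class CheckpointedSum:
    """Keyed running sum that checkpoints state and input offset together.

    Conceptual sketch only: saved_offset and saved_totals stand in for
    durable storage that survives a disruption.
    """

    def __init__(self, checkpoint_every=2):
        self.checkpoint_every = checkpoint_every
        self.saved_offset = 0   # input position covered by saved_totals
        self.saved_totals = {}  # state as of saved_offset

    def run(self, records, fail_at=None):
        # Restore state and input position from the same checkpoint.
        totals = dict(self.saved_totals)
        for i in range(self.saved_offset, len(records)):
            if i == fail_at:
                # Simulated disruption: uncheckpointed work is discarded.
                raise RuntimeError(f"simulated disruption at record {i}")
            key, value = records[i]
            totals[key] = totals.get(key, 0) + value
            if (i + 1) % self.checkpoint_every == 0:
                # Save state and offset as one consistent unit.
                self.saved_offset = i + 1
                self.saved_totals = dict(totals)
        return totals


records = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

job = CheckpointedSum(checkpoint_every=2)
try:
    job.run(records, fail_at=3)  # disruption after the first checkpoint
except RuntimeError:
    pass
recovered = job.run(records)     # resumes from the checkpointed offset

uninterrupted = CheckpointedSum().run(records)
print(recovered, uninterrupted)
```

In this sketch the record at index 2 is processed twice (once before the simulated disruption and once after the restart), yet it contributes to the totals only once, because the work done after the last checkpoint is discarded along with the failed run.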

Stateful processing

The service is designed to store previous and in-progress computations, or state, in running application storage. This helps you to compare real-time and past results over any time period and helps you provide fast recovery during application disruptions. State is designed to be encrypted and incrementally saved in running application storage.

Durable application backups

The service is designed so that you can create and delete application backups through an API call. You can restore your application from the latest backup after a disruption, or restore it to an earlier version.

Amazon Kinesis Data Analytics Studio

Stream inspection and visualization

Kinesis Data Analytics Studio is designed to support sub-second queries with built-in visualizations. You can perform ad-hoc queries to inspect your data stream and view results in seconds.

Simple build-and-run environment

Studio notebooks are designed to provide a single-interface experience for developing and debugging code and for running stream processing applications.

Process using SQL, Python, or Scala

Kinesis Data Analytics Studio is designed to support SQL, Python, and Scala in the same development environment. Syntax highlighting, validation, and context-sensitive suggestions are designed to guide you within the notebook to interact with your data with built-in support for Apache Flink specific capabilities.

Serverless, rapid development of stream processing applications

Kinesis Data Analytics Studio is designed so that there are no servers to provision, manage, or scale, allowing you to just write code and pay for the resources your applications consume. You can deploy your code in the notebook to a running stream processing application with autoscaling and durable state. 

Open source

Kinesis Data Analytics Studio is designed to run on Apache Flink and Apache Zeppelin, and to produce Apache Flink applications that can be used in production.

Kinesis Data Analytics SQL applications

For new projects, we recommend that you use the new Kinesis Data Analytics Studio over Kinesis Data Analytics for SQL Applications. Kinesis Data Analytics Studio is designed to combine ease of use with advanced analytical capabilities, helping you to build sophisticated stream processing applications.

Support for standard SQL

Amazon Kinesis Data Analytics supports standard ANSI SQL.

Input and output integration

Amazon Kinesis Data Analytics is designed to integrate with Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose so that you can ingest streaming data by pointing Amazon Kinesis Data Analytics at the input stream. You can emit processed results to other AWS services through Amazon Kinesis Data Firehose. You can also send output data to Amazon Kinesis Data Streams to build advanced stream processing pipelines.

Console-based SQL editor

The service is designed so that you get a console-based editor to build SQL queries using streaming data operations such as sliding time-window averages. You can view streaming results and errors using live data to debug or further refine your script interactively.
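
To make the idea of a sliding time-window average concrete, the following Python sketch computes, for each event, the average of the values seen in the trailing window. It is a conceptual illustration rather than the SQL the console editor produces. The function name sliding_average, the (timestamp, value) event shape, and the assumption that events arrive in timestamp order are all hypothetical.

```python
from collections import deque


def sliding_average(events, window_seconds):
    """Yield (timestamp, average) over the window (ts - window_seconds, ts].

    Assumes events are (timestamp, value) pairs in timestamp order.
    """
    window = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have slid out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


events = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
averages = list(sliding_average(events, window_seconds=3))
print(averages)
```

With a 3-second window, the event at timestamp 5 is averaged on its own, because the three earlier events have all left the window by then.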

Schema editor

Amazon Kinesis Data Analytics is designed to provide an easy-to-use schema editor to discover and edit the structure of the input data. The wizard is designed to recognize standard data formats such as JSON and CSV, and infer the structure of the input data to create a baseline schema, which you can further refine using the schema editor.
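
As a rough illustration of what inferring a baseline schema from a sample can involve, the Python sketch below detects whether a sample is JSON or CSV and guesses a type for each column. It is not the service's discovery logic. The function names and the type labels (BOOLEAN, INTEGER, DOUBLE, VARCHAR) are assumptions chosen for the example.

```python
import csv
import io
import json


def json_type(value):
    """Map a parsed JSON value to an illustrative column type."""
    # bool is checked first because bool is a subclass of int in Python.
    for py_type, name in ((bool, "BOOLEAN"), (int, "INTEGER"), (float, "DOUBLE")):
        if isinstance(value, py_type):
            return name
    return "VARCHAR"


def text_type(text):
    """Guess a column type from a CSV field by attempting numeric parses."""
    for caster, name in ((int, "INTEGER"), (float, "DOUBLE")):
        try:
            caster(text)
            return name
        except ValueError:
            pass
    return "VARCHAR"


def infer_schema(sample):
    """Return [(column, type), ...] from one JSON object or a CSV header and row."""
    sample = sample.strip()
    if sample.startswith("{"):
        record = json.loads(sample.splitlines()[0])
        return [(name, json_type(value)) for name, value in record.items()]
    rows = list(csv.reader(io.StringIO(sample)))
    header, first_row = rows[0], rows[1]
    return [(name, text_type(field)) for name, field in zip(header, first_row)]


json_schema = infer_schema('{"ticker": "AMZN", "price": 101.5, "volume": 300}')
csv_schema = infer_schema("ticker,price,volume\nAMZN,101.5,300\n")
print(json_schema)
print(csv_schema)
```

A schema inferred from a single record is only a starting point, which is why a step for refining the inferred types by hand remains useful.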

Pre-built SQL templates

The interactive SQL editor comes bundled with a collection of SQL templates that provide baseline SQL code for the most common types of operations such as aggregation, per-event transformation, and filtering. You can select the template appropriate for your analytics task and then edit the provided code using the SQL editor to customize it for your specific use case.
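
The three operation types named above (filtering, per-event transformation, and aggregation) can be sketched in plain Python as a small pipeline. This is only an analogy for what such templates express in SQL, not the templates themselves. The event fields (ticker, price, volume), the derived notional field, and the function names are hypothetical.

```python
from collections import defaultdict


def filter_events(events, min_price):
    """Filtering: keep only events at or above a price threshold."""
    return (e for e in events if e["price"] >= min_price)


def add_notional(events):
    """Per-event transformation: derive a new field from each event."""
    return ({**e, "notional": e["price"] * e["volume"]} for e in events)


def sum_by_ticker(events):
    """Aggregation: total the derived field per key."""
    totals = defaultdict(int)
    for e in events:
        totals[e["ticker"]] += e["notional"]
    return dict(totals)


events = [
    {"ticker": "AMZN", "price": 100, "volume": 3},
    {"ticker": "MSFT", "price": 50, "volume": 1},
    {"ticker": "AMZN", "price": 110, "volume": 2},
    {"ticker": "MSFT", "price": 5, "volume": 10},
]

totals = sum_by_ticker(add_notional(filter_events(events, min_price=10)))
print(totals)
```

Each stage consumes the output of the previous one lazily, which loosely mirrors how a streaming query chains operations over records as they arrive.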

Advanced stream processing functions

Amazon Kinesis Data Analytics offers functions that are designed for stream processing so that you can easily perform advanced analytics such as anomaly detection and top-K analysis on your streaming data.
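
As a conceptual example of top-K analysis on a stream, the Python sketch below counts items per fixed (tumbling) time window and reports the K most frequent items in each window. It does not use or reproduce the service's built-in functions. The function name top_k_per_window, the (timestamp, item) event shape, and the sample data are assumptions.

```python
from collections import Counter


def top_k_per_window(events, window_seconds, k):
    """Return {window_start: [(item, count), ...]} with the k most frequent items.

    Events are (timestamp, item) pairs; windows are fixed, non-overlapping
    intervals [start, start + window_seconds).
    """
    windows = {}
    for ts, item in events:
        start = ts - ts % window_seconds
        windows.setdefault(start, Counter())[item] += 1
    return {
        start: counts.most_common(k)
        for start, counts in sorted(windows.items())
    }


events = [
    (0, "a"), (1, "b"), (2, "a"), (3, "c"), (4, "a"), (5, "b"),
    (10, "c"), (11, "c"), (12, "b"),
]
top_items = top_k_per_window(events, window_seconds=10, k=2)
print(top_items)
```

This version keeps an exact count for every item in every window, which is fine for a small example. Streaming systems that handle high-cardinality data often rely on approximate counting instead.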

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.