General

Q: What is Amazon Kinesis Data Analytics?
Amazon Kinesis Data Analytics is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time. Amazon Kinesis Data Analytics reduces the complexity of building, managing, and integrating streaming applications with other AWS services. You can quickly build SQL queries and sophisticated Java applications using built-in templates and operators for common processing functions to organize, transform, aggregate, and analyze data at any scale.
 
Amazon Kinesis Data Analytics takes care of everything required to run your real-time applications continuously and scales automatically to match the volume and throughput of your incoming data. With Amazon Kinesis Data Analytics, you only pay for the resources your streaming applications consume. There is no minimum fee or setup cost.
 
Q: What is real-time stream processing and why do I need it?
Data is coming at us at lightning speeds due to an explosive growth of real-time data sources. Whether it is log data from mobile and web applications, purchase data from ecommerce sites, or sensor data from IoT devices, the data delivers information that can help companies learn about what their customers, organization, and business are doing right now. By having visibility into this data as it arrives, you can monitor your business in real time and quickly leverage new business opportunities. For example, making promotional offers to customers based on where they might be at a specific time, or monitoring social sentiment and changing customer attitudes to identify and act on new opportunities.
 
To take advantage of these opportunities, you need a different set of analytics tools for collecting and analyzing real-time streaming data than what has been available traditionally for static, stored data. With traditional analytics, you gather the information, store it in a database, and analyze it hours, days, or weeks later. Analyzing real-time data requires a different approach, different tools, and different services. Instead of running database queries on stored data, streaming analytics services process the data continuously before the data is stored. Streaming data flows at an incredible rate that can vary up and down all the time. Streaming analytics services need to process this data when it arrives, often at speeds of millions of events per hour.
 
Q: What can I do with Kinesis Data Analytics?
You can use Kinesis Data Analytics for many use cases to process data continuously and get insights in seconds or minutes rather than waiting days or even weeks. Kinesis Data Analytics enables you to quickly build end-to-end stream processing applications for log analytics, clickstream analytics, Internet of Things (IoT), ad tech, gaming, and more. The three most common use cases are streaming extract-transform-load (ETL), continuous metric generation, and responsive analytics.
 
Streaming ETL
Streaming ETL applications enable you to clean, enrich, organize, and transform raw data prior to loading your data lake or data warehouse in real time, reducing or eliminating batch ETL steps. These applications can buffer small records into larger files prior to delivery, and perform sophisticated joins across streams and tables. For example, you can build an application that continuously reads IoT sensor data stored in Amazon Kinesis Data Streams, organizes the data by sensor type, removes duplicate data, normalizes the data per a specified schema, and then delivers the data to Amazon S3.
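The dedupe-and-normalize step described above can be sketched in plain Java. This is an illustration of the logic only, not the Kinesis Data Analytics libraries; the record and field names are hypothetical:

```java
import java.util.*;

// Sketch of a streaming ETL step: drop duplicate sensor readings and
// normalize each record to a target schema (hypothetical names).
public class StreamingEtlSketch {
    public record SensorReading(String readingId, String sensorType, double celsius) {}

    // Skip readings whose IDs were already seen; emit schema-normalized records.
    public static List<Map<String, Object>> dedupeAndNormalize(
            List<SensorReading> batch, Set<String> seen) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (SensorReading r : batch) {
            if (!seen.add(r.readingId())) continue; // duplicate: skip
            out.add(Map.of("sensor_type", r.sensorType(),
                           "temperature_f", r.celsius() * 9 / 5 + 32));
        }
        return out;
    }

    public static void main(String[] args) {
        List<SensorReading> batch = List.of(
            new SensorReading("a1", "thermostat", 20.0),
            new SensorReading("a1", "thermostat", 20.0), // duplicate
            new SensorReading("b2", "humidor", 25.0));
        System.out.println(dedupeAndNormalize(batch, new HashSet<>()).size()); // 2
    }
}
```

In the managed service, the same shape of logic would run inside Flink operators reading from Kinesis Data Streams and writing to S3.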
 
Continuous metric generation
Continuous metric generation applications enable you to monitor and understand how your data is trending over time. Your applications can aggregate streaming data into critical information and seamlessly integrate it with reporting databases and monitoring services to serve your applications and users in real time. With Kinesis Data Analytics, you can use SQL or Java code to continuously generate time-series analytics over time windows. For example, you can build a live leaderboard for a mobile game by computing the top players every minute and then sending it to Amazon DynamoDB. Or, you can track the traffic to your website by calculating the number of unique website visitors every five minutes and then sending the processed results to Amazon Redshift.
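The unique-visitor metric above can be sketched as a five-minute tumbling-window aggregation in plain Java. This illustrates the windowing arithmetic only, not the service API; the event layout is hypothetical:

```java
import java.util.*;

// Count unique visitors per five-minute tumbling window from
// (timestampMillis, visitorId) events (hypothetical event shape).
public class UniqueVisitorsSketch {
    static final long WINDOW_MS = 5 * 60 * 1000;

    public static Map<Long, Integer> uniquePerWindow(List<long[]> events) {
        Map<Long, Set<Long>> windows = new TreeMap<>();
        for (long[] e : events) {
            long windowStart = (e[0] / WINDOW_MS) * WINDOW_MS; // tumbling window key
            windows.computeIfAbsent(windowStart, k -> new HashSet<>()).add(e[1]);
        }
        Map<Long, Integer> counts = new TreeMap<>();
        windows.forEach((w, visitors) -> counts.put(w, visitors.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<long[]> events = List.of(
            new long[]{0, 1}, new long[]{1_000, 1}, new long[]{2_000, 2}, // window 0: {1, 2}
            new long[]{300_000, 3});                                      // next window: {3}
        System.out.println(uniquePerWindow(events)); // {0=2, 300000=1}
    }
}
```

In a real application, the processed counts would be delivered to a destination such as Amazon Redshift rather than printed.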
 
Responsive real-time analytics
Responsive real-time analytics applications send real-time alarms or notifications when certain metrics reach predefined thresholds, or in more advanced cases, when your application detects anomalies using machine learning algorithms. These applications enable you to respond immediately to changes in your business in real time, like predicting user abandonment in mobile apps and identifying degraded systems. For example, an application can compute the availability or success rate of a customer-facing API over time, and then send results to Amazon CloudWatch. You can build another application to look for events that meet certain criteria, and then automatically notify the right customers using Amazon Kinesis Data Streams and Amazon Simple Notification Service (SNS).
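The success-rate alarm above boils down to a threshold check per window. A minimal sketch in plain Java (hypothetical names; in the real service the result would be sent to Amazon CloudWatch rather than printed):

```java
// Threshold check a responsive-analytics application might run per window.
public class ApiAlarmSketch {
    // Returns true when the success rate over a window falls below the threshold.
    public static boolean shouldAlarm(int successes, int total, double threshold) {
        if (total == 0) return false; // no traffic in the window: nothing to alarm on
        return (double) successes / total < threshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldAlarm(990, 1000, 0.995)); // true: 99.0% < 99.5%
        System.out.println(shouldAlarm(999, 1000, 0.995)); // false
    }
}
```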
 
Q: How do I get started with Java applications for Kinesis Data Analytics?
Sign in to the Amazon Kinesis Data Analytics console and create a new stream processing application. You can also use the AWS CLI and AWS SDKs. Once you create an application, go to your favorite integrated development environment (IDE), connect to AWS, and install the open source Java libraries. The open source libraries are based on Apache Flink, an open source framework and engine for processing data streams, and the AWS SDKs. The extensible libraries include more than 25 pre-built stream processing operators like window and aggregate, and AWS service integrations like Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose. Once built, you upload your code to Amazon Kinesis Data Analytics and the service takes care of everything required to run your real-time applications continuously, including scaling automatically to match the volume and throughput of your incoming data.
 
Q: How do I get started with SQL applications for Kinesis Data Analytics?
Sign in to the Amazon Kinesis Data Analytics console and create a new stream processing application. You can also use the AWS CLI and AWS SDKs. You can build an end-to-end application in three simple steps: 1) configure incoming streaming data, 2) write your SQL queries, and 3) point to where you want the results loaded. Kinesis Data Analytics recognizes standard data formats such as JSON, CSV, and TSV, and automatically creates a baseline schema. You can refine this schema, or if your data is unstructured, you can define a new one using our intuitive schema editor. Then, the service applies the schema to the input stream and makes it look like a SQL table that is continually updated so that you can write standard SQL queries against it. You use our SQL editor to build your queries.
 
The SQL editor comes with all the bells and whistles including syntax checking and testing against live data. We also give you templates that provide the SQL code for anything from a simple stream filter to advanced anomaly detection and top-K analysis. Kinesis Data Analytics takes care of provisioning and elastically scaling all of the infrastructure to handle any data throughput. You don’t need to plan, provision, or manage infrastructure.
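As an illustration of those templates, a simple stream filter might look like the following (the stream and column names here are hypothetical, in the style of the service's generated code):

```sql
-- Continuously select only high-value orders from the source stream
-- into a destination in-application stream.
CREATE OR REPLACE STREAM "HIGH_VALUE_ORDERS" (order_id VARCHAR(16), amount DOUBLE);

CREATE OR REPLACE PUMP "FILTER_PUMP" AS
  INSERT INTO "HIGH_VALUE_ORDERS"
    SELECT STREAM order_id, amount
    FROM "SOURCE_SQL_STREAM_001"
    WHERE amount > 100;
```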
 
Q: What are the limits of Kinesis Data Analytics?
Kinesis Data Analytics elastically scales your application to accommodate the data throughput of your source stream and your query complexity for most scenarios. For detailed information on service limits for SQL applications, see Limits in the Amazon Kinesis Data Analytics for SQL Developer Guide. For detailed information on service limits for Java applications, see Limits in the Amazon Kinesis Data Analytics for Java Developer Guide.

Key Concepts

Q: What is a Kinesis Data Analytics application?
An application is the Kinesis Data Analytics entity that you work with. Kinesis Data Analytics applications continuously read and process streaming data in real time. You write application code using SQL or Java to process the incoming streaming data and produce output. Then, Kinesis Data Analytics writes the output to a configured destination.
 
Each application consists of three primary components:
 
Input – The streaming source for your application. In the input configuration, you map the streaming source to one or more in-application data streams. Data flows from your data sources into your in-application data streams. You process data from these in-application data streams using your application code, sending processed data to subsequent in-application data streams or destinations. You add inputs inside the application code for Java applications and via the API for SQL applications.
 
Application code – A series of Java operators or SQL statements that process input and produce output. In its simplest form, application code can be a single Java operator or SQL statement that reads from an in-application data stream associated with a streaming source and writes to an in-application data stream associated with an output. You can write Java or SQL code that splits the initial in-application data stream into multiple streams and applies additional logic to these separate streams.
 
Output – You can create one or more in-application streams to store intermediate results. You can then optionally configure an application output to persist data from specific in-application streams to an external destination. You add these outputs inside the application code for Java applications or via the API for SQL applications.
 
Q: What is an in-application data stream?
An in-application data stream is an entity that continuously stores data in your application for you to perform processing. Your applications continuously write to and read from in-application data streams. For Java applications, you interact with in-application streams by processing data via stream operators. Operators transform one or more data streams into a new data stream. For SQL applications, you interact with an in-application stream in the same way you would a SQL table, by using SQL statements. You apply SQL statements to one or more data streams and insert the results into a new data stream.
 
Q: What application code is supported?
For Java applications, Kinesis Data Analytics supports Java applications built using Apache Flink and the AWS SDKs. For SQL applications, Kinesis Data Analytics supports ANSI SQL with some extensions to the SQL standard that make it easier to work with streaming data.

Managing Applications

Q: How can I monitor the operations and performance of my Kinesis Data Analytics applications?
AWS provides various tools that you can use to monitor your Kinesis Data Analytics applications. You can configure some of these tools to do the monitoring for you. For more information about how to monitor your application, see:
 
Q: How do I manage and control access to my Kinesis Data Analytics applications?
Kinesis Data Analytics needs permissions to read records from the streaming data sources that you specify in your application. Kinesis Data Analytics also needs permissions to write your application output to destinations that you specify in your application output configuration. You can grant these permissions by creating IAM roles that Kinesis Data Analytics can assume. The permissions you grant to this role determine what Kinesis Data Analytics can do when the service assumes the role. For more information, see:
 
Q: How does Kinesis Data Analytics scale my application?
Kinesis Data Analytics elastically scales your application to accommodate the data throughput of your source stream and your query complexity for most scenarios. Kinesis Data Analytics provisions capacity in the form of Amazon Kinesis Processing Units (KPUs). One KPU provides you with 1 vCPU and 4 GB of memory.
 
For Java applications, Kinesis Data Analytics assigns 50 GB of running application storage per KPU, which your application uses for checkpoints and which is also available to you as temporary disk. A checkpoint is an up-to-date backup of a running application that is used to recover immediately from an application disruption. You can also control the parallel execution of your Kinesis Data Analytics for Java application tasks (such as reading from a source or executing an operator) using the Parallelism and ParallelismPerKPU parameters in the API. Parallelism defines the number of concurrent instances of a task. All operators, sources, and sinks execute with a defined parallelism, by default 1. ParallelismPerKPU defines the number of parallel tasks that can be scheduled per Kinesis Processing Unit (KPU) of your application, by default 1. For more information, see Scaling in the Amazon Kinesis Data Analytics for Java Developer Guide.
 
For SQL applications, each streaming source is mapped to a corresponding in-application stream. While this is not required for many customers, you can more efficiently use KPUs by increasing the number of in-application streams that your source is mapped to by specifying the input parallelism parameter. Kinesis Data Analytics evenly assigns the streaming data source’s partitions, such as an Amazon Kinesis data stream’s shards, across the number of in-application data streams that you specified. For example, if you have a 10-shard Amazon Kinesis data stream as a streaming data source and you specify an input parallelism of two, Kinesis Data Analytics assigns five shards each to the two in-application streams named “SOURCE_SQL_STREAM_001” and “SOURCE_SQL_STREAM_002”. For more information, see Configuring Application Input in the Amazon Kinesis Data Analytics for SQL Developer Guide.
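The even shard-to-stream assignment can be sketched as simple arithmetic. This is an illustration only, not service code, and how a remainder would be distributed when shards do not divide evenly is an assumption here:

```java
import java.util.*;

// Sketch of spreading a source's shards evenly across in-application streams.
public class ShardAssignmentSketch {
    public static Map<String, Integer> assignShards(int shardCount, int inputParallelism) {
        Map<String, Integer> assignment = new LinkedHashMap<>();
        int base = shardCount / inputParallelism;
        int remainder = shardCount % inputParallelism;
        for (int i = 1; i <= inputParallelism; i++) {
            // Assumption: the first `remainder` streams receive one extra shard.
            assignment.put(String.format("SOURCE_SQL_STREAM_%03d", i),
                           base + (i <= remainder ? 1 : 0));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 10 shards over an input parallelism of 2: five shards per stream.
        System.out.println(assignShards(10, 2)); // {SOURCE_SQL_STREAM_001=5, SOURCE_SQL_STREAM_002=5}
    }
}
```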
 
Q: What are the best practices associated with building and managing my Kinesis Data Analytics applications?
For information about best practices for SQL, see the Best Practices section of the Amazon Kinesis Data Analytics for SQL Developer Guide, which covers managing applications, defining input schema, connecting to outputs, and authoring application code.

Pricing and Billing

Q: How much does Kinesis Data Analytics cost?
With Amazon Kinesis Data Analytics, you pay only for what you use. There are no resources to provision or upfront costs associated with Amazon Kinesis Data Analytics.
 
You are charged an hourly rate based on the number of Amazon Kinesis Processing Units (or KPUs) used to run your streaming application. A single KPU is a unit of stream processing capacity comprised of 1 vCPU compute and 4 GB memory. Amazon Kinesis Data Analytics automatically scales the number of KPUs required by your stream processing application as the demands of memory and compute vary in response to processing complexity and the throughput of streaming data processed.
 
For Java applications, you are charged a single additional KPU per application, used for application orchestration. Java applications are also charged for running application storage and durable application backups. Running application storage is used for Amazon Kinesis Data Analytics’ stateful processing capabilities and is charged per GB-month. Durable application backups are optional and provide a point-in-time recovery point for applications, charged per GB-month.
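Putting the pieces together, a back-of-the-envelope compute estimate for a Java application looks like the following. The rate used here is illustrative only, not an official price; check the pricing page for the actual rates in your region:

```java
// Back-of-the-envelope KPU compute cost for a Java application.
public class KpuCostSketch {
    public static double monthlyCompute(int appKpus, int orchestrationKpus,
                                        double kpuHourRate, double hours) {
        return (appKpus + orchestrationKpus) * kpuHourRate * hours;
    }

    public static void main(String[] args) {
        // 2 processing KPUs + 1 orchestration KPU at a hypothetical
        // $0.11 per KPU-hour, for roughly one month (730 hours).
        System.out.printf("Monthly compute: $%.2f%n",
                          monthlyCompute(2, 1, 0.11, 730)); // $240.90
    }
}
```

Running application storage and any durable application backups would be billed per GB-month on top of this.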
 
For more information about pricing, see the Amazon Kinesis Data Analytics pricing page.
 
Q: Is Kinesis Data Analytics available in AWS Free Tier?
No. Kinesis Data Analytics is not currently available in AWS Free Tier. AWS Free Tier is a program that offers free trials for a group of AWS services.
 
Q: Am I charged for a Kinesis Data Analytics application that is running but not processing any data from the source?
For SQL applications, you are charged a minimum of one KPU if your Kinesis Data Analytics application is running. For Java applications, you are charged a minimum of two KPUs and 50 GB running application storage if your Kinesis Data Analytics application is running.
 
Q: Other than Kinesis Data Analytics costs, are there any other costs that I might incur?
Kinesis Data Analytics is a fully managed stream processing solution, independent from the streaming source that it reads data from and the destinations it writes processed data to. You will be billed independently for the services you read from and write to in your application.

Building Java Applications

Authoring Application Code for Java Applications

Q: What is Apache Flink?
Apache Flink is an open source framework and engine for stream and batch data processing. It makes streaming applications easy to build, because it provides powerful operators and solves the core streaming problems like duplicate processing very well. Apache Flink provides data distribution, communication, and fault tolerance for distributed computations over data streams.
 
Q: How do I develop applications?
You can start by downloading the open source libraries that include the AWS SDK, Apache Flink, and connectors for AWS services. You can get instructions on how to download the libraries and create your first application in the Amazon Kinesis Data Analytics for Java Developer Guide.
 
Q: What does my application code look like?
You write your Java code using data streams and stream operators. Application data streams are the data structure you perform processing against using your Java code. Data continuously flows from the sources into application data streams. One or more stream operators are used to define your processing on the application data streams, including transform, partition, aggregate, join and window. Data streams and operators can be put together in serial and parallel chains. A short example using pseudo code is shown below.
DataStream<GameEvent> rawEvents = env.addSource(
        new KinesisStreamSource("input_events"));
DataStream<UserPerLevel> gameStream =
        rawEvents.map(event -> new UserPerLevel(event.gameMetadata.gameId,
                event.gameMetadata.levelId, event.userId));
gameStream.keyBy(event -> event.gameId)
        .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
        .apply(...);
gameStream.addSink(new KinesisStreamSink("myGameStateStream"));
Q: How do I use the operators?
Operators take an application data stream as input and send processed data to an application data stream as output. Operators can be put together to build applications with multiple steps and don’t require advanced knowledge of distributed systems to implement and operate.
 
Q: What operators are supported?
Kinesis Data Analytics for Java includes over 25 operators from Apache Flink that can be used to solve a wide variety of use cases, including Map, KeyBy, aggregations, Window Join, and Window. Map allows you to perform arbitrary processing, taking one element from an incoming data stream and producing another element. KeyBy logically organizes data using a specified key, enabling you to process similar data points together. Aggregations perform processing across multiple keys, like sum, min, and max. Window Join joins two data streams together on a given key and window. Window groups data using a key and a typically time-based operation, like counting the number of unique items over a five-minute time period.
 
You can build custom operators if these do not meet your needs. You can find more examples in the Operators section of the Amazon Kinesis Data Analytics for Java Developer Guide. You can find a full list of Apache Flink operators in the Operators section of the Apache Flink documentation.
 
Q: What integrations are supported in a Kinesis Data Analytics Java application?
You can set up integrations with minimal code. The open source libraries based on Apache Flink support streaming sources and destinations, or sinks, for the delivery of processed data. They also include support for data enrichment via asynchronous input/output connectors. The AWS-specific connectors included in the open source libraries are shown below.
  • Streaming data sources: Amazon Kinesis Data Streams
  • Destinations, or sinks: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon DynamoDB, and Amazon S3 (through file sink integrations).
 
Apache Flink also includes other connectors, including Apache Kafka, Apache Cassandra, Elasticsearch, and more.
 
Q: Are custom integrations supported?
You can add a source or destination to your application by building upon a set of primitives that enable you to read and write from files, directories, sockets, or anything that you can access over the internet. Apache Flink provides these primitives for data sources and data sinks. The primitives come with configurations like the ability to read and write data continuously or once, asynchronously or synchronously, and much more. For example, you can set up an application to read continuously from Amazon S3 by extending the existing file-based source integration.
 
Q: What delivery model does Kinesis Data Analytics for Java applications provide?
Java applications in Kinesis Data Analytics use an “exactly once” delivery model if an application is built using idempotent operators, including sources and sinks. This means the processed data will impact downstream results once and only once. Checkpoints save the current application state and enable Kinesis Data Analytics for Java applications to recover the position of the application to provide the same semantics as a failure-free execution. Checkpoints for Java applications are provided via Apache Flink’s checkpointing functionality. By default, Kinesis Data Analytics for Java applications uses Apache Flink’s exactly-once semantics. Your application will support exactly once processing semantics if you design your applications using sources, operators, and sinks that utilize Apache Flink’s exactly once semantics.
 
Q: Do I have access to local storage from my application storage?
Yes. Kinesis Data Analytics for Java applications provides your application 50 GB of running application storage per Kinesis Processing Unit (KPU). Kinesis Data Analytics scales storage with your application. Running application storage is used for saving application state using checkpoints. It is also accessible to your application code to use as temporary disk for caching data or any other purpose. Kinesis Data Analytics can remove data from running application storage that is not saved via checkpoints (e.g., by operators, sources, or sinks) at any time. All data stored in running application storage is encrypted at rest.
 
Q: How does Kinesis Data Analytics for Java automatically back up my application?
Kinesis Data Analytics automatically backs up your running application’s state using checkpoints and snapshots. Checkpoints save the current application state and enable Kinesis Data Analytics for Java applications to recover the position of the application to provide the same semantics as a failure-free execution. Checkpoints utilize running application storage. Snapshots save a point in time recovery point for applications. Snapshots utilize durable application backups.
 
Q: What are application snapshots?
Snapshots enable you to create and restore your application to a previous point in time. This enables you to maintain previous application state and roll back your application at any time. You control how many snapshots you have at any given time, from zero to thousands of snapshots. Snapshots use durable application backups, and Kinesis Data Analytics charges you based on their size. Kinesis Data Analytics encrypts data saved in snapshots by default. You can delete individual snapshots through the API, or all snapshots by deleting your application.
 
Q: What versions of Apache Flink are supported?
Amazon Kinesis Data Analytics for Java applications supports Apache Flink 1.6 and Java version 8.

Building SQL Applications

Configuring Input for SQL Applications

Q: What inputs are supported in a Kinesis Data Analytics SQL application?
SQL applications in Kinesis Data Analytics support two types of inputs: streaming data sources and reference data sources. A streaming data source is continuously generated data that is read into your application for processing. A reference data source is static data that your application uses to enrich data coming in from streaming sources. Each application can have no more than one streaming data source and no more than one reference data source. An application continuously reads and processes new data from streaming data sources, including Amazon Kinesis Data Streams or Amazon Kinesis Data Firehose. An application reads a reference data source, including Amazon S3, in its entirety for use in enriching the streaming data source through SQL JOINs.
 
Q: What is a reference data source?
A reference data source is static data that your application uses to enrich data coming in from streaming sources. You store reference data as an object in your S3 bucket. When the SQL application starts, Kinesis Data Analytics reads the S3 object and creates an in-application SQL table to store the reference data. Your application code can then join it with an in-application stream. You can update the data in the SQL table by calling the UpdateApplication API.
 
Q: How do I set up a streaming data source in my SQL application?
A streaming data source can be an Amazon Kinesis data stream or an Amazon Kinesis Data Firehose delivery stream. Your Kinesis Data Analytics SQL application continuously reads new data from streaming data sources as it arrives in real time. The data is made accessible in your SQL code through an in-application stream. An in-application stream acts like a SQL table because you can create, insert, and select from it. However, the difference is that an in-application stream is continuously updated with new data from the streaming data source.
 
You can use the AWS Management Console to add a streaming data source. You can learn more about sources in the Configuring Application Input section of the Kinesis Data Analytics for SQL Developer Guide.
 
Q: How do I set up a reference data source in my SQL application?
A reference data source can be an Amazon S3 object. Your Kinesis Data Analytics SQL application reads the S3 object in its entirety when it starts running. The data is made accessible in your SQL code through a table. The most common use case for using a reference data source is to enrich the data coming from the streaming data source using a SQL JOIN.
 
Using the AWS CLI, you can add a reference data source by specifying the S3 bucket, object, IAM role, and associated schema. Kinesis Data Analytics loads this data when you start the application, and reloads it each time you make any update API call.
 
Q: What data formats are supported for SQL applications?
SQL applications in Kinesis Data Analytics can detect the schema and automatically parse UTF-8 encoded JSON and CSV records using the DiscoverInputSchema API. This schema is applied to the data read from the stream as part of the insertion into an in-application stream.
 
For other UTF-8 encoded data that does not use a delimiter, that uses a delimiter other than CSV, or in cases where the discovery API did not fully discover the schema, you can define a schema using the interactive schema editor or use string manipulation functions to structure your data. For more information, see Using the Schema Discovery Feature and Related Editing in the Amazon Kinesis Data Analytics for SQL Developer Guide.
 
Q: How is my input stream exposed to my SQL code?
Kinesis Data Analytics for SQL applies your specified schema and inserts your data into one or more in-application streams for streaming sources, and into a single SQL table for reference sources. The default number of in-application streams meets the needs of most use cases. You should increase it if you find that your application is not keeping up with the latest data in your source stream, as measured by the MillisBehindLatest CloudWatch metric. The number of in-application streams required is affected by both the throughput of your source stream and your query complexity. The parameter for specifying the number of in-application streams that are mapped to your source stream is called input parallelism.

Authoring Application Code for SQL Applications

Q: What does my SQL application code look like?
Application code is a series of SQL statements that process input and produce output. These SQL statements operate on in-application streams and reference tables. An in-application stream is like a continuously updating table on which you can perform the SELECT and INSERT SQL operations. Your configured sources and destinations are exposed to your SQL code through in-application streams. You can also create additional in-application streams to store intermediate query results.
 
You can use the following pattern to work with in-application streams:
  • Always use a SELECT statement in the context of an INSERT statement. When you select rows, you insert results into another in-application stream.
  • Use an INSERT statement in the context of a pump. You use a pump to make an INSERT statement continuous, and write to an in-application stream.
  • You use a pump to tie in-application streams together, selecting from one in-application stream and inserting into another in-application stream.
 
The following SQL code provides a simple, working application:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM"    
    SELECT STREAM ticker_symbol, change, price    
    FROM "SOURCE_SQL_STREAM_001";
For more information about application code, see Application Code in the Amazon Kinesis Data Analytics for SQL Developer Guide.
 
Q: How does Kinesis Data Analytics help me with writing SQL code?
Kinesis Data Analytics includes a library of analytics templates for common use cases including streaming filters, tumbling time windows, and anomaly detection. You can access these templates from the SQL editor in the AWS Management Console. After you create an application and navigate to the SQL editor, the templates are available in the upper-left corner of the console.
 
Q: How can I perform real-time anomaly detection in Kinesis Data Analytics?
Kinesis Data Analytics includes pre-built SQL functions for several advanced analytics including one for anomaly detection. You can simply make a call to this function from your SQL code for detecting anomalies in real-time. Kinesis Data Analytics uses the Random Cut Forest algorithm to implement anomaly detection. For more information on Random Cut Forests, see the Streaming Data Anomaly Detection whitepaper.
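As a sketch of the documented usage (the stream and column names here are illustrative), an anomaly-scoring query wraps the source stream with the RANDOM_CUT_FOREST function, which appends an ANOMALY_SCORE column to each record:

```sql
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    price DOUBLE,
    ANOMALY_SCORE DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, price, ANOMALY_SCORE
    FROM TABLE(RANDOM_CUT_FOREST(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")));
```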

Configuring Destinations in SQL Applications

Q: What destinations are supported?
Kinesis Data Analytics for SQL supports up to four destinations per application. You can persist SQL results to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service (through Amazon Kinesis Data Firehose), and Amazon Kinesis Data Streams. You can write to a destination not directly supported by Kinesis Data Analytics by sending SQL results to Amazon Kinesis Data Streams and leveraging its integration with AWS Lambda to send them to a destination of your choice.
 
Q: How do I set up a destination?
In your application code, you write the output of SQL statements to one or more in-application streams. Optionally, you can add an output configuration to your application to persist everything written to specific in-application streams to up to four external destinations. These external destinations can be an Amazon S3 bucket, Amazon Redshift table, Amazon Elasticsearch Service domain (through Amazon Kinesis Data Firehose) and an Amazon Kinesis data stream. Each application supports up to four destinations, which can be any combination of the above. For more information, see Configuring Output Streams in the Amazon Kinesis Data Analytics for SQL Developer Guide.
 
Q: My preferred destination is not directly supported. How can I send SQL results to this destination?
You can use AWS Lambda to write to a destination that is not directly supported. We recommend that you write results to an Amazon Kinesis data stream, and then use AWS Lambda to read the processed results and send them to the destination of your choice. For more information, see Example: AWS Lambda Integration in the Amazon Kinesis Data Analytics for SQL Developer Guide. Alternatively, you can use a Kinesis Data Firehose delivery stream to load the data into Amazon S3, and then trigger an AWS Lambda function to read that data and send it to the destination of your choice. For more information, see Using AWS Lambda with Amazon S3 in the AWS Lambda Developer Guide.
 
Q: What delivery model does Kinesis Data Analytics provide?
SQL applications in Kinesis Data Analytics use an "at least once" delivery model for application output to the configured destinations. Kinesis Data Analytics applications take internal checkpoints, which are points in time when output records were delivered to the destinations without data loss. The service uses the checkpoints as needed to ensure that your application output is delivered at least once to the configured destinations. For more information about the delivery model, see Configuring Application Output in the Amazon Kinesis Data Analytics for SQL Developer Guide.

Comparison to Other Stream Processing Solutions

Q: How does Amazon Kinesis Data Analytics differ from running my own application using the Amazon Kinesis Client Library?
The Amazon Kinesis Client Library (KCL) is a pre-built library that helps you build consumer applications for reading and processing data from an Amazon Kinesis data stream. The KCL handles complex issues such as adapting to changes in data stream volume, load balancing streaming data, coordinating distributed services, and processing data with fault-tolerance. The KCL enables you to focus on business logic while building applications.
 
With Kinesis Data Analytics, you can process and query real-time, streaming data. You use standard SQL to process your data streams, so you don’t have to learn any new programming languages. You just point Kinesis Data Analytics to an incoming data stream, write your SQL queries, and then specify where you want the results loaded. Kinesis Data Analytics uses the KCL to read data from streaming data sources as one part of your underlying application. The service abstracts this from you, as well as many of the more complex concepts associated with using the KCL, such as checkpointing.
 
If you want a fully managed solution and you want to use SQL to process the data from your data stream, you should use Kinesis Data Analytics. Use the KCL if you need to build a custom processing solution whose requirements are not met by Kinesis Data Analytics, and you are able to manage the resulting consumer application.

Get started with Amazon Kinesis Data Analytics

Calculate your costs

Visit the pricing page

Review the getting-started guide

Learn how to use Amazon Kinesis Data Analytics in this step-by-step guide.

Start building streaming applications

Build your first streaming application from the Amazon Kinesis Data Analytics console.