Q: What is Amazon Kinesis?

Amazon Kinesis is a fully managed streaming data service. You can continuously put various types of data such as clickstreams, application logs, and social media into an Amazon Kinesis stream from hundreds of thousands of sources. Within seconds, the data will be available for your Amazon Kinesis Applications to read and process from the stream.

Q: What does Amazon Kinesis manage on my behalf?

Amazon Kinesis manages the infrastructure, storage, networking, and configuration needed to stream your data at the level of your data throughput. You do not have to worry about provisioning, deployment, or ongoing maintenance of hardware, software, or other services for your data streams. In addition, Amazon Kinesis synchronously replicates data across three facilities in an AWS Region, providing high availability and data durability.

Q: What can I do with Amazon Kinesis?

Amazon Kinesis is useful for rapidly moving data off data producers and then continuously processing the data, be it to transform the data before emitting into a data store, run real-time metrics and analytics, or derive more complex data streams for further processing. The following are typical scenarios for using Amazon Kinesis:

  • Accelerated log and data feed intake: Instead of waiting to batch up the data, you can have your data producers push data into an Amazon Kinesis stream as soon as the data is produced, preventing data loss in case of data producer failures. For example, system and application logs can be continuously put into an Amazon Kinesis stream and be available for processing within seconds. 
  • Real-time metrics and reporting: You can extract metrics and generate reports from Amazon Kinesis stream data in real-time. For example, your Amazon Kinesis Application can work on metrics and reporting for system and application logs as the data is streaming in, rather than wait to receive data batches.
  • Real-time data analytics: With Amazon Kinesis, you can run real-time streaming data analytics. For example, you can put clickstreams into your Amazon Kinesis stream and have your Amazon Kinesis Application run analytics in real-time, enabling you to gain insights out of your data at a scale of minutes instead of hours or days. 
  • Complex stream processing: You can create Directed Acyclic Graphs (DAGs) of Amazon Kinesis Applications and data streams. In this scenario, one or more Amazon Kinesis Applications can put data into another Amazon Kinesis stream for further processing, enabling successive stages of stream processing.

Q: How do I use Amazon Kinesis?

After you sign up for Amazon Web Services, you can start using Amazon Kinesis by:

  • Creating an Amazon Kinesis stream through either the Amazon Kinesis Management Console or the Amazon Kinesis CreateStream API.
  • Configuring your data producers to continuously put data into your Amazon Kinesis stream.
  • Building Amazon Kinesis Applications that read and process data from your Amazon Kinesis stream.

Q: What are the limits of Amazon Kinesis?

The throughput of an Amazon Kinesis stream is designed to scale without limits by increasing the number of shards within a stream. However, there are certain limits you should keep in mind while using Amazon Kinesis:

  • Data records of an Amazon Kinesis stream are accessible for up to 24 hours from the time they are added to the stream.
  • The maximum size of a data blob (the data payload before Base64-encoding) within one put data transaction is 50 kilobytes (KB). 
  • Each shard can support up to 1000 put data transactions per second and 5 read data transactions per second.

For more information about other API level limits, see Amazon Kinesis Limits.

Q: How does Amazon Kinesis differ from Amazon SQS?

Amazon Kinesis enables real-time processing of streaming big data. It provides ordering of data records, as well as the ability to read and/or replay data records in the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all data records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).

Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable hosted queue for storing messages as they travel between computers. Amazon SQS lets you easily move data between distributed application components and helps you build applications in which messages are processed independently (with message-level ack/fail semantics), such as automated workflows.

Q: When should I use Amazon Kinesis, and when should I use Amazon SQS?

We recommend Amazon Kinesis for use cases with requirements that are similar to the following:

  • Routing related data records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
  • Ordering of data records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
  • Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.
  • Ability to consume data records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis stores data for up to 24 hours, you can run the audit application up to 24 hours behind the billing application.

We recommend Amazon SQS for use cases with requirements that are similar to the following:

  • Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout.
  • Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes.
  • Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared. With Amazon Kinesis, you can scale up to a sufficient number of shards (note, however, that you'll need to provision enough shards ahead of time).
  • Leveraging Amazon SQS’s ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load without any provisioning instructions from you.


Q: What is a shard?

A shard is the base throughput unit of an Amazon Kinesis stream. One shard provides a capacity of 1MB/sec data input and 2MB/sec data output. One shard can support up to 1000 put data and 5 read data transactions per second. You specify the number of shards needed when you create an Amazon Kinesis stream. For example, you can create an Amazon Kinesis stream with two shards. This stream has a throughput of 2MB/sec data input and 4MB/sec data output, and allows up to 2000 put data and 10 read data transactions per second. You can dynamically add or remove shards from your Amazon Kinesis stream as your data throughput changes via Amazon Kinesis Resharding.
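The per-shard figures above can be sketched as a small helper that computes the aggregate capacity of a stream (an illustrative sketch; the constants are the per-shard limits stated in this FAQ):

```python
# Per-shard capacity figures as stated in this FAQ.
SHARD_INPUT_MB_PER_SEC = 1      # data input per shard
SHARD_OUTPUT_MB_PER_SEC = 2     # data output per shard
SHARD_PUTS_PER_SEC = 1000       # put data transactions per shard
SHARD_READS_PER_SEC = 5         # read data transactions per shard

def stream_capacity(num_shards):
    """Return the aggregate capacity of a stream with num_shards shards."""
    return {
        "input_mb_per_sec": num_shards * SHARD_INPUT_MB_PER_SEC,
        "output_mb_per_sec": num_shards * SHARD_OUTPUT_MB_PER_SEC,
        "puts_per_sec": num_shards * SHARD_PUTS_PER_SEC,
        "reads_per_sec": num_shards * SHARD_READS_PER_SEC,
    }

# A two-shard stream, matching the example above:
print(stream_capacity(2))
```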

Q: What is a data record?

A data record is the unit of data stored in an Amazon Kinesis stream. A data record is composed of a sequence number, partition key, and data blob. The data blob is the data of interest your data producer puts into an Amazon Kinesis stream. The maximum size of a data blob (the data payload before Base64-encoding) is 50 kilobytes (KB).

Q: What is a partition key?

A partition key is used to segregate and route data records to different shards of a stream. A partition key is specified by your data producer while putting data into an Amazon Kinesis stream. For example, assume you have an Amazon Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are put into Shard 1 and all data records with Key B are put into Shard 2. For more information about partition keys, see Partition Keys.
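Under the hood, Amazon Kinesis maps the MD5 hash of the partition key to a 128-bit integer, and each shard owns a contiguous range of that hash space. The sketch below is a simplified model of this routing (it assumes the hash space is split evenly among shards, which is the default at stream creation, not the service's exact implementation):

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    # Kinesis hashes the partition key with MD5 to a 128-bit integer and
    # routes the record to the shard owning that hash range. This sketch
    # assumes the hash space is split evenly among num_shards shards.
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    shard_size = 2 ** 128 // num_shards
    return min(hash_value // shard_size, num_shards - 1)

# All records with the same key always land on the same shard:
assert shard_for_key("Key A", 2) == shard_for_key("Key A", 2)
print(shard_for_key("Key A", 2), shard_for_key("Key B", 2))
```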

Q: What is a sequence number?

A sequence number is a unique identifier for each data record. Sequence numbers are assigned by Amazon Kinesis when a data producer calls the PutRecord API to add data to an Amazon Kinesis stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord requests, the larger the sequence numbers become. For more information about sequence numbers, see Sequence Numbers.


Q: How do I create an Amazon Kinesis stream?

After you sign up for Amazon Web Services, you can create an Amazon Kinesis stream through either the Amazon Kinesis Management Console or the Amazon Kinesis CreateStream API.

Q: How do I decide the throughput of my Amazon Kinesis stream?

The throughput of an Amazon Kinesis stream is determined by the number of shards within the stream. Follow the steps below to estimate the initial number of shards your Amazon Kinesis stream needs. Note that you can dynamically adjust the number of shards within your Amazon Kinesis stream via Amazon Kinesis Resharding after the stream is created.

  1. Estimate the average size of the data record written to the Amazon Kinesis stream in kilobytes (KB), rounded up to the nearest 1 KB. (average_data_size_in_KB)
  2. Estimate the number of data records written to and read from the Amazon Kinesis stream per second. (number_of_transactions_per_second)
  3. Decide the number of Amazon Kinesis Applications consuming data concurrently and independently from the Amazon Kinesis stream. (number_of_consumers)
  4. Calculate the incoming write bandwidth in KB (incoming_write_bandwidth_in_KB), which is equal to the average_data_size_in_KB multiplied by the number_of_transactions_per_second.
  5. Calculate the outgoing read bandwidth in KB (outgoing_read_bandwidth_in_KB), which is equal to the incoming_write_bandwidth_in_KB multiplied by the number_of_consumers.

You can then calculate the initial number of shards (number_of_shards) your Amazon Kinesis stream needs using the following formula:

number_of_shards = max (incoming_write_bandwidth_in_KB/1000, outgoing_read_bandwidth_in_KB/2000)
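The five steps and the formula above can be expressed as a short sketch (illustrative Python; rounding fractional shard counts up to the next whole shard is an assumption, since the formula itself can yield fractions):

```python
import math

def estimate_shards(average_data_size_in_kb, transactions_per_second,
                    number_of_consumers):
    # Steps 1-5 from the FAQ; record size is rounded up to the nearest 1 KB.
    avg_kb = math.ceil(average_data_size_in_kb)
    incoming_write_bandwidth_in_kb = avg_kb * transactions_per_second
    outgoing_read_bandwidth_in_kb = incoming_write_bandwidth_in_kb * number_of_consumers
    # A shard handles 1000 KB/sec input and 2000 KB/sec output.
    return max(math.ceil(incoming_write_bandwidth_in_kb / 1000),
               math.ceil(outgoing_read_bandwidth_in_kb / 2000))

# e.g. 4 KB records, 500 writes/sec, 2 consumers:
# write bandwidth = 2000 KB/sec, read bandwidth = 4000 KB/sec
print(estimate_shards(4, 500, 2))  # → 2
```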

Q: What is the minimum throughput I can request for my Amazon Kinesis stream?

The throughput of an Amazon Kinesis stream scales by unit of shard. A single shard is the smallest throughput unit of an Amazon Kinesis stream, providing 1MB/sec data input and 2MB/sec data output.

Q: What is the maximum throughput I can request for my Amazon Kinesis stream?

The throughput of an Amazon Kinesis stream is designed to scale without limits. By default, each account can provision 10 shards per region. You can use the Amazon Kinesis Limits form to request more than 10 shards within a single region.

Q: How can data record size affect the throughput of my Amazon Kinesis stream?

A shard provides 1MB/sec data input rate and supports up to 1000 put data transactions per sec. Therefore, if the data record size is less than 1KB, the actual data input rate of a shard will be less than 1MB/sec, limited by the maximum number of put data transactions per second.
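This interaction between the two per-shard limits can be made concrete with a small sketch (illustrative only; it assumes throughput is capped by whichever limit is hit first):

```python
def effective_input_kb_per_sec(record_size_kb, num_shards=1):
    # Per shard, input is capped both by bandwidth (1 MB/sec = 1000 KB/sec)
    # and by the put transaction limit (1000 puts/sec), whichever binds first.
    per_shard = min(1000.0, 1000 * record_size_kb)
    return per_shard * num_shards

# 0.5 KB records: the transaction limit caps a shard at 500 KB/sec, not 1 MB/sec.
print(effective_input_kb_per_sec(0.5))  # → 500.0
# 2 KB records: the bandwidth limit binds, so the shard tops out at 1000 KB/sec.
print(effective_input_kb_per_sec(2))    # → 1000.0
```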


Q: How do I put data into my Amazon Kinesis stream?

The Amazon Kinesis PutRecord API is used for adding data to an Amazon Kinesis stream. After you create an Amazon Kinesis stream, you need to configure your data producers to continuously call the PutRecord API. Within each PutRecord call, you need to specify the name of your Amazon Kinesis stream and a partition key. For more information about the PutRecord API, see PutRecord.
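A PutRecord request carries the stream name, the partition key, and the data blob Base64-encoded. The sketch below only shows that request shape; in practice an AWS SDK builds, signs, and sends the request for you, so treat this as an illustration rather than a working client:

```python
import base64
import json

def build_put_record_request(stream_name, data_bytes, partition_key):
    # Shape of a PutRecord request body: the data blob travels Base64-encoded,
    # alongside the stream name and partition key. Signing and transport are
    # omitted here; an AWS SDK normally handles both.
    return json.dumps({
        "StreamName": stream_name,
        "Data": base64.b64encode(data_bytes).decode("ascii"),
        "PartitionKey": partition_key,
    })

request = build_put_record_request("myStream", b'{"event": "click"}', "user-42")
print(request)
```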

Q: What programming languages or platforms can I use to access Amazon Kinesis API?

Amazon Kinesis API is available in Amazon Web Services SDKs. For a list of programming languages or platforms for Amazon Web Services SDKs, see Tools for Amazon Web Services.

Q: What happens if the capacity limits of an Amazon Kinesis stream are exceeded while the data producer puts data into the stream?

The capacity limits of an Amazon Kinesis stream are defined by the number of shards within the stream. The limits can be exceeded by either data throughput or the number of put data transactions. When the capacity limits are exceeded, put data transactions will be rejected with a ProvisionedThroughputExceeded exception. If this is due to a temporary rise of the stream’s input data rate, retries by the data producer will eventually lead to completion of the requests. If this is due to a sustained rise of the stream’s input data rate, you should increase the number of shards within your stream to provide enough capacity for the put data transactions to consistently succeed. In both cases, Amazon CloudWatch metrics allow you to learn about the change of the stream’s input data rate and the occurrence of ProvisionedThroughputExceeded exceptions.
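The retry behavior for a temporary throughput spike can be sketched as exponential backoff around the put call. This is a local simulation: the exception class is a stand-in for the service error, and `flaky_put` is a hypothetical producer that fails twice before succeeding.

```python
import random
import time

class ProvisionedThroughputExceededException(Exception):
    # Stand-in for the service exception, for illustration only.
    pass

def put_with_backoff(put_fn, record, max_attempts=5):
    """Retry a put with exponential backoff when throughput is exceeded."""
    for attempt in range(max_attempts):
        try:
            return put_fn(record)
        except ProvisionedThroughputExceededException:
            if attempt == max_attempts - 1:
                raise
            # Back off exponentially (with jitter) before retrying.
            time.sleep(min(2 ** attempt * 0.1, 5.0) * random.uniform(0.5, 1.0))

# Simulated producer: throttled twice, then the put succeeds.
attempts = {"n": 0}
def flaky_put(record):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ProvisionedThroughputExceededException()
    return "ok"

print(put_with_backoff(flaky_put, {"data": b"log line"}))  # → ok
```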

Q: What data is counted against the data throughput of an Amazon Kinesis stream during a PutRecord call?

Your data blob, partition key, and stream name are required parameters of a PutRecord call. The size of your data blob (before Base64 encoding) and partition key will be counted against the data throughput of your Amazon Kinesis stream, which is determined by the number of shards within the stream.


Q: What is an Amazon Kinesis Application?

An Amazon Kinesis Application is a data consumer that reads and processes data from an Amazon Kinesis stream. You can build your Amazon Kinesis Applications using either Amazon Kinesis API or Amazon Kinesis Client Library (KCL).

Q: What is Amazon Kinesis Client Library (KCL)?

Amazon Kinesis Client Library (KCL), available for Java and Python, is a pre-built library that helps you easily build Amazon Kinesis Applications for reading and processing data from an Amazon Kinesis stream. The KCL handles complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault tolerance, enabling you to focus on business logic while building Amazon Kinesis Applications.

Q: What is Amazon Kinesis Connector Library?

Amazon Kinesis Connector Library is a pre-built library that helps you easily integrate Amazon Kinesis with other AWS services and third-party tools. Amazon Kinesis Client Library (KCL) is required for using Amazon Kinesis Connector Library. The current version of this library provides connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch. The library also includes sample connectors of each type, plus Apache Ant build files for running the samples.

Q: What is Amazon Kinesis Storm Spout?

Amazon Kinesis Storm Spout is a pre-built library that helps you easily integrate Amazon Kinesis with Apache Storm. The current version of Amazon Kinesis Storm Spout fetches data from an Amazon Kinesis stream and emits it as tuples. You add the spout to your Storm topology to leverage Amazon Kinesis as a reliable, scalable, stream capture, storage, and replay service.

Q: What programming languages are Amazon Kinesis Client Library (KCL), Amazon Kinesis Connector Library, and Amazon Kinesis Storm Spout available in?

Amazon Kinesis Client Library (KCL) is currently available in Java and Python. Amazon Kinesis Connector Library and Amazon Kinesis Storm Spout are currently available in Java. We are looking to add support for other programming languages.

Q: Do I have to use Amazon Kinesis Client Library (KCL) for my Amazon Kinesis Application?

No, you can also use the Amazon Kinesis API to build your Amazon Kinesis Application. However, we recommend using Amazon Kinesis Client Library (KCL) if applicable because it performs the heavy-lifting tasks associated with distributed stream processing, making it more productive to develop Amazon Kinesis Applications.

Q: How does Amazon Kinesis Client Library (KCL) interact with an Amazon Kinesis Application?

Amazon Kinesis Client Library (KCL) acts as an intermediary between Amazon Kinesis and your Amazon Kinesis Application. The KCL uses the IRecordProcessor interface to communicate with your application: your application implements this interface, and the KCL calls into your application code using the methods in this interface.

For more information about building Amazon Kinesis Applications with Amazon Kinesis Client Library (KCL), see Developing Consumer Applications for Amazon Kinesis Using the Amazon Kinesis Client Library.

Q: What are the workers and record processors generated by Amazon Kinesis Client Library (KCL)?

An Amazon Kinesis Application can have multiple application instances and a worker is the processing unit that maps to each application instance. A record processor is the processing unit that processes data from a shard of an Amazon Kinesis stream. One worker maps to one or more record processors. One record processor maps to one shard and processes data records from that shard.

At startup, an Amazon Kinesis Application calls into Amazon Kinesis Client Library (KCL) to instantiate a worker. This call provides the KCL with configuration information for the application, such as the stream name and AWS credentials, and also passes a reference to an IRecordProcessorFactory implementation. The KCL uses this factory to create new record processors as needed to process data from the stream, and communicates with these record processors using the IRecordProcessor interface.
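The record-processor lifecycle described above can be sketched in miniature. The class and method names below are local stand-ins that mirror the pattern (initialize on shard assignment, process batches, checkpoint progress, shut down), not the KCL's actual API:

```python
# A minimal sketch of the record-processor pattern described above. The real
# interface is provided by the KCL (IRecordProcessor in Java); these names
# are illustrative stand-ins, not the library's API.

class RecordProcessor:
    def initialize(self, shard_id):
        # Called once when this processor is assigned a shard.
        self.shard_id = shard_id
        self.count = 0

    def process_records(self, records, checkpointer):
        # Called with batches of records read from the shard.
        for record in records:
            self.count += 1
        checkpointer(self.count)  # persist progress (the KCL uses DynamoDB)

    def shutdown(self, reason):
        return f"shard {self.shard_id} processed {self.count} records ({reason})"

# Simulated run over one shard:
checkpoints = []
processor = RecordProcessor()
processor.initialize("shardId-000000000000")
processor.process_records([b"a", b"b", b"c"], checkpoints.append)
print(processor.shutdown("TERMINATE"))
```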

Q: How does Amazon Kinesis Client Library (KCL) keep track of the data records being processed by an Amazon Kinesis Application?

Amazon Kinesis Client Library (KCL) automatically creates an Amazon DynamoDB table for each Amazon Kinesis Application to track and maintain state information such as resharding events and sequence number checkpoints. The DynamoDB table has the same name as the Amazon Kinesis Application, so you need to make sure your application name doesn’t conflict with any existing DynamoDB tables under the same account within the same region.

All workers associated with the same application name are assumed to be working together on the same Amazon Kinesis stream. If you run an additional instance of the same application code, but with a different application name, Amazon Kinesis Client Library (KCL) treats the second instance as an entirely separate application also operating on the same Amazon Kinesis stream.

Please note that your account will be charged for the costs associated with the Amazon DynamoDB table in addition to the costs associated with Amazon Kinesis.

For more information about how Amazon Kinesis Client Library (KCL) tracks Amazon Kinesis Application state, see Tracking Amazon Kinesis Application state.

Q: How can I automatically scale up the processing capacity of my Amazon Kinesis Application using Amazon Kinesis Client Library (KCL)?

You can create multiple instances of your Amazon Kinesis Application and have these application instances run across a set of Amazon EC2 instances that are part of an Auto Scaling group. As processing demand increases, an Amazon EC2 instance running your application instance will be automatically instantiated. Amazon Kinesis Client Library (KCL) will generate a worker for this new instance and automatically move record processors from overloaded existing instances to the new instance.

Q: Why does the GetRecords API return an empty result when there is data within my Amazon Kinesis stream?

One possible reason is that there is no data record at the position specified by the current shard iterator. This can happen even if you are using TRIM_HORIZON as the shard iterator type. An Amazon Kinesis stream represents a continuous stream of data. You should call the GetRecords API in a loop, and data records will be returned when the shard iterator advances to the position where they are stored.
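That polling loop can be sketched as follows. This is a local simulation: `fake_get_records` stands in for the GetRecords API, using a list index as a stand-in for the shard iterator, with empty positions before the data is reached.

```python
import time

def get_records_loop(get_records, iterator, max_polls=10, wait=0.0):
    # GetRecords may legitimately return no records even when the stream
    # holds data; keep advancing the shard iterator until records appear.
    for _ in range(max_polls):
        records, iterator = get_records(iterator)
        if records:
            return records, iterator
        time.sleep(wait)  # a real consumer would pause briefly between polls
    return [], iterator

# Simulated shard: two empty positions before the data is reached.
shard = [[], [], [b"record-1", b"record-2"]]
def fake_get_records(pos):
    return (shard[pos] if pos < len(shard) else []), pos + 1

records, _ = get_records_loop(fake_get_records, 0)
print(records)  # → [b'record-1', b'record-2']
```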

Q: What happens if the capacity limits of an Amazon Kinesis stream are exceeded while Amazon Kinesis Application reads data from the stream?

The capacity limits of an Amazon Kinesis stream are defined by the number of shards within the stream. The limits can be exceeded by either data throughput or the number of read data transactions. When the capacity limits are exceeded, read data transactions will be rejected with a ProvisionedThroughputExceeded exception. If this is due to a temporary rise of the stream’s output data rate, retries by the Amazon Kinesis Application will eventually lead to completion of the requests. If this is due to a sustained rise of the stream’s output data rate, you should increase the number of shards within your stream to provide enough capacity for the read data transactions to consistently succeed. In both cases, Amazon CloudWatch metrics allow you to learn about the change of the stream’s output data rate and the occurrence of ProvisionedThroughputExceeded exceptions.


Q: How do I change the throughput of my Amazon Kinesis stream?

You can change the throughput of an Amazon Kinesis stream by adjusting the number of shards within the stream (resharding). There are two types of resharding operations: shard split and shard merge. In a shard split, a single shard is divided into two shards, which increases the throughput of the stream. In a shard merge, two shards are merged into a single shard, which decreases the throughput of the stream. For more information about resharding, see Resharding a Stream.

Q: How often can I and how long does it take to change the throughput of my Amazon Kinesis stream?

A resharding operation such as shard split or shard merge takes a few seconds. You can only perform one resharding operation at a time. Therefore, for an Amazon Kinesis stream with only one shard, it takes a few seconds to double the throughput by splitting one shard. For a stream with 1000 shards, it takes 30K seconds (8.3 hours) to double the throughput by splitting 1000 shards. We recommend increasing the throughput of your Amazon Kinesis stream ahead of the time when extra throughput is needed.
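The arithmetic behind the 1000-shard figure can be sketched briefly. Because only one resharding operation runs at a time, doubling a stream means one sequential split per existing shard; the 30 seconds per split below is an assumption consistent with the 30K-second figure quoted above.

```python
def time_to_double_seconds(num_shards, seconds_per_split=30):
    # Doubling throughput requires splitting every shard, one split at a time.
    # 30 sec/split is an assumed figure matching the FAQ's 1000-shard example.
    return num_shards * seconds_per_split

print(time_to_double_seconds(1000))         # → 30000 seconds
print(time_to_double_seconds(1000) / 3600)  # ≈ 8.3 hours
```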

Q: Does Amazon Kinesis remain available when I change the throughput of my Amazon Kinesis stream via resharding?

Yes. You can continue putting data into and reading data from your Amazon Kinesis stream while resharding is in progress to change the throughput of the stream.

Q: How do I monitor the operations and performance of my Amazon Kinesis stream?

Amazon Kinesis Management Console displays key operational and performance metrics such as throughput of data input and output of your Amazon Kinesis streams. Amazon Kinesis also integrates with Amazon CloudWatch so that you can collect, view, and analyze CloudWatch metrics for your Amazon Kinesis streams. For more information about Amazon Kinesis metrics, see Monitoring Amazon Kinesis with Amazon CloudWatch.

Q: How do I manage and control access to my Amazon Kinesis stream?

Amazon Kinesis integrates with AWS Identity and Access Management (IAM), a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to put data into your Amazon Kinesis stream. For more information about access management and control of your Amazon Kinesis stream, see Controlling Access to Amazon Kinesis Resources using IAM.

Q: How do I log API calls made to my Amazon Kinesis stream for security analysis and operational troubleshooting?

Amazon Kinesis integrates with Amazon CloudTrail, a service that records AWS API calls for your account and delivers log files to you. For more information about API call logging and a list of supported Amazon Kinesis API, see Logging Amazon Kinesis API calls Using Amazon CloudTrail.

Q: How do I effectively manage my Amazon Kinesis streams and the costs associated with these streams?

Amazon Kinesis allows you to tag your Amazon Kinesis streams for easier resource and cost management. A tag is a user-defined label expressed as a key-value pair that helps organize AWS resources. For example, you can tag your Amazon Kinesis streams by cost centers so that you can categorize and track your Amazon Kinesis costs based on cost centers. For more information about Amazon Kinesis tagging, see Tagging Your Amazon Kinesis Streams.


Q: Is Amazon Kinesis available in AWS Free Tier?

No. Amazon Kinesis is not currently available in AWS Free Tier. AWS Free Tier is a program that offers free trials for a group of AWS services. For more details about AWS Free Tier, see AWS Free Tier.

Q: How much does Amazon Kinesis cost?

Amazon Kinesis uses simple pay-as-you-go pricing. There is neither upfront cost nor minimum fees, and you only pay for the resources you use. The cost of Amazon Kinesis has two dimensions:

  • Hourly shard cost determined by the number of shards within your Amazon Kinesis stream.
  • Put data transaction cost determined by the number of put data transactions performed by your data producers.

For more information about Amazon Kinesis costs, see Amazon Kinesis Pricing.
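The two dimensions combine as a simple sum, which can be sketched as follows. The rates are parameters here because actual prices vary by region and over time; the figures in the example are hypothetical, for illustration only.

```python
def monthly_cost(num_shards, puts_per_second,
                 shard_hour_price, price_per_million_puts):
    # The two pricing dimensions above: an hourly per-shard charge plus a
    # per-million put data transaction charge. Prices are parameters; see
    # the Amazon Kinesis Pricing page for actual rates.
    hours = 24 * 30  # an assumed 30-day month
    shard_cost = num_shards * hours * shard_hour_price
    put_cost = puts_per_second * 3600 * hours * price_per_million_puts / 1e6
    return shard_cost + put_cost

# Hypothetical rates for illustration only:
print(round(monthly_cost(2, 100, shard_hour_price=0.015,
                         price_per_million_puts=0.028), 2))  # → 28.86
```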

Q: Other than Amazon Kinesis costs, are there any other costs I might incur from my Amazon Kinesis usage?

If you use Amazon EC2 for running your Amazon Kinesis Applications, you will be charged for Amazon EC2 resources in addition to Amazon Kinesis costs.

Amazon Kinesis Client Library (KCL) uses an Amazon DynamoDB table to track state information of data record processing. If you use Amazon Kinesis Client Library (KCL) for your Amazon Kinesis Applications, you will be charged for Amazon DynamoDB resources in addition to Amazon Kinesis costs.

Please note that the cases above are two common examples, not an exhaustive list.