Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs. You can configure hundreds of thousands of data producers to continuously put data into a Kinesis data stream, such as data from website clickstreams, application logs, and social media feeds. In less than a second, the data is available for your Amazon Kinesis Applications to read and process from the stream.

In the following architectural diagram, Amazon Kinesis Data Streams is used as the gateway of a big data solution. Data from various sources is put into an Amazon Kinesis stream, and the data from the stream is then consumed by different Amazon Kinesis Applications. In this example, one application (in yellow) runs a real-time dashboard against the streaming data. Another application (in red) performs simple aggregation and emits processed data into Amazon S3; the data in Amazon S3 is further processed and stored in Amazon Redshift for complex analytics. A third application (in green) emits raw data into Amazon S3, which is then archived to Amazon Glacier for cheaper long-term storage. Notice that all three of these data processing pipelines run simultaneously and in parallel. Amazon Kinesis Data Streams allows as many consumers of the data stream as your solution requires without a performance penalty.

[Architecture diagram: kinesis-architecture-crop]

A shard is the base throughput unit of an Amazon Kinesis stream. One shard provides a capacity of 1MB/sec data input and 2MB/sec data output, and supports up to 1,000 PUT records per second. You specify the number of shards you need when you create a stream. For example, a stream with two shards has a throughput of 2MB/sec data input and 4MB/sec data output, and allows up to 2,000 PUT records per second. You can monitor shard-level metrics in Amazon Kinesis Data Streams and dynamically add or remove shards as your data throughput changes by resharding the stream.
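
As a minimal sketch of these steps using the AWS SDK for Python (boto3), where the stream name is a placeholder:

    import boto3

    kinesis = boto3.client("kinesis")

    # Create a stream with two shards:
    # 2MB/sec in, 4MB/sec out, up to 2,000 PUT records per second.
    kinesis.create_stream(StreamName="my-stream", ShardCount=2)

    # Stream creation is asynchronous; wait until the stream is ACTIVE.
    kinesis.get_waiter("stream_exists").wait(StreamName="my-stream")

    # Reshard later as throughput changes, e.g. double the shard count.
    kinesis.update_shard_count(
        StreamName="my-stream",
        TargetShardCount=4,
        ScalingType="UNIFORM_SCALING",
    )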

A record is the unit of data stored in an Amazon Kinesis stream. A record is composed of a sequence number, partition key, and data blob. A data blob is the data of interest your data producer adds to a stream. The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).

A partition key is used to segregate and route data records to different shards of a stream. A partition key is specified by your data producer while putting data into an Amazon Kinesis stream. For example, assume you have an Amazon Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.

A sequence number is a unique identifier for each data record. A sequence number is assigned by Amazon Kinesis Data Streams when a data producer calls the PutRecord or PutRecords API to add data to an Amazon Kinesis data stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord or PutRecords requests, the larger the sequence numbers become.
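
To make the record, partition key, and sequence number concrete, here is a minimal boto3 sketch (stream name, payload, and key are placeholders). Kinesis routes a record by an MD5 hash of its partition key, so records sharing a key land on the same shard:

    import boto3

    kinesis = boto3.client("kinesis")

    # The data blob is the payload; the partition key determines the shard.
    response = kinesis.put_record(
        StreamName="my-stream",
        Data=b'{"page": "/home", "user": "alice"}',
        PartitionKey="Key A",
    )

    # Kinesis assigns the sequence number and reports which shard got the record.
    print(response["SequenceNumber"], response["ShardId"])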


After you sign up for Amazon Web Services, you can start using Amazon Kinesis Data Streams by:

  • Creating an Amazon Kinesis stream through either the Amazon Kinesis Management Console or the Amazon Kinesis CreateStream API.
  • Configuring your data producers to continuously put data into your Amazon Kinesis stream.
  • Building your Amazon Kinesis Applications to read and process data from your Amazon Kinesis stream.

Data producers can put data into Amazon Kinesis data streams using the Amazon Kinesis Data Streams APIs, the Amazon Kinesis Producer Library (KPL), or the Amazon Kinesis Agent.


Amazon Kinesis Data Streams provides two APIs for putting data into an Amazon Kinesis stream: PutRecord and PutRecords. PutRecord allows a single data record within an API call and PutRecords allows multiple data records within an API call.
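
For example, a minimal boto3 sketch of a PutRecords batch (stream name and payloads are placeholders). Note that PutRecords is not all-or-nothing, so a caller should check for per-record failures and retry them:

    import boto3

    kinesis = boto3.client("kinesis")

    records = [
        {"Data": b"event-1", "PartitionKey": "Key A"},
        {"Data": b"event-2", "PartitionKey": "Key B"},
    ]
    response = kinesis.put_records(StreamName="my-stream", Records=records)

    # Retry only the entries whose result carries an ErrorCode.
    if response["FailedRecordCount"] > 0:
        retries = [rec for rec, res in zip(records, response["Records"])
                   if "ErrorCode" in res]
        kinesis.put_records(StreamName="my-stream", Records=retries)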

The Amazon Kinesis Producer Library (KPL) is an easy-to-use, highly configurable library that helps you put data into an Amazon Kinesis stream. The KPL presents a simple, asynchronous, and reliable interface that enables you to quickly achieve high producer throughput with minimal client resources.

Amazon Kinesis Agent is a pre-built Java application that offers an easy way to collect and send data to your Amazon Kinesis stream. You can install the agent on Linux-based server environments such as web servers, log servers, and database servers. The agent monitors certain files and continuously sends data to your stream. 
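
As an illustration, the agent is driven by a JSON configuration file (by default /etc/aws-kinesis/agent.json). This is a minimal sketch, not a complete configuration; the log file path and stream name are placeholders:

    {
      "flows": [
        {
          "filePattern": "/var/log/app/access.log*",
          "kinesisStream": "my-stream"
        }
      ]
    }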


An Amazon Kinesis Application is a data consumer that reads and processes data from an Amazon Kinesis stream. You can build your Amazon Kinesis Applications using either the Amazon Kinesis API or the Amazon Kinesis Client Library (KCL).
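
If you use the Amazon Kinesis API directly, a consumer reads each shard through a shard iterator. A minimal boto3 sketch, assuming a single-shard stream (a real application would track every shard and pace its requests):

    import boto3

    kinesis = boto3.client("kinesis")

    # Find the stream's first shard (placeholder for full shard tracking).
    shard_id = kinesis.describe_stream(
        StreamName="my-stream"
    )["StreamDescription"]["Shards"][0]["ShardId"]

    # Start reading from the oldest record still retained in the stream.
    iterator = kinesis.get_shard_iterator(
        StreamName="my-stream",
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    while iterator:
        batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in batch["Records"]:
            print(record["SequenceNumber"], record["Data"])
        iterator = batch.get("NextShardIterator")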

Amazon Kinesis Client Library (KCL) is a pre-built library that helps you easily build Amazon Kinesis Applications for reading and processing data from an Amazon Kinesis stream. KCL handles complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault-tolerance. KCL enables you to focus on business logic while building Amazon Kinesis Applications.

The Amazon Kinesis Connector Library is a pre-built library that helps you easily integrate Amazon Kinesis with other AWS services and third-party tools. The Amazon Kinesis Client Library (KCL) is required for using the Amazon Kinesis Connector Library. The current version of this library provides connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch. The library also includes sample connectors of each type, plus Apache Ant build files for running the samples.

Amazon Kinesis Storm Spout is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with Apache Storm. The current version of the Amazon Kinesis Storm Spout fetches data from an Amazon Kinesis stream and emits it as tuples. You add the spout to your Storm topology to leverage Amazon Kinesis Data Streams as a reliable, scalable stream capture, storage, and replay service.


Amazon Kinesis Data Streams integrates with Amazon CloudWatch so that you can collect, view, and analyze CloudWatch metrics for your Amazon Kinesis data streams and the shards within those data streams. For more information about Amazon Kinesis Data Streams metrics, see Monitoring Amazon Kinesis with Amazon CloudWatch.
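
For example, a minimal boto3 sketch that pulls one stream-level metric from the AWS/Kinesis CloudWatch namespace (stream name and time window are placeholders):

    import boto3
    from datetime import datetime, timedelta

    cloudwatch = boto3.client("cloudwatch")

    # Sum of bytes put into the stream over the last hour, in 5-minute buckets.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName="IncomingBytes",
        Dimensions=[{"Name": "StreamName", "Value": "my-stream"}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Sum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Sum"])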

Amazon Kinesis Data Streams integrates with AWS Identity and Access Management (IAM), a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to put data into your Amazon Kinesis stream. For more information about access management and control of your Amazon Kinesis stream, see Controlling Access to Amazon Kinesis Resources using IAM.
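
As an illustration, an IAM policy that allows only putting data into one specific stream might look like the following sketch (region, account ID, and stream name are placeholders):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
          "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"
        }
      ]
    }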

Amazon Kinesis Data Streams integrates with AWS CloudTrail, a service that records AWS API calls for your account and delivers log files to you. For more information about API call logging and a list of supported Amazon Kinesis APIs, see Logging Amazon Kinesis API Calls Using AWS CloudTrail.

You can privately access Kinesis Data Streams APIs from your Amazon Virtual Private Cloud (VPC) by creating VPC endpoints. With VPC endpoints, the routing between the VPC and Kinesis Data Streams is handled by the AWS network without the need for an Internet gateway, NAT gateway, or VPN connection. The latest generation of VPC endpoints used by Kinesis Data Streams is powered by AWS PrivateLink, a technology that enables private connectivity between AWS services using Elastic Network Interfaces (ENIs) with private IPs in your VPCs. For more information, see the AWS PrivateLink documentation.
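
A minimal boto3 sketch of creating such an interface endpoint; the service name follows the com.amazonaws.<region>.kinesis-streams pattern, and all resource IDs are placeholders:

    import boto3

    ec2 = boto3.client("ec2")

    # Create a PrivateLink (interface) endpoint for Kinesis Data Streams.
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.us-east-1.kinesis-streams",
        SubnetIds=["subnet-0123456789abcdef0"],
        SecurityGroupIds=["sg-0123456789abcdef0"],
        PrivateDnsEnabled=True,
    )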

You can encrypt the data you put into Kinesis Data Streams using server-side encryption or client-side encryption. Server-side encryption is a fully managed feature that automatically encrypts and decrypts data as you put it into and get it from a stream. Alternatively, you can encrypt your data on the client side before putting it into your stream. To learn more, see the Security section of the Kinesis Data Streams FAQs.
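
For server-side encryption, a minimal boto3 sketch using the AWS-managed KMS key for Kinesis (you can substitute the ID or alias of your own KMS key; the stream name is a placeholder):

    import boto3

    kinesis = boto3.client("kinesis")

    # Turn on server-side encryption with the AWS-managed key for Kinesis.
    kinesis.start_stream_encryption(
        StreamName="my-stream",
        EncryptionType="KMS",
        KeyId="alias/aws/kinesis",
    )

    # Encryption can later be turned off with stop_stream_encryption,
    # using the same EncryptionType and KeyId arguments.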

Amazon Kinesis allows you to tag your Amazon Kinesis data streams for easier resource and cost management. A tag is a user-defined label expressed as a key-value pair that helps organize AWS resources. For example, you can tag your Amazon Kinesis data streams by cost center so that you can categorize and track your Amazon Kinesis costs accordingly. For more information, see Tagging Your Amazon Kinesis Data Streams.
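
For example, a minimal boto3 sketch that tags a stream by cost center (the stream name, tag key, and tag value are placeholders):

    import boto3

    kinesis = boto3.client("kinesis")

    # Attach a cost-center tag to the stream.
    kinesis.add_tags_to_stream(
        StreamName="my-stream",
        Tags={"CostCenter": "marketing-1234"},
    )

    # Tags can be listed (and removed) as needed.
    print(kinesis.list_tags_for_stream(StreamName="my-stream")["Tags"])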