Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. You can configure hundreds of thousands of data producers to continuously put data into an Amazon Kinesis stream. For example, data from website clickstreams, application logs, and social media feeds. Within less than a second, the data will be available for your Amazon Kinesis Applications to read and process from the stream.
In the following architectural diagram, Amazon Kinesis is used as the gateway of a big data solution. Data from various sources is put into an Amazon Kinesis stream and then the data from the stream is consumed by different Amazon Kinesis Applications. In this example, one application (in yellow) is running a real-time dashboard against the streaming data. Another application (in red) performs simple aggregation and emits processed data into Amazon S3. The data in Amazon S3 is further processed and stored in Amazon Redshift for complex analytics. The third application (in green) emits raw data into Amazon S3, which is then archived to Amazon Glacier for cheaper long-term storage. Notice all three of these data processing pipelines are happening simultaneously and in parallel. Amazon Kinesis allows as many consumers of the data stream as your solution requires without performance penalty.
A shard is the base throughput unit of an Amazon Kinesis stream. One shard provides a capacity of 1MB/sec data input and 2MB/sec data output. One shard can support up to 1000 PUT records per second. You will specify the number of shards needed when you create a stream. For example, you can create a stream with two shards. This stream has a throughput of 2MB/sec data input and 4MB/sec data output, and allows up to 2000 PUT records per second. You can dynamically add or remove shards from your stream as your data throughput changes via resharding.
A record is the unit of data stored in an Amazon Kinesis stream. A record is composed of a sequence number, partition key, and data blob. A data blob is the data of interest your data producer adds to a stream. The maximum size of a data blob (the data payload after Base64-decoding) is 50 kilobytes (KB).
Partition key is used to segregate and route data records to different shards of a stream. A partition key is specified by your data producer while putting data into an Amazon Kinesis stream. For example, assuming you have an Amazon Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.
A sequence number is a unique identifier for each data record. Sequence number is assigned by Amazon Kinesis when a data producer calls PutRecord or PutRecords API to add data to an Amazon Kinesis stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord or PutRecords requests, the larger the sequence numbers become.
After you sign up for Amazon Web Services, you can start using Amazon Kinesis by:
- Creating an Amazon Kinesis stream through either Amazon Kinesis Management Console or Amazon Kinesis CreateStream API.
- Configuring your data producers to continuously put data into your Amazon Kinesis stream via PutRecord or PutRecords API.
- Building your Amazon Kinesis Applications to read and process data from your Amazon Kinesis stream.
Amazon Kinesis Client Library (KCL) is a pre-built library that helps you easily build Amazon Kinesis Applications for reading and processing data from an Amazon Kinesis stream. Amazon Kinesis Client Library (KCL) handles complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault-tolerance. Amazon Kinesis Client Library (KCL) enables you to focus on business logic while building Amazon Kinesis Applications.
Amazon Kinesis Connector Library is a pre-built library that helps you easily integrate Amazon Kinesis with other AWS services and third-party tools. Amazon Kinesis Client Library (KCL) is required for using Amazon Kinesis Connector Library. The current version of this library provides connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch. The library also includes sample connectors of each type, plus Apache Ant build files for running the samples.
Amazon Kinesis Storm Spout is a pre-built library that helps you easily integrate Amazon Kinesis with Apache Storm. The current version of Amazon Kinesis Storm Spout fetches data from Amazon Kinesis stream and emits it as tuples. You will add the spout to your Storm topology to leverage Amazon Kinesis as a reliable, scalable, stream capture, storage, and replay service.
Amazon Kinesis integrates with AWS Identity and Access Management (IAM), a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to put data into your Amazon Kinesis stream. For more information about access management and control of your Amazon Kinesis stream, see Controlling Access to Amazon Kinesis Resources using IAM.
Amazon Kinesis integrates with Amazon CloudTrail, a service that records AWS API calls for your account and delivers log files to you. For more information about API call logging and a list of supported Amazon Kinesis API, see Logging Amazon Kinesis API calls Using Amazon CloudTrail.
Amazon Kinesis allows you to tag your Amazon Kinesis streams for easier resource and cost management. A tag is a user-defined label expressed as a key-value pair that helps organize AWS resources. For example, you can tag your Amazon Kinesis streams by cost centers so that you can categorize and track your Amazon Kinesis costs based on cost centers. For more information about Amazon Kinesis tagging, see Tagging Your Amazon Kinesis Streams.