Getting started with Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is a massively scalable, highly durable data ingestion and processing service optimized for streaming data. You can configure hundreds of thousands of data producers to continuously put data into a Kinesis data stream. Data will be available within milliseconds to your Amazon Kinesis applications, and those applications will receive data records in the order they were generated.
Amazon Kinesis Data Streams is integrated with a number of AWS services, including Amazon Kinesis Data Firehose for near real-time transformation and delivery of streaming data into an AWS data lake like Amazon S3, Amazon Managed Service for Apache Flink for managed stream processing, AWS Lambda for event or record processing, AWS PrivateLink for private connectivity, Amazon CloudWatch for metrics and log processing, and AWS KMS for server-side encryption.
Amazon Kinesis Data Streams is often used as the gateway of a big data solution. Data from various sources is put into an Amazon Kinesis data stream, and the data from the stream is then consumed by different Amazon Kinesis applications. For example, one application might run a real-time dashboard against the streaming data. Another might perform simple aggregation and emit processed data into Amazon S3, where it is further processed and stored in Amazon Redshift for complex analytics. A third might emit raw data into Amazon S3, which is then archived to Amazon S3 Glacier for lower-cost long-term storage. All three of these data processing pipelines run simultaneously and in parallel.
Get started with Amazon Kinesis Data Streams
Use Kinesis Data Streams
After you sign up for Amazon Web Services, you can start using Amazon Kinesis Data Streams by:
- Creating an Amazon Kinesis data stream through either the Amazon Kinesis Management Console or the Amazon Kinesis CreateStream API (a code sketch follows this list).
- Configuring your data producers to continuously put data into your Amazon Kinesis data stream.
- Building your Amazon Kinesis applications to read and process data from your Amazon Kinesis data stream.
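As a concrete illustration of the first two steps, here is a minimal sketch using the AWS SDK for Python (boto3). The stream name, region, and payload are placeholder assumptions, not values from this page:

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Step 1: create a data stream with two shards and wait until it is ACTIVE.
kinesis.create_stream(StreamName="example-stream", ShardCount=2)
kinesis.get_waiter("stream_exists").wait(StreamName="example-stream")

# Step 2: a producer puts records into the stream. Records that share a
# partition key are routed to the same shard, preserving their order.
kinesis.put_record(
    StreamName="example-stream",
    Data=b'{"ticker": "AMZN", "price": 178.25}',
    PartitionKey="AMZN",
)
```

A matching consumer sketch for the third step appears under "Run applications or build your own" below.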
Key concepts
Data producer
Data consumer
Data stream
Shard
A shard is the base throughput unit of an Amazon Kinesis data stream.
- A shard is an append-only log and a unit of streaming capacity. It contains a sequence of records ordered by arrival time.
- One shard can ingest up to 1,000 data records per second, or 1MB/sec. Add more shards to increase your ingestion capacity.
- Add or remove shards from your stream dynamically as your data throughput changes using the AWS Management Console or the UpdateShardCount API, trigger automatic scaling via AWS Lambda, or use an auto scaling utility (see the sketch after this list).
- When consumers use enhanced fan-out, one shard provides 1MB/sec data input and 2MB/sec data output for each data consumer registered to use enhanced fan-out.
- When consumers do not use enhanced fan-out, a shard provides 1MB/sec of input and 2MB/sec of output, and this output is shared among all consumers not using enhanced fan-out.
- You specify the number of shards needed when you create a stream and can change the quantity at any time. For example, you can create a stream with two shards. If you have five data consumers using enhanced fan-out, this stream can provide up to 20MB/sec of total data output (2 shards x 2MB/sec x 5 data consumers). When data consumers are not using enhanced fan-out, this stream has a throughput of 2MB/sec data input and 4MB/sec data output. In all cases, this stream allows up to 2,000 PUT records per second, or 2MB/sec of ingress, whichever limit is met first.
- You can monitor shard-level metrics in Amazon Kinesis Data Streams.
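For illustration, here is a hedged sketch of resharding and shard-level monitoring with boto3; the stream name and target shard count are assumptions:

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Scale the stream to 4 shards. UNIFORM_SCALING splits or merges
# shards evenly across the partition key hash space.
kinesis.update_shard_count(
    StreamName="example-stream",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)

# Publish per-shard CloudWatch metrics to make throughput hot spots visible.
kinesis.enable_enhanced_monitoring(
    StreamName="example-stream",
    ShardLevelMetrics=["IncomingBytes", "OutgoingBytes"],
)
```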
Data record
Partition key
Sequence number
Put data into streams
Overview
Amazon Kinesis Data Generator
Amazon Kinesis Data Streams API
Amazon Kinesis Producer Library (KPL)
Amazon Kinesis Agent
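For producers that call the Amazon Kinesis Data Streams API directly rather than using the KPL or Agent, here is an illustrative boto3 sketch of a batched write; the stream name and payloads are placeholder assumptions:

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Batch up to 500 records per PutRecords call to amortize request overhead.
records = [
    {
        "Data": json.dumps({"sensor": i, "reading": 20.0 + i}).encode("utf-8"),
        "PartitionKey": f"sensor-{i}",  # distributes records across shards
    }
    for i in range(100)
]
response = kinesis.put_records(StreamName="example-stream", Records=records)

# PutRecords is not all-or-nothing: collect per-record failures for retry.
failed = [
    rec
    for rec, result in zip(records, response["Records"])
    if "ErrorCode" in result
]
```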
Run applications or build your own
Overview
Amazon Kinesis Data Firehose
Amazon Managed Service for Apache Flink
AWS Lambda
Amazon Kinesis Client Library (KCL)
Amazon Kinesis Connector Library
Amazon Kinesis Storm Spout
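Alongside these options, you can read a shard directly with the low-level API. Here is a minimal polling sketch with boto3 (stream name assumed); note that the KCL handles shard discovery, checkpointing, and failover for you, which this sketch does not:

```python
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Read one shard starting from its oldest available record (TRIM_HORIZON).
shard_id = kinesis.list_shards(StreamName="example-stream")["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="example-stream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

# Poll the shard; records come back in the order they were written to it.
while iterator:
    result = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in result["Records"]:
        print(record["SequenceNumber"], record["Data"])
    iterator = result.get("NextShardIterator")
    time.sleep(1)  # respect the per-shard limit of 5 read calls per second
```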
Manage streams
Accessing Kinesis Data Streams APIs privately from Amazon VPC
Fan-out Kinesis Data Streams data without sacrificing performance
Encrypting your Kinesis Data Streams data
Amazon Kinesis Data Firehose and Amazon Managed Service for Apache Flink integration
Amazon CloudWatch integration
AWS IAM integration
AWS CloudTrail integration
Tagging support
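As a sketch of two of these management tasks with boto3 (the stream name, key alias, and tags are illustrative assumptions):

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Enable server-side encryption using the AWS managed KMS key for Kinesis.
kinesis.start_stream_encryption(
    StreamName="example-stream",
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",
)

# Tag the stream to support cost allocation and IAM access control.
kinesis.add_tags_to_stream(
    StreamName="example-stream",
    Tags={"team": "analytics", "env": "dev"},
)
```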
Tutorials
Analyze stock data with Amazon Kinesis Data Streams
This tutorial walks through the steps of creating an Amazon Kinesis data stream, sending simulated stock trading data into the stream, and writing an application to process the data from the data stream.
Featured presentations
Analyzing streaming data in real time with Amazon Kinesis (ABD301)
Amazon Kinesis makes it easy to collect, process, and analyze real-time streaming data so you can get timely insights and react quickly to new information. In this session, we present an end-to-end streaming data solution using Kinesis Streams for data ingestion, Kinesis Analytics for real-time processing, and Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
Workshop: Building your first big data application on AWS (ABD317)
Want to ramp up your knowledge of AWS big data web services and launch your first big data application on the cloud? We walk you through simplifying big data processing as a data bus comprising ingest, store, process, and visualize. You build a big data application using AWS managed services, including Amazon Athena, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. Along the way, we review architecture design patterns for big data applications and give you access to a take-home lab so that you can rebuild and customize the application yourself. You should bring your own laptop and have some familiarity with AWS services to get the most from this session.
Workshop: Don't wait until tomorrow: How to use streaming data to gain real-time insights into your business (ABD321)
In recent years, there has been explosive growth in the number of connected devices and real-time data sources. Because of this, data is being produced continuously, and its production rate is accelerating. Businesses can no longer wait for hours or days to use this data. To gain the most valuable insights, they must use this data immediately so they can react quickly to new information. In this workshop, you learn how to take advantage of streaming data sources to analyze and react in near real time. You are presented with several requirements for a real-world streaming data scenario and tasked with creating a solution that successfully satisfies the requirements using services such as Amazon Kinesis, AWS Lambda, and Amazon SNS.
How Amazon Flex uses real-time analytics to deliver packages on time (ABD217)
Reducing the time to get actionable insights from data is important to all businesses, and customers who employ batch data analytics tools are exploring the benefits of streaming analytics. Learn best practices to extend your architecture from data warehouses and databases to real-time solutions. Learn how to use Amazon Kinesis to get real-time data insights and integrate them with Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3. The Amazon Flex team describes how they used streaming analytics in the Amazon Flex mobile app, which is used by Amazon delivery drivers to deliver millions of packages each month on time. They discuss the architecture that enabled the move from a batch processing system to a real-time system, how they overcame the challenges of migrating existing batch data to streaming data, and how to benefit from real-time analytics.
Real-time streaming applications on AWS: Use cases and patterns (ABD203)
To win in the marketplace and provide differentiated customer experiences, businesses need to be able to use live data in real time to facilitate fast decision making. In this session, you learn common streaming data processing use cases and architectures. First, we give an overview of streaming data and AWS streaming data capabilities. Next, we look at a few customer examples and their real-time streaming applications. Finally, we walk through common architectures and design patterns of top streaming data use cases.