Many Amazon Web Services (AWS) customers use streaming data to gain real-time insight into customer activity and immediate business trends. Streaming data, which is generated continuously from thousands of data sources, includes a wide variety of data such as log files from your mobile or web applications, e-commerce purchases, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices. This data can help companies make well-informed decisions and proactively respond to changing business conditions.

Amazon Kinesis, a platform for streaming data on AWS, offers powerful services that make it easier to build data processing applications, load massive volumes of streaming data, and analyze it in real time.

This webpage provides best practices and guidance to consider when streaming data on the AWS Cloud. It also introduces an AWS-provided solution that automatically provisions and configures the AWS services necessary to start consuming and analyzing streaming data in minutes.

The following section assumes basic knowledge of architecting on the AWS Cloud, streaming data, and data analysis.


When analyzing data in the cloud, consider an approach that combines stream processing and batch processing. Process your data through a streaming data platform to extract real-time insights, then persist the data into a data store where it can be transformed and loaded for batch processing. With this in mind, consider these best practices when building a streaming data solution:

  • Whenever possible, choose a single data format for your streaming data. A single format will simplify your streaming data process by eliminating the need for different systems to transform the data into the appropriate format.
  • Consider using a flexible schema to start. Once your data is flowing through your streaming data platform, refine the schema.
  • Plan for scalability, data durability, and fault tolerance for data consumption, processing, and storage.
  • Implement granular access-control policies and use encryption to protect your streaming data.
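As a rough illustration of the first two practices, the sketch below wraps events from any source in one common JSON envelope while leaving the payload itself free-form. The field names (`source`, `event_type`, `timestamp`, `payload`) and both helper functions are assumptions made for this example, not part of any AWS service or of the solution described on this page.

```python
import json
from datetime import datetime, timezone

# Hypothetical envelope: three fixed fields plus a free-form "payload".
REQUIRED_FIELDS = ("source", "event_type", "timestamp")


def to_envelope(source, event_type, payload, timestamp=None):
    """Wrap a source-specific payload in one common JSON envelope."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    envelope = {
        "source": source,
        "event_type": event_type,
        "timestamp": timestamp,
        "payload": payload,  # flexible part: any JSON-serializable dict
    }
    # One compact JSON object per line keeps records easy to split downstream.
    return json.dumps(envelope, separators=(",", ":"), sort_keys=True) + "\n"


def parse_envelope(line):
    """Parse one envelope line and check only the fixed fields."""
    record = json.loads(line)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record
```

Because only the fixed fields are validated, the payload can evolve while data is already flowing, and stricter checks can be added once the schema settles.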

AWS offers a solution that automatically launches and configures Amazon Kinesis Streams to load streaming data, Amazon Kinesis Analytics to filter and process that data, and Amazon Kinesis Firehose to deliver the data to various data stores for search, storage, or further analytics. The diagram below presents the Streaming Analytics Pipeline architecture you can deploy in minutes using the solution's implementation guide and accompanying AWS CloudFormation template.

  1. This solution uses Amazon Kinesis Streams to load streaming data into Amazon Kinesis Analytics.
  2. An Amazon Kinesis Analytics application filters and processes the data, and puts it into Amazon Kinesis Firehose. 
  3. An Amazon Kinesis Firehose delivery stream delivers the analyzed data to various data stores for search, storage, or further analytics. You can choose to put your analyzed data into an Amazon S3 bucket, an Amazon Redshift cluster, an Amazon Elasticsearch Service domain, or an Amazon Kinesis stream.
  4. If you choose to persist raw data, AWS Lambda decodes the data and puts it into Amazon Kinesis Firehose, which delivers it to Amazon S3.
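Step 4 can be pictured with a minimal sketch like the one below. It is not the solution's actual Lambda code. It assumes the standard event shape that AWS Lambda receives from a Kinesis stream, where each record carries base64-encoded data under `kinesis.data`, and it only prepares batches shaped like the `Records` argument of the Firehose `put_record_batch` call. The function names, the trailing-newline convention, and `RAW_DELIVERY_STREAM` are assumptions for illustration.

```python
import base64

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def decode_kinesis_records(event):
    """Return the raw bytes of every Kinesis record in a Lambda event."""
    return [
        base64.b64decode(record["kinesis"]["data"])
        for record in event.get("Records", [])
    ]


def to_firehose_batches(payloads, batch_size=MAX_BATCH):
    """Group payloads into lists shaped like put_record_batch's Records."""
    batches = []
    for start in range(0, len(payloads), batch_size):
        chunk = payloads[start:start + batch_size]
        # Assumed convention: end each record with a newline so that
        # objects written to Amazon S3 can be split back into records.
        batches.append(
            [{"Data": p if p.endswith(b"\n") else p + b"\n"} for p in chunk]
        )
    return batches


def handler(event, context=None):
    """Decode the incoming records and report what would be delivered."""
    batches = to_firehose_batches(decode_kinesis_records(event))
    # In a deployed function, each batch could be sent with something like:
    #   boto3.client("firehose").put_record_batch(
    #       DeliveryStreamName=RAW_DELIVERY_STREAM, Records=batch)
    # RAW_DELIVERY_STREAM is a hypothetical name, not taken from the solution.
    return {"batches": len(batches), "records": sum(len(b) for b in batches)}
```

Keeping the decoding and batching steps free of network calls makes them easy to test locally before wiring in the delivery call.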

What you'll accomplish:

Deploy the Streaming Analytics Pipeline using AWS CloudFormation. The CloudFormation template automatically launches and configures the components necessary to consume and analyze streaming data.

Automatically analyze streaming data in an Amazon Kinesis Analytics application. You can customize the Amazon Kinesis Analytics application that is included with the solution.

What you'll need before starting:

An AWS account: You will need an AWS account to begin provisioning resources. Sign up for AWS.

A pre-configured external destination: If you choose Amazon Redshift or Amazon Elasticsearch Service as the destination for your analyzed data, you must configure them to work with the Streaming Analytics Pipeline solution.

Skill level: This solution is intended for IT infrastructure professionals who have practical experience with streaming data and architecting on the AWS Cloud.

Q: Can I use an existing Amazon Kinesis stream as my source stream?

Yes. During initial deployment, you can specify an existing Amazon Kinesis stream as your source stream.  
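For context, a producer that already writes to such a stream might shape its records roughly as sketched below. The helper name, the `user_id` key field, and the stream name in the usage note are assumptions for illustration. The returned dictionary uses the `StreamName`, `Data`, and `PartitionKey` parameters of the Kinesis `PutRecord` call.

```python
import json


def build_put_record_args(stream_name, event, key_field="user_id"):
    """Build keyword arguments shaped for the Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("kinesis").put_record(**build_put_record_args("my-existing-stream", event))`, where `my-existing-stream` is a placeholder name.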

Q: Can I store my analyzed data?

Yes. You can choose an Amazon S3 bucket (default), an Amazon Redshift cluster, an Amazon Elasticsearch Service domain, or an existing Amazon Kinesis stream to store the analyzed data from your Amazon Kinesis source stream.

If you choose Amazon Redshift or Amazon Elasticsearch Service as the destination for your analyzed data, you must configure them to work with the Streaming Analytics Pipeline before you deploy the solution.

Q: Can I store my raw data?  

Yes. The Streaming Analytics Pipeline gives you the option to decode, encrypt, and store raw streaming data in an Amazon S3 bucket.    

Q: Can I deploy the Streaming Analytics Pipeline in any AWS Region?

No. You can deploy the Streaming Analytics Pipeline CloudFormation template only in AWS Regions where both AWS Lambda and Amazon Kinesis Analytics are available. For more information, see AWS service offerings by region.

