Streaming Data Solution for Amazon MSK

A massively scalable and durable real-time data streaming service that allows you to continuously capture data for processing

Overview

The Streaming Data Solution for Amazon MSK provides AWS CloudFormation templates in which data flows through producers, streaming storage, consumers, and destinations. To support multiple use cases and business needs, the solution offers four templates. As with the Streaming Data Solution for Amazon Kinesis, the templates apply best practices for securing data and for monitoring functionality with dashboards and alarms.

Streaming data must be durably captured by massively scalable storage that can handle high data volumes from data producers. The producers can be thousands of data sources, each generating data continuously and typically submitting small records (on the order of kilobytes) simultaneously.

Streaming data includes a wide variety of data, such as log files generated by customers using mobile or web applications, ecommerce purchases, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices or instrumentation in data centers.

Use cases for this AWS Solution
Data Warehousing, Real-Time Streaming Data

Benefits

Automated configuration
Automatically configure the AWS services necessary to easily capture, store, process, and deliver streaming data.
Four template options
Choose from four different AWS CloudFormation template options. Test new service combinations for your production environment and improve existing applications.
Real-time use cases
Capture high-volume application logs, analyze clickstream data, continuously deliver to a data lake, and more.
Customizable source code
Customize the Solution's boilerplate code, and then use the monitoring capabilities to quickly transition from testing to production.

Technical details

The diagrams below present the four AWS CloudFormation templates that you can automatically deploy using the solution's implementation guide.
  • Option 1
    AWS CloudFormation template using Amazon Managed Streaming for Apache Kafka (Amazon MSK)
    AWS Architecture Blog
    Amazon MSK Backup for Archival, Replay, or Analytics

    This post covers patterns and solutions that can be used to back up MSK topics to S3, which enables customers to reduce long-term data retention settings in MSK. Some customers store long-term data in MSK for data analytics and machine learning workloads. We share a pattern to simplify this architecture by offloading topic data to S3 and using S3 for analytics/ML.

    Training
    Data Analytics Fundamentals

    In this self-paced course, you learn about the process for planning data analysis solutions and the various data analytics processes that are involved.

    Training
    Amazon MSK Labs

    This site hosts information and hands-on labs pertaining to Amazon MSK. These labs can be run either on personal or corporate AWS accounts or on accounts provisioned by AWS account teams for events that use Event Engine.

    About this deployment
    Version: 1.8.0
    Released: 09/2023
    Author: AWS
    Est. deployment time: 25-30 mins
  • Option 2
    AWS CloudFormation template using Amazon MSK and AWS Lambda

    Step 1
    This AWS CloudFormation template deploys a Lambda function that processes records in an Apache Kafka topic. The default function is a Node.js application that logs the received messages, but it can be customized to meet your business needs.
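
To make the description above concrete, here is a minimal sketch of a Lambda handler that logs messages received from an Apache Kafka topic. The solution's default function is a Node.js application; this sketch uses Python purely for illustration and is not the solution's code. It assumes the event shape that Lambda uses for Amazon MSK event sources, where `records` maps a topic-partition key to a list of records with base64-encoded values. The helper name `decode_records`, the topic name `orders`, and the assumption that values are UTF-8 text are all hypothetical.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def decode_records(event):
    """Flatten an Amazon MSK event into a list of decoded messages."""
    decoded = []
    # "records" maps "<topic>-<partition>" to a list of Kafka records.
    for batch in event.get("records", {}).values():
        for record in batch:
            decoded.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                # Record values arrive base64-encoded; UTF-8 text is assumed here.
                "value": base64.b64decode(record["value"]).decode("utf-8"),
            })
    return decoded


def handler(event, context):
    """Log every received message and report how many were processed."""
    messages = decode_records(event)
    for message in messages:
        logger.info(json.dumps(message))
    return {"processed": len(messages)}


# Hypothetical sample event for local experimentation.
sample_event = {
    "eventSource": "aws:kafka",
    "records": {
        "orders-0": [
            {
                "topic": "orders",
                "partition": 0,
                "offset": 15,
                "value": base64.b64encode(b'{"id": 1}').decode("ascii"),
            },
        ]
    },
}
```

Calling `handler(sample_event, None)` locally exercises the same path that a real invocation would take. Customizing the function to meet a business need would mean replacing the logging loop with your own processing.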

     

  • Option 3
    AWS CloudFormation template using Amazon MSK, AWS Lambda, and Amazon Kinesis Data Firehose
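
The template name suggests a pipeline in which a Lambda function reads records from Amazon MSK and forwards them to Amazon Kinesis Data Firehose; this page does not describe the details, so the following is only a sketch under that assumption, not the solution's code. It shows one way to reshape an MSK event into batches that respect the `PutRecordBatch` limit of 500 records per call. The function name `to_firehose_batches` and the choice to newline-delimit messages are hypothetical.

```python
import base64

# PutRecordBatch accepts at most 500 records per call.
MAX_BATCH_RECORDS = 500


def to_firehose_batches(event, batch_size=MAX_BATCH_RECORDS):
    """Turn an Amazon MSK event into lists of Firehose-style records."""
    records = []
    # "records" maps "<topic>-<partition>" to a list of Kafka records.
    for batch in event.get("records", {}).values():
        for record in batch:
            payload = base64.b64decode(record["value"])
            # Append a newline so delivered objects hold one message per line
            # (an illustrative choice, not a requirement).
            records.append({"Data": payload + b"\n"})
    # Split the flat list into chunks no larger than batch_size.
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
```

Each returned batch could then be passed as the `Records` argument to the boto3 Firehose client's `put_record_batch` call, together with a `DeliveryStreamName` of your choosing. The sketch does not enforce the per-call size limit or retry failed records, both of which a production function would need to handle.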
  • Option 4
    AWS CloudFormation template using Amazon MSK, Amazon Managed Service for Apache Flink, and Amazon S3