What is real-time data streaming?
Real-time data streaming involves collecting and ingesting a sequence of data from various data sources and processing that data in real time to extract meaning and insight.
Examples of streaming data are log files generated by customers using your mobile or web applications, ecommerce purchases, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices or instrumentation in data centers.
Real-time data streaming enables you to analyze and process data in real time instead of waiting hours, days, or weeks to get answers.
What are the components of real-time data streaming?
Source: Up to hundreds of thousands of devices or applications that produce high volumes of continuous data at high velocity. Examples are mobile devices, web applications (clickstream), application logs, IoT sensors, smart devices, and gaming applications.
Stream Ingestion: Simple integration with over 15 AWS services (Amazon API Gateway, AWS IoT Core, Amazon CloudWatch, and more) that enables you to capture continuous data being produced from thousands of devices in a durable and secure manner.
Stream Storage: Choose a solution, such as Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, or Amazon Managed Streaming for Apache Kafka (Amazon MSK), that meets your storage needs based on scaling, latency, and processing requirements.
Stream Processing: Choose from a range of services, from solutions such as Amazon Kinesis Data Firehose that take just a couple of clicks to transform and continuously deliver data to a destination, to powerful, custom-built, real-time applications and machine learning integration using services such as Amazon Managed Service for Apache Flink and AWS Lambda.
Destination: Deliver streaming data to a selection of fully integrated data lakes, data warehouses, and analytics services, such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service, and Amazon EMR, for further analysis or long-term storage.
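To make the five components concrete, here is a minimal in-memory sketch of how they fit together. It does not use any AWS service: the sensor source, the `deque` standing in for stream storage, and the list standing in for a destination are all assumptions made for illustration.

```python
import json
from collections import deque
from typing import Iterable, Iterator


def source(n: int) -> Iterator[dict]:
    """Source: a hypothetical pair of sensors emitting temperature readings."""
    for i in range(n):
        yield {"device_id": f"sensor-{i % 2}", "temp_c": 20.0 + i}


def ingest(events: Iterable[dict], stream: deque) -> None:
    """Stream ingestion: serialize each event and append it to stream storage."""
    for event in events:
        stream.append(json.dumps(event).encode("utf-8"))


def process(stream: deque) -> Iterator[dict]:
    """Stream processing: decode each record and add a derived field."""
    while stream:
        record = json.loads(stream.popleft())
        record["temp_f"] = record["temp_c"] * 9 / 5 + 32
        yield record


def deliver(records: Iterable[dict], destination: list) -> None:
    """Destination: a plain list standing in for a data lake or warehouse."""
    destination.extend(records)


def run_pipeline(n: int) -> list:
    stream = deque()   # stands in for stream storage
    destination = []   # stands in for the final data store
    ingest(source(n), stream)
    deliver(process(stream), destination)
    return destination
```

In a real deployment each function would be replaced by a managed service, and the stages would run continuously and concurrently rather than one after another.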
What are real-time data streaming use cases?
Real-Time Data Movement
Streaming data from hundreds of thousands of devices and performing ETL transformations on high volumes of continuous, high-velocity data in real time allows users to analyze data as soon as it is produced, and then durably store the data in a data lake, data warehouse, or database for further analysis.
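As a rough illustration of ETL on a stream, the sketch below parses raw records, drops malformed ones, normalizes two fields, and groups the results into small batches that could then be written to a data store. The record layout (`user`, `amount`) and the batch size are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional


def transform(raw: bytes) -> Optional[dict]:
    """Parse one raw record and normalize it; return None if it is malformed."""
    try:
        event = json.loads(raw)
        return {
            "user": str(event["user"]).strip().lower(),
            "amount_cents": round(float(event["amount"]) * 100),
        }
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, missing fields, or unexpected types: skip the record.
        return None


def micro_batches(raw_records: Iterable[bytes], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of transformed rows, each holding at most batch_size rows."""
    batch: List[dict] = []
    for raw in raw_records:
        row = transform(raw)
        if row is None:
            continue
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```

Batching before the write is a common design choice because most data stores handle a few large writes more efficiently than many small ones.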
Analyze data as soon as it is produced and enable real-time decisions across an organization to capitalize on opportunities, enhance customer experiences, prevent networking failures, or update critical business metrics in real time.
Logs: Capture, process, and analyze logs from your applications in real time.
Real-time updates: Engage with consumers, gamers, financial traders, and more by providing real-time updates to critical decision-making metrics, recommendations, and customer experiences.
Clickstream: Get a real-time view of the performance of your web content and user interaction with your applications and websites including user behavior, amount of time spent, popular content, and more.
IoT: Connect to hundreds of thousands of IoT devices and collect, process, and analyze the streaming data in real time.
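Many of these analytics use cases come down to aggregating events over short time windows. The following sketch counts page views per fixed (tumbling) window for a clickstream; the `(timestamp, page)` event shape and the 60-second window are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def page_views_per_window(
    events: Iterable[Tuple[float, str]], window_seconds: int = 60
) -> Dict[int, Counter]:
    """Group (timestamp, page) events into fixed windows and count views per page."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, page in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][page] += 1
    return dict(windows)
```

A production stream processor would typically emit each window's result as soon as the window closes and would need a policy for late-arriving events; this sketch only shows the windowing arithmetic.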
Event Stream Processing
Capture and respond to events as they happen in real time across multiple applications. The most common use cases are communication between hundreds of decoupled microservices and maintaining a system of record via Change Data Capture.
Communication between decoupled microservices: When any microservice is triggered, an event can be sent to a data stream in real time, and other microservices can ‘watch’ the stream to see if any event has occurred to trigger the required action.
Change Data Capture: All changes to data across several applications and databases can be streamed to a central system of record in real time.
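Both patterns can be sketched with an append-only event log in which every consumer keeps its own read position. The classes and names below (`ChangeEvent`, `EventStream`, `apply_changes`) are hypothetical and in-memory only, and the sketch assumes each update event carries the full new value of the record.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class ChangeEvent:
    op: str                       # "insert", "update", or "delete"
    key: str
    value: Optional[dict] = None  # full new value; None for deletes


@dataclass
class EventStream:
    """Append-only log; each consumer tracks its own read offset."""
    events: List[ChangeEvent] = field(default_factory=list)
    offsets: Dict[str, int] = field(default_factory=dict)

    def publish(self, event: ChangeEvent) -> None:
        self.events.append(event)

    def poll(self, consumer: str) -> List[ChangeEvent]:
        """Return the events this consumer has not seen yet and advance its offset."""
        start = self.offsets.get(consumer, 0)
        new_events = self.events[start:]
        self.offsets[consumer] = len(self.events)
        return new_events


def apply_changes(record_store: Dict[str, dict], changes: Iterable[ChangeEvent]) -> None:
    """Change Data Capture: replay change events into a central system of record."""
    for change in changes:
        if change.op == "delete":
            record_store.pop(change.key, None)
        else:
            record_store[change.key] = change.value
```

Because offsets are tracked per consumer, a second service can read the same events independently, which is what keeps the producers and consumers decoupled.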
What streaming services are on AWS?
AWS provides several options to work with real-time data streaming.
- Amazon Kinesis Data Streams is a scalable and durable real-time data streaming service that can continuously capture gigabytes of data per second from hundreds of thousands of sources.
- Amazon Kinesis Data Firehose captures, transforms, and loads data streams into AWS data stores for near real-time analytics with existing business intelligence tools with just a few clicks.
- Amazon Managed Service for Apache Flink transforms and analyzes streaming data in real time with Apache Flink, an open-source framework and engine for processing data streams.
- Amazon Managed Streaming for Apache Kafka is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data.
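As one possible starting point, the sketch below writes a single JSON event to Amazon Kinesis Data Streams using the `put_record` call of the AWS SDK for Python (boto3). The stream name, event fields, and helper function names are hypothetical, and `send_event` assumes boto3 is installed, AWS credentials and a region are configured, and the stream already exists.

```python
import json


def build_put_record_request(stream_name: str, event: dict, partition_field: str) -> dict:
    """Build the keyword arguments for the Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[partition_field]),
    }


def send_event(stream_name: str, event: dict, partition_field: str) -> dict:
    """Send one event to the stream; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without the SDK

    client = boto3.client("kinesis")
    return client.put_record(**build_put_record_request(stream_name, event, partition_field))
```

For example, `send_event("example-stream", {"device_id": "sensor-1", "temp_c": 21.5}, "device_id")` would use the device ID as the partition key, so readings from one device stay in order within a shard.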
Get started with real-time data streaming on AWS by creating an account today.
Next Steps on AWS
Get instant access to the AWS Free Tier.
Get started building in the AWS Management Console.