AWS Big Data Blog
Category: Amazon Kinesis
Optimize downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark
This blog post shows how to use Amazon Kinesis Data Firehose to merge many small messages into larger messages for delivery to Amazon S3, which results in faster processing with Amazon EMR running Spark. It also shows how to use Apache Spark to read compressed files in Amazon S3 that lack a proper file name extension, and how to store the results back in Amazon S3 in Parquet format.
Amazon Kinesis Data Firehose custom prefixes for Amazon S3 objects
In February 2019, Amazon Web Services (AWS) announced a new feature in Amazon Kinesis Data Firehose called Custom Prefixes for Amazon S3 Objects. It lets customers specify a custom expression for the Amazon S3 prefix where data records are delivered. Previously, Kinesis Data Firehose allowed only specifying a literal prefix. This prefix was then combined with a static date-formatted prefix to create the […]
Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics for Java Applications
In this post, we discuss how you can use Apache Flink and Amazon Kinesis Data Analytics for Java Applications to address these challenges. We explore how to build a reliable, scalable, and highly available streaming architecture based on managed services that substantially reduce the operational overhead compared to a self-managed environment.
Improve clinical trial outcomes by using AWS technologies
We are living in a golden age of innovation, where personalized medicine is making it possible to cure diseases that we never thought curable. Digital medicine is helping people with diseases get healthier, and we are constantly discovering how to use the body’s immune system to target and eradicate cancer cells. According to a report […]
Create real-time clickstream sessions and run analytics with Amazon Kinesis Data Analytics, AWS Glue, and Amazon Athena
Clickstream events are small pieces of data that are generated continuously with high speed and volume. Often, clickstream events are generated by user actions, and it is useful to analyze them. For example, you can detect user behavior in a website or application by analyzing the sequence of clicks a user makes, the amount of […]
Our data lake story: How Woot.com built a serverless data lake on AWS
In this post, we talk about designing a cloud-native data warehouse as a replacement for our legacy data warehouse built on a relational database. At the beginning of the design process, the simplest solution appeared to be a straightforward lift-and-shift migration from one relational database to another. However, we decided to step back and focus […]
Manage centralized Microsoft Exchange Server logs using Amazon Kinesis Agent for Windows
This blog post discusses an efficient architecture to stream, analyze, and store Microsoft Exchange Server logs. For frequent queries and operational analytics, we use Amazon Elasticsearch Service (Amazon ES) and Kibana for real-time visualization.
Scale Amazon Kinesis Data Streams with AWS Application Auto Scaling
Recently, AWS launched a new feature of AWS Application Auto Scaling that lets you define scaling policies that automatically add shards to and remove shards from an Amazon Kinesis data stream. For more detailed information about this feature, see the Application Auto Scaling GitHub repository. As your streaming information increases, you require a scaling solution to accommodate […]
Your guide to Amazon Kinesis sessions, chalk talks, and workshops at AWS re:Invent 2018
AWS re:Invent 2018 is almost here! This post includes a list of Amazon Kinesis sessions, chalk talks, and workshops at AWS re:Invent 2018. You can choose the link next to each session description for the session schedule. Use the information to help schedule your conference week in Las Vegas to learn more about Amazon Kinesis. Sessions ANT208 – […]
Turn Windows DHCP Server logs into actionable metrics using Amazon Kinesis Agent for Windows
Understanding Windows system and service health on a global scale is challenging. You capture server log data, and then analyze and manipulate the data in real time to create actionable telemetry insights. Amazon Kinesis Agent for Microsoft Windows makes it efficient to ingest Windows server log data into your AWS ecosystem for analysis. This blog […]