The Big Data Lambda Architecture seeks to provide data engineers and architects with a scalable, fault-tolerant data processing architecture and framework using loosely coupled, distributed systems. At a high level, the Lambda Architecture is designed to handle both real-time and historically aggregated batched data in an integrated fashion. In this post, I show you how you can use AWS services like AWS Glue to build a Lambda Architecture completely without servers. I use a practical demonstration to examine the tight integration between serverless services on AWS and create a robust data processing Lambda Architecture system.

Read the entire post »

To effectively replicate data from DynamoDB to Aurora, a reliable, scalable data replication (ETL) process needs to be built. In this post, I show you how to build such a process using a serverless architecture with AWS Lambda and Amazon Kinesis Data Firehose.

Read the entire post »

Last year, I published an AWS Security Blog post that showed how to optimize and visualize your security groups. Today’s post continues in the vein of that post by using Amazon Kinesis Data Firehose and AWS Lambda to enrich the VPC Flow Logs dataset and enhance your ability to optimize security groups. The capabilities in this post’s solution are based on the Lambda functions available in this VPC Flow Log Appender GitHub repository.

Read the entire post »

We have many customers who own and operate Elasticsearch, Logstash, and Kibana (ELK) stacks to load and visualize Apache web logs, among other log types. Amazon Elasticsearch Service provides Elasticsearch and Kibana in the AWS Cloud in a way that’s easy to set up and operate. Amazon Kinesis Firehose provides reliable, serverless delivery of Apache web logs (or other log data) to Amazon Elasticsearch Service. With Firehose, you can add an automatic call to an AWS Lambda function to transform records within Firehose. With these two technologies, you have an effective, easy-to-manage replacement for your existing ELK stack.

Read the entire post »

This blog post shows how to build a serverless architecture by using Amazon Kinesis Data Firehose, AWS Lambda, Amazon S3, Amazon Athena, and Amazon QuickSight to collect, store, query, and visualize flow logs.

Read the entire post »

Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service (Amazon ES). In this post, I introduce data transformation capabilities on your delivery streams, to seamlessly transform incoming source data and deliver the transformed data to your destinations.

Read the entire post »

In this post, I show how you can build a business intelligence capability for streaming IoT device data using AWS serverless and managed services. You can be up and running in minutes―starting small, but able to easily grow to millions of devices and billions of messages.

Read the entire post »

Log aggregation is critical to your operational infrastructure. A reliable, secure, and scalable log aggregation solution makes all the difference during a crunch-time debugging session. In this post, we explore an alternative to the popular log aggregation solution, the ELK stack (Elasticsearch, Logstash, and Kibana): the EKK stack (Amazon Elasticsearch Service, Amazon Kinesis, and Kibana).

Read the entire post »

Streaming data technologies shorten the time to analyze and use your data from hours and days to minutes and seconds. Let’s walk through an example of using Amazon Kinesis Data Firehose, Amazon Redshift, and Amazon QuickSight to set up a streaming data pipeline and visualize Maryland traffic violation data in real time.

Read the entire post »

In this guest post, Anton Slutsky of MeetMe will discuss a solution using Amazon Kinesis Data Firehose to optimize and streamline large-scale data ingestion at MeetMe, which is a popular social discovery platform that caters to more than a million active daily users. The Data Science team at MeetMe needed to collect and store approximately 0.5 TB per day of various types of data in a way that would expose it to data mining tasks, business-facing reporting and advanced analytics. The team selected Amazon S3 as the target storage facility and faced a challenge of collecting the large volumes of live data in a robust, reliable, scalable and operationally affordable way.

Read the entire post »

Amazon Kinesis Agent is a stand-alone Java software application that provides an easy and reliable way to send data to Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose. The agent monitors a set of files for new data and then sends it to Kinesis Data Streams or Kinesis Data Firehose continuously. It handles file rotation, checkpointing, and retrial upon failures. It also supports Amazon CloudWatch so that you can closely monitor and troubleshoot the data flow from the agent.

Read the entire post »

Elasticsearch is a popular open-source search and analytics engine. Amazon Elasticsearch Service is a managed service that makes it easy for you to deploy, run, and scale Elasticsearch in the AWS Cloud. You can now arrange to deliver your Kinesis Data Firehose data stream to an Amazon Elasticsearch Cluster. This will allow you to index and analyze server logs, clickstreams, and social media traffic.

Read the entire post »

In this post we use Twitter public streams to analyze the candidates’ performance, both Republican and Democrat, in a near real-time fashion. We show you how to integrate Amazon Kinesis Data Firehose, AWS Lambda (Python function), and Amazon Elasticsearch Service to create an end-to-end, near real-time discovery platform.

Read the entire post »

This blog post walks you through a simple and effective way to persist data to Amazon S3 from Amazon Kinesis Data Streams using AWS Lambda and Amazon Kinesis Data Firehose.

Read the entire post here »